R/term-analysis.R
term_frequencies.Rd

term_frequencies summarizes the counts and relative frequencies of terms in a chosen timebin, for a given collection of terms with their dates of occurrence.
term_frequencies(termsByDate, timeBinUnit = "week", minTermTimeBins = 0.5, minTermOccurences = 10)
| Argument | Description |
|---|---|
| termsByDate | a dataframe as returned by |
| timeBinUnit | a character string specifying the time period to be used as the bin unit when computing term frequencies. Valid values are |
| minTermTimeBins | a double in the range |
| minTermOccurences | an integer specifying the minimum total number of occurrences a term must have to be included in the results; terms that do not meet this threshold are excluded from the returned results. |
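Here is a minimal base-R sketch of how the minTermOccurences threshold could be applied. It assumes that termsByDate has one row per term occurrence with columns term and date (both column names are assumptions). It uses a hypothetical helper, filter_by_min_occurrences, which is not part of the package. The share column is documented as being computed over all terms, so a filter like this would run only after the shares have been computed.

```r
# Hypothetical input: one row per term occurrence; the column names `term` and
# `date` are assumptions made for this sketch
termsByDate <- data.frame(
  term = c("a", "a", "a", "b", "c", "c"),
  date = as.Date(c("2021-01-04", "2021-01-05", "2021-01-20",
                   "2021-01-06", "2021-01-07", "2021-02-01")),
  stringsAsFactors = FALSE
)

# Keep only occurrences of terms whose total count reaches `minTermOccurences`
# (`filter_by_min_occurrences` is a hypothetical helper, not a package function)
filter_by_min_occurrences <- function(termsByDate, minTermOccurences = 10) {
  totals <- table(termsByDate$term)                  # occurrences per term
  keep <- names(totals)[totals >= minTermOccurences] # terms meeting the minimum
  termsByDate[termsByDate$term %in% keep, , drop = FALSE]
}

filtered <- filter_by_min_occurrences(termsByDate, minTermOccurences = 2)
unique(filtered$term)  # "a" "c"
```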
a dataframe with term frequencies by the chosen timebin, with the following columns:

- a term, as provided in the input termsByDate
- the first day of a timebin; if timeBinUnit is set to "week", this date is always a Monday
- the number of occurrences of the term in the timebin
- the exact date of the first occurrence of the term across the whole time range covered by the timebins
- the exact date of the latest occurrence of the term across the whole time range covered by the timebins; note that this date can be later than the maximum timebin, because a timebin is the floor date of its time unit
- the share of the term in a given timebin relative to all term occurrences in that timebin; NOTE that this is computed over all terms, including those that may be filtered out of the results
- the total number of occurrences of the term in the dataset, i.e. across all timebins
- the number of unique timebins in which an occurrence of the term was recorded
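The value columns described above can be approximated for weekly timebins with a base-R sketch. This is not the package's implementation. It assumes that termsByDate has one row per occurrence, with a character column term and a Date column date. Every name it introduces is hypothetical: weekly_term_frequencies, floor_to_monday, and the output column names. In line with the NOTE above, shares are computed over all terms before any filtering.

```r
# Minimal sketch, NOT the package's implementation. Assumes `termsByDate` has one
# row per occurrence with columns `term` (character) and `date` (Date); all
# function and output column names here are hypothetical.

# Floor a Date to the Monday of its week (base R, locale-independent)
floor_to_monday <- function(date) {
  wday <- as.POSIXlt(date)$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  date - (wday + 6) %% 7          # subtract the days elapsed since Monday
}

weekly_term_frequencies <- function(termsByDate) {
  termsByDate$timebin <- floor_to_monday(termsByDate$date)

  # Occurrences of each term in each weekly timebin (non-empty bins only)
  counts <- aggregate(
    list(n = rep(1L, nrow(termsByDate))),
    by = list(term = termsByDate$term, timebin = termsByDate$timebin),
    FUN = sum
  )

  # Share of each term among ALL term occurrences in the same timebin
  counts$share <- counts$n / ave(counts$n, counts$timebin, FUN = sum)

  # Exact dates of the first and latest occurrence of each term
  ord <- termsByDate[order(termsByDate$term, termsByDate$date), ]
  first <- ord[!duplicated(ord$term), c("term", "date")]
  latest <- ord[!duplicated(ord$term, fromLast = TRUE), c("term", "date")]
  names(first)[2] <- "firstOccurrence"
  names(latest)[2] <- "latestOccurrence"

  # Total occurrences and number of distinct timebins per term
  tot <- tapply(counts$n, counts$term, sum)
  totals <- data.frame(
    term = names(tot),
    totalOccurrences = as.vector(tot),
    nTimebins = as.vector(table(counts$term)[names(tot)]),
    stringsAsFactors = FALSE
  )

  out <- merge(merge(merge(counts, first, by = "term"), latest, by = "term"),
               totals, by = "term")
  out <- out[order(out$term, out$timebin), ]
  rownames(out) <- NULL
  out
}

termsByDate <- data.frame(
  term = c("a", "a", "b", "a", "b"),
  date = as.Date(c("2021-01-04", "2021-01-06", "2021-01-07",
                   "2021-01-20", "2021-01-21")),
  stringsAsFactors = FALSE
)
res <- weekly_term_frequencies(termsByDate)
```

In this example, term b last occurs on 21 January 2021. That date is later than b's latest timebin, which starts on Monday 18 January 2021. This is the situation the latest-occurrence column description warns about.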
Timebins in which a given term does not occur are included with an explicit count of zero; however, such empty timebins are not added before the term's first occurrence or after its last occurrence.
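The zero-filling rule can be sketched for weekly timebins as follows. The sketch assumes a data frame of non-zero counts with hypothetical columns term, timebin (Mondays, stored as Date) and n. The helper fill_empty_timebins is also hypothetical. Each term's bins are generated with seq() only from its first to its last non-empty timebin, so no zeros are added outside that span.

```r
# Minimal sketch, assuming weekly timebins (Dates that fall on Mondays) and a
# data frame of non-zero counts with hypothetical columns `term`, `timebin`, `n`
fill_empty_timebins <- function(counts) {
  filled <- lapply(split(counts, counts$term), function(d) {
    # Every weekly timebin from the term's first to its last non-empty timebin
    all_bins <- seq(min(d$timebin), max(d$timebin), by = "week")
    full <- data.frame(term = d$term[1], timebin = all_bins,
                       stringsAsFactors = FALSE)
    full <- merge(full, d[, c("term", "timebin", "n")],
                  by = c("term", "timebin"), all.x = TRUE)
    full$n[is.na(full$n)] <- 0L   # explicit zero for empty timebins
    full
  })
  out <- do.call(rbind, filled)
  rownames(out) <- NULL
  out
}

counts <- data.frame(
  term = c("a", "a", "b"),
  timebin = as.Date(c("2021-01-04", "2021-01-25", "2021-01-11")),
  n = c(2L, 1L, 3L),
  stringsAsFactors = FALSE
)
filled <- fill_empty_timebins(counts)
```

Term a gains explicit zeros for the weeks starting 11 and 18 January 2021. Term b occurs in a single week, so it gets no extra rows.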