select_top_terms
selects a specified number of top terms based on various properties of
their term frequencies. This function is typically used to select term
frequency time series for plotting and exploratory analysis. See the
argument descriptions for the available selection options.
select_top_terms(termFrequencies, topN = 25,
selectBy = "most_frequent", selectTerms = NULL)
Arguments
termFrequencies |
a data frame of term frequencies, as returned by
term_frequencies() |
topN |
the number of top terms to return, ranked by the selection criterion
given in selectBy |
selectBy |
the selection approach, which determines the metric by which terms are
sorted to select the topN terms. Currently, the following options are
supported:
- most_frequent
the default; select terms based on their total number of occurrences.
- trending_up
select terms with the largest upward trend; internally, this is measured
by the slope of a simple linear regression fit to a term's frequency
series.
- trending_down
select terms with the largest downward trend; internally, this is
measured by the slope of a simple linear regression fit to a term's
frequency series.
- trending
select terms with either the largest upward or the largest downward
trend; internally, this is measured by the absolute value of the slope
of a simple linear regression fit to a term's frequency series.
- most_volatile
select terms whose frequency varies most around its linear trend over
the covered time period; internally, this is measured by the residual
standard deviation of the linear model fit to a term's time frequency
series.
|
selectTerms |
a character vector of term patterns against which terms are matched for
selection. Regular expression syntax can be used: e.g. if
c("^mod", "an", "el$", "^outbreak$") is supplied for selectTerms, all
terms that start with 'mod', contain 'an', end with 'el', or are exactly
'outbreak' are matched. The arguments selectBy and selectTerms can be
combined. |
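The matching rule for selectTerms can be sketched in base R as follows. This is a minimal illustration of the behaviour described above, not the package's own code; the helper name match_any_pattern and the sample terms are hypothetical.

```r
# Hypothetical helper illustrating the documented selectTerms rule:
# a term is kept if it matches at least one of the regular expressions.
match_any_pattern <- function(terms, patterns) {
  # One logical vector per pattern (grepl(pattern, x = terms)) ...
  hits <- lapply(patterns, grepl, x = terms)
  # ... combined with element-wise OR across all patterns
  Reduce(`|`, hits)
}

terms <- c("model", "pandemic", "travel", "outbreak", "outbreaks", "virus")
patterns <- c("^mod", "an", "el$", "^outbreak$")
matched <- terms[match_any_pattern(terms, patterns)]
print(matched)
```

Here "outbreaks" is not matched, because "^outbreak$" is anchored at both ends and the term matches none of the other patterns.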
Value
a data frame of the trend metrics used to select the top terms, with the
following columns:
- term
a unique term
- n_term_total
the total number of occurrences of the term in the dataset
- slope
the slope coefficient of a linear model fit to the term's time frequency
series
- volatility
the residual standard deviation of a linear model fit to the term's time
frequency series
- trend
a categorisation of the term's frequency trend
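The slope and volatility metrics described above, and the way the selectBy options rank terms by them, can be sketched with base R's lm() and sigma(). This sketch assumes a hypothetical long-format input with columns term, t (a numeric time index) and n (counts); the layout actually returned by term_frequencies() may differ, and the helpers term_metrics and select_top are illustrative only. The trend column is omitted because its categories are not specified here.

```r
# Assumed input layout (hypothetical): one row per term and time step.
freqs <- data.frame(
  term = rep(c("alpha", "beta", "gamma"), each = 5),
  t    = rep(1:5, times = 3),
  n    = c(1, 2, 3, 4, 5,   # steady upward trend
           9, 7, 5, 3, 1,   # steady downward trend
           2, 8, 1, 9, 2)   # no clear trend, but volatile
)

# Per-term metrics analogous to the n_term_total, slope and volatility columns
term_metrics <- function(freqs) {
  rows <- lapply(split(freqs, freqs$term), function(d) {
    fit <- lm(n ~ t, data = d)          # simple linear regression on time
    data.frame(
      term         = d$term[1],
      n_term_total = sum(d$n),
      slope        = coef(fit)[["t"]],  # slope coefficient
      volatility   = sigma(fit)         # residual standard deviation
    )
  })
  do.call(rbind, rows)
}

# Rank terms by the metric implied by a selectBy-style option
select_top <- function(metrics, topN, selectBy = "most_frequent") {
  key <- switch(selectBy,
    most_frequent = metrics$n_term_total,
    trending_up   = metrics$slope,
    trending_down = -metrics$slope,
    trending      = abs(metrics$slope),
    most_volatile = metrics$volatility,
    stop("unknown selectBy: ", selectBy)
  )
  head(metrics[order(key, decreasing = TRUE), ], topN)
}

m <- term_metrics(freqs)
print(select_top(m, topN = 2, selectBy = "trending"))
```

Sorting by the negated slope for trending_down and by the absolute slope for trending lets every option share one descending sort.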