plot_topic_frequencies
plots time series of topic shares for a
selection of topics in a faceted plot. Each topic is displayed in a single
subplot and each time series is overlaid with a linear trendline.
plot_topic_frequencies(topicsByDocDate, topicLabels = NULL,
timeBinUnit = "week", topN = 25, minTopicTimeBins = 0.5,
minGamma = 0.01, selectTopicsBy = "most_frequent",
selectTopics = NULL, verboseLabels = FALSE, nCols = 5)
Arguments
topicsByDocDate |
a dataframe as returned by
topics_by_doc_date |
topicLabels |
a dataframe as returned by topics_terms_map,
associating a topic_id with a suitable topic_label; if
NULL (the default), suitable default labels will be generated. |
timeBinUnit |
a character string specifying the time period that
should be used as a bin unit when computing topic share frequencies. Valid
values are "day", "week", "month", "quarter", "year"; "week"
is the default. NOTE: when assigning weeks, Monday is
considered the first day of the week. |
topN |
the number of top topics (according to the selection criteria in
selectTopicsBy) that should be displayed. The default is 25. |
minTopicTimeBins |
a double in the range [0,1] specifying the
minimum share of all unique timebins in which an occurrence of a topic
share of at least minGamma must have been recorded, i.e. a value of
0.5 (the default) requires that an occurrence of a topic must have
been recorded in at least 50% of all unique timebins covered by the
dataset; topics that do not meet this threshold will not be included in the
returned results. |
minGamma |
the minimum share of a topic per document to be considered
when summarizing topic frequencies; topics with smaller shares per
individual document will be ignored when computing topic frequencies. The
default is 0.01, but it should be adjusted in view of the number of
topics and the average length of a document. (In an
stm topic model the likelihood that a document is
generated from a topic is expressed by the value gamma.) |
selectTopicsBy |
the selection approach, which determines the metric by
which topic_ids will be sorted to select the topN topics.
Currently, the following options are supported:
- most_frequent
the default; select topics based on the total
number of documents in which the topic occurs (NOTE: the
document count depends on the minimum topic likelihood minGamma that
was specified when obtaining the topic frequencies).
- trending_up
select topics with the largest upward trend; internally
this is measured by the slope of a simple linear regression fit to a
topic_id's frequency series.
- trending_down
select topics with the largest downward trend; internally
this is measured by the slope of a simple linear regression fit to a
topic_id's frequency series.
- trending
select topics with either the largest upward or downward trend;
internally this is measured by the absolute value of the slope of a simple
linear regression fit to a topic_id's frequency series.
- most_volatile
select topics with the largest change throughout the
covered time period; internally this is measured by the residual standard
deviation of the linear model fit to a topic_id's frequency
series.
- topic_id
select topics specified by topic_id in the
function argument selectTopics.
|
selectTopics |
a vector of topic IDs which should be plotted; this
option is only considered when the option "topic_id" is chosen for
selectTopicsBy. The default is NULL. |
verboseLabels |
a Boolean indicating whether additional topic information
should be used to label subplots. The default is FALSE. |
nCols |
the number of columns along which topic subplots should be laid
out. The default is 5. |
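How timeBinUnit, minGamma and minTopicTimeBins interact can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frame docs, its column names and all values are hypothetical, and the binning via cut() is only one way to obtain weeks that start on Monday.

```r
# Hypothetical per-document topic shares (column names are assumptions).
docs <- data.frame(
  topic_id = rep(c(1, 2), each = 4),
  date     = rep(as.Date(c("2021-03-07", "2021-03-08",
                           "2021-03-14", "2021-03-15")), times = 2),
  gamma    = c(0.40, 0.20, 0.005, 0.30, 0.002, 0.50, 0.001, 0.003)
)
minGamma         <- 0.01
minTopicTimeBins <- 0.5

# Weekly bins: cut() on Date objects starts weeks on Monday by default.
docs$timebin <- as.character(cut(docs$date, breaks = "week"))
n_bins <- length(unique(docs$timebin))   # all unique time bins in the data

# Ignore topic shares below minGamma.
kept <- docs[docs$gamma >= minGamma, ]

# Share of all time bins in which each topic still occurs.
bin_share <- tapply(kept$timebin, kept$topic_id,
                    function(b) length(unique(b)) / n_bins)

# Topics that meet the minTopicTimeBins threshold.
topics_kept <- names(bin_share)[bin_share >= minTopicTimeBins]
```

In this toy data, raising minGamma removes more document-level occurrences, which in turn lowers the share of time bins a topic covers and makes it more likely to fall below minTopicTimeBins.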
Details
This function merges the computation of topic frequencies
(topic_frequencies), creation of suitable labels
(topics_terms_map) and the selection of topics by miscellaneous
criteria (select_top_topics) into one step.
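The trend-based selection metrics described for selectTopicsBy can be sketched with base R's lm(). This is an illustration under assumptions, not the code used by select_top_topics: the frequency series freq is made up, and the time bins are simply numbered consecutively.

```r
# Hypothetical frequency series of one topic over six consecutive time bins.
freq <- c(2, 4, 3, 6, 7, 9)
bin  <- seq_along(freq)

# Simple linear regression of frequency on time bin index.
fit <- lm(freq ~ bin)

slope      <- unname(coef(fit)["bin"])  # sign and size rank trending_up / trending_down
abs_slope  <- abs(slope)                # magnitude only, as described for trending
volatility <- sigma(fit)                # residual standard deviation, as for most_volatile
```

Computing these values per topic_id and sorting by the chosen metric would then yield a ranking from which the topN topics can be taken.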
See also