Plot individual topic frequencies by date in a faceted plot

plot_topic_frequencies plots time series of topic shares for a selection of topics in a faceted plot. Each topic is displayed in its own subplot, and each time series is overlaid with a linear trendline.

plot_topic_frequencies(topicsByDocDate, topicLabels = NULL,
  timeBinUnit = "week", topN = 25, minTopicTimeBins = 0.5,
  minGamma = 0.01, selectTopicsBy = "most_frequent",
  selectTopics = NULL, verboseLabels = FALSE, nCols = 5)

Arguments

topicsByDocDate	a dataframe as returned by `topics_by_doc_date`
topicLabels	a dataframe as returned by `topics_terms_map`, associating a `topic_id` with a suitable `topic_label`; if `NULL` (the default), suitable default labels will be generated.
timeBinUnit	a character string specifying the time period that should be used as a bin unit when computing topic share frequencies. Valid values are `"day"`, `"week"`, `"month"`, `"quarter"` and `"year"`; `"week"` is the default. NOTE: when assigning `week`s, Monday is considered the first day of the week.
topN	the number of top topics (according to the selection criteria in `selectTopicsBy`) that should be displayed
minTopicTimeBins	a double in the range `[0,1]` specifying the minimum share of all unique timebins in which an occurrence of a topic share of at least `minGamma` must have been recorded, i.e. a value of `0.5` (the default) requires that an occurrence of a topic must have been recorded in at least 50% of all unique timebins covered by the dataset; topics that do not meet this threshold will not be included in the returned results.
minGamma	the minimum share of a topic per document to be considered when summarizing topic frequencies; topics with smaller shares in an individual document will be ignored when computing topic frequencies. The default is `0.01`, but it should be adjusted in view of the number of topics and the average length of a document. (In an `stm topic model` the likelihood that a document is generated from a topic is expressed by the value gamma.)
selectTopicsBy	the selection approach, which determines the metric by which `topic_id`s will be sorted to select the `topN` topics. Currently, the following options are supported:
  `most_frequent`	the default; select topics based on the total number of documents in which the topic occurs. (NOTE that the document count depends on the minimum topic likelihood `minGamma` that was specified when obtaining the topic frequencies.)
  `trending_up`	select topics with the largest upward trend; internally this is measured by the slope of a simple linear regression fit to a `topic_id`'s frequency series.
  `trending_down`	select topics with the largest downward trend; internally this is measured by the slope of a simple linear regression fit to a `topic_id`'s frequency series.
  `trending`	select topics with either the largest upward or the largest downward trend; internally this is measured by the absolute value of the slope of a simple linear regression fit to a `topic_id`'s frequency series.
  `most_volatile`	select topics with the largest change throughout the covered time period; internally this is measured by the residual standard deviation of the linear model fit to a `topic_id`'s frequency series.
  `topic_id`	select the topics specified by `topic_id` in the function argument `selectTopics`.
selectTopics	a vector of topic IDs which should be plotted; this option is only considered when the option `"topic_id"` is chosen for `selectTopicsBy`.
verboseLabels	a Boolean indicating whether additional topic information should be used to label subplots. The default is `FALSE`.
nCols	the number of columns along which topic subplots should be laid out.
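
The interplay of `timeBinUnit`, `minGamma` and `minTopicTimeBins` can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy data, its column names (`date`, `topic_id`, `gamma`) and the helper `to_week_bin` are assumptions made for illustration only.

```r
# Toy document-level topic shares (hypothetical data and column names)
docs <- data.frame(
  date     = as.Date(c("2021-01-04", "2021-01-06", "2021-01-10",
                       "2021-01-11", "2021-01-13", "2021-01-18")),
  topic_id = c(1L, 2L, 1L, 1L, 2L, 1L),
  gamma    = c(0.40, 0.005, 0.30, 0.20, 0.50, 0.02)
)

min_gamma           <- 0.01  # plays the role of minGamma
min_topic_time_bins <- 0.5   # plays the role of minTopicTimeBins

# Hypothetical helper: map each date to the Monday that starts its week.
# POSIXlt's wday is 0 for Sunday, so (wday + 6) %% 7 counts days since Monday.
to_week_bin <- function(d) d - (as.POSIXlt(d)$wday + 6) %% 7

docs$bin <- to_week_bin(docs$date)
n_bins   <- length(unique(docs$bin))  # all unique time bins in the dataset

# Ignore topic occurrences whose per-document share is below min_gamma
kept <- docs[docs$gamma >= min_gamma, ]

# Share of all time bins in which each topic occurs at least once
topic_bins <- unique(kept[, c("topic_id", "bin")])
bin_share  <- table(topic_bins$topic_id) / n_bins

# Topics that meet the minimum share of time bins
retained_topics <- as.integer(names(bin_share)[bin_share >= min_topic_time_bins])
```

In this toy example, raising `min_gamma` shrinks the set of counted occurrences, which in turn can push a topic below the `min_topic_time_bins` threshold.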

Details

This function merges the computation of topic frequencies (topic_frequencies), creation of suitable labels (topics_terms_map) and the selection of topics by miscellaneous criteria (select_top_topics) into one step.
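
The selection metrics described for `selectTopicsBy` can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code used by `select_top_topics`: the toy frequency table, its column names (`topic_id`, `bin`, `n_docs`) and the helper `top_by` are hypothetical.

```r
# Toy per-bin document counts for three topics (hypothetical data)
freq <- data.frame(
  topic_id = rep(1:3, each = 5),
  bin      = rep(1:5, times = 3),
  n_docs   = c(1, 2, 4, 4, 6,   # topic 1: rising
               9, 7, 6, 3, 1,   # topic 2: falling
               3, 8, 2, 9, 3)   # topic 3: large swings, little trend
)

# Fit a simple linear regression per topic and collect the metrics
metrics <- do.call(rbind, lapply(split(freq, freq$topic_id), function(d) {
  fit <- lm(n_docs ~ bin, data = d)
  data.frame(
    topic_id = d$topic_id[1],
    total    = sum(d$n_docs),              # basis for "most_frequent"
    slope    = unname(coef(fit)["bin"]),   # basis for the "trending*" options
    resid_sd = sigma(fit)                  # basis for "most_volatile"
  )
}))

# Hypothetical helper: return the n topic IDs with the highest score
top_by <- function(score, n = 1) {
  metrics$topic_id[order(score, decreasing = TRUE)][seq_len(n)]
}

most_frequent <- top_by(metrics$total)
trending_up   <- top_by(metrics$slope)
trending_down <- top_by(-metrics$slope)
trending      <- top_by(abs(metrics$slope))
most_volatile <- top_by(metrics$resid_sd)
```

Sorting by the signed slope, its negation, or its absolute value is what distinguishes the three trend options, while the residual standard deviation captures variation that the trendline does not explain.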
