The twittrcademic package provides a collection of utility functions supporting the retrieval of Tweet data via Twitter v2 API endpoints in the Twitter Academic Research product track.
This package has been set up as a personal library to collect Tweet data for academic research. The package does not provide any functions to analyze the retrieved data.
The API endpoints in the Twitter Academic Research product track offer access to the full Tweet archive. These endpoints rely on the Twitter API v2, which uses a significantly different Tweet object model than the v1.1 API. In addition to structural differences in the JSON responses, the v2 endpoints require that most objects and attributes of, for example, a Tweet object be explicitly specified in the API request in order to be included in the response. (By default, the JSON returned by the v2 search endpoint contains only the Tweet ID and text.)
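To illustrate (this is a raw request outside the package, using httr directly; the bearer token value is a placeholder), additional Tweet attributes are requested via the `tweet.fields` query parameter:

```r
library(httr)

# Illustration only, not part of twittrcademic: a direct request against
# the v2 full-archive search endpoint. Replace the placeholder token.
bearer_token <- "YOUR_BEARER_TOKEN"

resp <- GET("https://api.twitter.com/2/tweets/search/all",
            add_headers(Authorization = paste("Bearer", bearer_token)),
            query = list(query = "openscience",
                         # Without tweet.fields the response carries only
                         # the Tweet id and text.
                         "tweet.fields" = "created_at,author_id,lang",
                         max_results = 10))

str(content(resp))
```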
To use the functions in this package, API keys specifically for the Academic Research product track are required; standard API access keys will not work.
Install the development version of twittrcademic from GitHub with:
# install.packages("devtools")
devtools::install_github("sdaume/twittrcademic")
The package functions can be used to execute a single Tweet search API call against the /2/tweets/search/all endpoint or execute long-running searches for large result sets and store the results in multiple suitably sized batches.
A single call returns a JSON response of at most 500 Tweets, which can be processed directly with tools like jsonlite. The example below returns the 100 most recent Tweets containing the keyword openscience and posted on 12 June 2020 or earlier.
library(twittrcademic)
bearer <- oauth_twitter_token(consumerKey = "YOUR_ACADEMIC_PRODUCT_API_KEY",
                              consumerSecret = "YOUR_ACADEMIC_PRODUCT_API_SECRET")
json_response <- search_tweets(queryString = "openscience",
                               maxResult = 100,
                               toDate = "2020-06-12",
                               twitterBearerToken = bearer)
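As a sketch of what the follow-up processing could look like (assuming `json_response` is a JSON character string; the response shape depends on the fields requested), the result could be parsed with jsonlite:

```r
library(jsonlite)

# Parse the raw JSON response into R lists/data frames.
# Assumes json_response is a JSON character string.
parsed <- fromJSON(json_response)

# With the default v2 response, each Tweet carries only an id and text.
head(parsed$data$text)
```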
The following example collects all Tweets posted in the year 2020 that contain the term ‘planetary boundaries’. It runs until all results have been retrieved. The results are written in batches to files containing approximately 20000 Tweets each; the files are stored in the working directory and all start with ‘query_label’ (for example query_label_20200101_20201231_1_20453.json). In addition to the base label, the file name indicates the date range (implicit or explicit) of the query, a numeric index for the batch, and the number of Tweet results in the given batch.
library(twittrcademic)
bearer <- oauth_twitter_token(consumerKey = "YOUR_ACADEMIC_PRODUCT_API_KEY",
                              consumerSecret = "YOUR_ACADEMIC_PRODUCT_API_SECRET")
search_and_store_tweets(queryString = "planetary boundaries",
                        fromDate = "2020-01-01",
                        toDate = "2020-12-31",
                        maxBatchSize = 20000,
                        batchBaseLabel = "query_label",
                        twitterBearerToken = bearer)
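Once a long-running search has completed, the stored batches could be read back and combined with jsonlite. The snippet below is a sketch that assumes each batch file holds a standard v2 JSON payload with a `data` element:

```r
library(jsonlite)

# Find all batch files produced for the 'query_label' search
# in the working directory.
batch_files <- list.files(pattern = "^query_label_.*\\.json$")

# Read each batch and bind the Tweet records into one data frame;
# the exact columns depend on the fields requested from the API.
tweets <- do.call(rbind, lapply(batch_files, function(f) fromJSON(f)$data))

nrow(tweets)
```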
The package is shared under an MIT License.
This package has been developed to support research at the Stockholm Resilience Centre; this research has benefited from funding by the Swedish Research Council for Sustainable Development (Formas).