The twittrcademic package provides a collection of utility functions supporting the retrieval of Tweet data via Twitter v2 API endpoints in the Twitter Academic Research product track.
This package has been set up as a personal library to collect Tweet data for academic research. The package does not provide any functions to analyze the retrieved data.
The API endpoints in the Twitter Academic Research product track offer access to the full Tweet archive. These endpoints rely on the Twitter API v2, which uses a significantly different Tweet object model than the v1.1 API. In addition to structural differences in the JSON responses, the v2 endpoints require that most objects and attributes of, for example, a Tweet object be explicitly specified in the API request in order to be included in the response. (By default, the JSON returned by the v2 search endpoint contains only the Tweet ID and text.)
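To illustrate (this is a raw request outside the package, using httr directly; the bearer token value is a placeholder), additional Tweet attributes are requested via the `tweet.fields` query parameter:

```r
library(httr)

# Illustration only, not part of twittrcademic: a direct request against
# the v2 full-archive search endpoint. Replace the placeholder token.
bearer_token <- "YOUR_BEARER_TOKEN"

resp <- GET("https://api.twitter.com/2/tweets/search/all",
            add_headers(Authorization = paste("Bearer", bearer_token)),
            query = list(query = "openscience",
                         # Without tweet.fields the response carries only
                         # the Tweet id and text.
                         "tweet.fields" = "created_at,author_id,lang",
                         max_results = 10))

str(content(resp))
```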
To use the functions in this package, API keys specifically for the Academic Research product track are required; standard API access keys will not work.
Install the development version of twittrcademic from GitHub with:
# install.packages("devtools")
devtools::install_github("sdaume/twittrcademic")
The package functions can be used to execute a single Tweet search API call against the /2/tweets/search/all endpoint or execute long-running searches for large result sets and store the results in multiple suitably sized batches.
A single call returns a JSON response of at most 500 Tweets, which can be processed directly with tools like jsonlite. The example below returns the 100 most recent Tweets containing the keyword openscience and posted on 12 June 2020 or earlier.
library(twittrcademic)
bearer <- oauth_twitter_token(consumerKey = "YOUR_ACADEMIC_PRODUCT_API_KEY",
                              consumerSecret = "YOUR_ACADEMIC_PRODUCT_API_SECRET")
json_response <- search_tweets(queryString = "openscience",
                               maxResult = 100,
                               toDate = "2020-06-12",
                               twitterBearerToken = bearer)
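As a sketch of what the follow-up processing could look like (assuming `json_response` is a JSON character string; the response shape depends on the fields requested), the result could be parsed with jsonlite:

```r
library(jsonlite)

# Parse the raw JSON response into R lists/data frames.
# Assumes json_response is a JSON character string.
parsed <- fromJSON(json_response)

# With the default v2 response, each Tweet carries only an id and text.
head(parsed$data$text)
```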
The following example collects all Tweets posted in the year 2020 that contain the term ‘planetary boundaries’. It runs until all results have been retrieved. The results are written in batches to files containing approximately 20000 Tweets each; the files are stored in the working directory and all start with ‘query_label’ (for example query_label_20200101_20201231_1_20453.json). In addition to the base label, the file name indicates the date range (implicit or explicit) of the query, a numeric index for the batch, and the number of Tweet results in the given batch.
library(twittrcademic)
bearer <- oauth_twitter_token(consumerKey = "YOUR_ACADEMIC_PRODUCT_API_KEY",
                              consumerSecret = "YOUR_ACADEMIC_PRODUCT_API_SECRET")
search_and_store_tweets(queryString = "planetary boundaries",
                        fromDate = "2020-01-01",
                        toDate = "2020-12-31",
                        maxBatchSize = 20000,
                        batchBaseLabel = "query_label",
                        twitterBearerToken = bearer)
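Once a long-running search has completed, the stored batches could be read back and combined with jsonlite. The snippet below is a sketch that assumes each batch file holds a standard v2 JSON payload with a `data` element:

```r
library(jsonlite)

# Find all batch files produced for the 'query_label' search
# in the working directory.
batch_files <- list.files(pattern = "^query_label_.*\\.json$")

# Read each batch and bind the Tweet records into one data frame;
# the exact columns depend on the fields requested from the API.
tweets <- do.call(rbind, lapply(batch_files, function(f) fromJSON(f)$data))

nrow(tweets)
```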
The package is shared under an MIT License.
This package has been developed to support research at the Stockholm Resilience Centre; this research has benefited from funding by the Swedish Research Council for Sustainable Development (Formas).