In this tutorial, we will go through the steps to create a Twitter Developer account to access their full API of tweets since March 2006.

Step One:

Make a Twitter account * https://twitter.com/login?lang=en

Step Two:

You must apply for a Twitter dev account. Actual human beings over at Twitter HQ review access requests for developer API keys. Long story short, they need to make sure you’re not a bot.

Acceptance might take hours-days. Check email frequently.

Step Three:

Once you have accessed your development account, you must create a new App on Twitter. Open the dropdown menu next to your profile name, and click on Apps. Then click on the button that says Create an app. The app requires several entries:

It is also important to make sure you Enable Sign in with Twitter.

Once finished, click the Create button.

You have now successfully created your app

Step Four:

Open the drop down menu by your profile name again, but this time, click dev environments. Because we want to access the full archive of Tweets, we will be using the Full Archive/Sandbox environment. Click the button to set up that environment (this should be the one in the middle).

In the pop-up menu, type in a label for the environment (this should be different from the name of your app) and select your app that you created in step 3 before clicking Complete Setup.

You have now successfully set up your environment

Step Five:

Now we need to enable browser authentication. To do this, we will be using the packages rtweet and httpub.

## Install packages if not installed
include <- function(library_name){
  if( !(library_name %in% installed.packages()) )
    install.packages(library_name) 
  library(library_name, character.only=TRUE)
}
include("rtweet")
include("httpuv")

Next, we need to retreive the authentication codes from the application we created in step 3. Go to your Twitter Developer account, navigate to your Apps page, and click on the Details button next to your app. Then, click on the Keys and tokens tab. This shows you your API key and your API secret key. Copy and paste these values into your R script file.

## load rtweet
library(rtweet)

## store api keys (these are fake example values; replace with your own keys)
api_key <- "afYS4vbIlPAj096E60c4W1fiK"
api_secret_key <- "bI91kqnqFoNCrZFbsjAWHD4gJ91LQAhdCJXCj3yscfuULtNkuu"

Step Six:

Now we need to generate access tokens. To do this, we use the same page we got the api keys from. Go down to where it says Access token and access token secret and click Generate. A window will pop up with these keys. Copy these keys into the same R script file, so now we have this…

##(these are fake example values; replace with your own keys)

api_key <- "afYS4vbIlPAj096E60c4W1fiK"
api_secret_key <- "bI91kqnqFoNCrZFbsjAWHD4gJ91LQAhdCJXCj3yscfuULtNkuu"
access_token <- "YS4vbIlPAj096YS4vbIlPAj096"
access_token_secret <- "YS4YS4vbIlPYS4vbIlPAj096Aj096vbIlPAj096"

IMPORTANT: Do not store these keys anywhere in your code that you upload to Github or anywhere else online. Use them to generate your token, and then store them in a separate file that is ignored by the .gitignore file, or save them as environment variables in the ~/.Renviron folder (can be edited using file.edit(“~/.Renviron”))

Step Seven:

Now we need to create a token so that we can access the api. We do this with the create_token() function.

## authenticate via web browser
token <- create_token(
  app = "APPNAME", ## the name of your app
  consumer_key = api_key,
  consumer_secret = api_secret_key),
  access_token = access_token,
  access_secret = access_token_secret
  )

This function automatically saves the token in your environment. You can then use the function get_token() to access your token in future R sessions (on the same machine). Just make sure to view your token to make sure the api keys match.

## check to see if the token is loaded
library(rtweet)
get_token()

Once the token is in your environment, you can start searching for tweets using the rtweet package.

Search Tweets

We can use the search_tweets() to search for tweets that contain a certain keyword. For example…

rt <- search_tweets(
   "#data", n = 18000, include_rts = FALSE
)

In this block of code, “#data” is saying to collect all tweets that contain the word “data”. n represents the number of tweets to return. Twitter only allows 18,000 tweets to be pulled at one time. After that, you must wait 15 minutes before you can receive more. By setting the retryonratelimit attribute to true, this function will automatically wait for the specified time period before receiving more tweets.

rt <- search_tweets(
   "#data", n = 75000, include_rts = FALSE, retryonratelimit = TRUE
)