Twint - A Twitter Scraping Tool

Recently I’ve been working on a Python project about a Twitter sentiment trading strategy (which will be introduced in my following posts), which requires a large volume of tweets. While collecting data on Twitter, I found that the usual data crawling approach doesn’t work here because of a limitation of Twitter’s API, which only allows us to scrape at most the last 3,200 tweets. This post introduces a really interesting and useful Python package I found that can collect data without authentication, without the official API, and without that limit. This package is called twint.

Twint is short for “Twitter Intelligence Tool”. Almost everything we need from Twitter can be collected with this powerful tool. It does take time when you want to collect a large amount of data, but it still gets us what we need. I’ll briefly explain how I installed the package and used it to collect the data I needed. Detailed information and instructions about the package can be found in the reference of this post.

Installing

I used pip3 to install the package:

pip3 install twint
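
The release published on PyPI can lag behind the repository, so if the pip release gives you trouble, installing directly from the GitHub repository (standard pip syntax; the repo URL is the one in the reference below) is an alternative:

pip3 install --upgrade git+https://github.com/twintproject/twint.git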

Collecting Tweets

import twint

def get_tweets(start_date, end_date, company_name, ticker):
    c = twint.Config()
    # Match tweets containing either the company name or its ticker
    c.Search = f'{company_name} OR {ticker}'
    c.Since = start_date
    c.Until = end_date
    c.Store_csv = True
    c.Lang = 'en'
    c.Count = True
    c.Hide_output = True
    c.Format = 'Tweet id: {id} | Date: {date} | Time: {time} | Tweet: {tweet}'
    # Columns to keep in the csv output
    c.Custom['tweet'] = ['id', 'date', 'time', 'tweet']
    # Write to the same path the download loop below checks for finished files
    c.Output = f'data/tweets_{company_name}_202001.csv'
    twint.run.Search(c)

In the package, the author has already written functions for us to search by keyword, date range, username, etc. It also allows us to output the data in different formats such as csv and txt. Here, I wrote a function that uses twint to collect the tweets I need and store them in a csv file for later computation. The arguments of this function are the time range of the tweets I want to collect and the keywords I want to search for.
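
For instance, searching by username instead of keywords only takes a couple of different Config options; here is a minimal sketch based on the options listed in twint’s documentation (the account name and output file are just placeholders):

import twint

c = twint.Config()
c.Username = 'some_user'            # placeholder: account whose tweets to collect
c.Limit = 100                       # stop after roughly 100 tweets
c.Store_json = True                 # write results as JSON instead of csv
c.Output = 'some_user_tweets.json'
twint.run.Search(c)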

Since my strategy is about Twitter sentiment in the technology sector, I chose ten stocks and used twint to collect tweets about them from January 2020 to May 2020.

import os
from tqdm import tqdm

start_date = '2020-01-01'
end_date = '2020-05-31'
stock_pool = {'tesla': 'tsla', 'netflix': 'nflx', 'microsoft': 'msft', 'zoom': 'zm', 'apple': 'aapl',
              'amazon': 'amzn', 'twitter': 'twtr', 'google': 'googl', 'sony': 'sne', 'nvidia': 'nvda'}

for k, v in tqdm(stock_pool.items()):
    # Skip stocks whose tweets have already been downloaded
    if os.path.isfile(f'data/tweets_{k}_202001.csv'):
        print(f'{k} finished')
    else:
        print(f'downloading {k}')
        get_tweets(start_date, end_date, k, v)
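
Once the loop finishes, each stock’s tweets sit in their own csv file. As a quick sketch of how they could be read back for the sentiment computation (using pandas is my own assumption; the columns are the ones selected via c.Custom['tweet'] above):

import pandas as pd

# Load one stock's tweets; columns are id, date, time, tweet
tesla_tweets = pd.read_csv('data/tweets_tesla_202001.csv')
print(tesla_tweets[['date', 'time', 'tweet']].head())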

Although I didn’t use them, this package has many other useful functions, including getting a given user’s followers, following, and favorites. From the package’s documentation, I also learned that the author wrote a graph visualizer to help users graph a user’s social network. These are all interesting materials to learn from and practice with.
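
As a rough sketch of what those look like, the project’s documentation shows runners such as twint.run.Followers that take the same Config object (the account name here is again a placeholder):

import twint

c = twint.Config()
c.Username = 'some_user'   # placeholder: account whose followers to collect
c.Store_csv = True
c.Output = 'some_user_followers.csv'
twint.run.Followers(c)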

Reference

https://github.com/twintproject/twint