This is a dataset of tweets from various active scientists and public personalities, ranging from Donald Trump and Hillary Clinton to Neil deGrasse Tyson. More accounts are forthcoming.
The tweets were obtained by scraping the browser Twitter timeline with JavaScript rather than through Tweepy (a Python client for the Twitter API) or the Twitter timeline API directly.
Thus, the dataset goes further back in time than anything directly available from Twitter.
The inspiration for this dataset is comparing tweets in my own Twitter analysis to find who tweets like whom, e.g. do Trump and Hillary each tweet more like Kim Kardashian than like one another?
The data is in JSON format; a CSV version will be forthcoming as well.
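Until a CSV version exists, the JSON can be converted locally. The following is a minimal sketch, assuming each record carries the fields produced by the scraping snippet further down (id, date, text, link, retweet). The helper names csvField and tweetsToCsv are made up for illustration, and the sample record is invented rather than taken from the dataset.

```javascript
// Hypothetical helper: quote a single CSV field (RFC 4180 style).
function csvField(value) {
  var s = value === null || value === undefined ? '' : String(value);
  // Quote the field when it contains a comma, a double quote, or a line break.
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: convert an array of tweet records to CSV text
// with a header row. The column list is an assumption based on the scraper.
function tweetsToCsv(tweets) {
  var columns = ['id', 'date', 'text', 'link', 'retweet'];
  var rows = tweets.map(function (t) {
    return columns.map(function (c) { return csvField(t[c]); }).join(',');
  });
  return [columns.join(',')].concat(rows).join('\n');
}

// Invented sample record, for illustration only.
var sampleTweets = [
  { id: '1', date: 'Oct 14', text: 'Hello, "world"', link: '/user/status/1', retweet: false }
];
console.log(tweetsToCsv(sampleTweets));
```

Tweet text often contains commas, quotes, and line breaks, which is why every field goes through the quoting helper instead of a plain join.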
Tweets from Kim Kardashian, Adam Savage, Bill Nye, Neil deGrasse Tyson, Donald Trump, and Hillary Clinton have been collected up to 2016-10-14.
Tweets from Richard Dawkins, Commander Scott Kelly, Barack Obama, NASA, and The Onion have been collected up to 2016-10-15.
For your own pleasure, with special thanks to the Trump Twitter Archive for providing some of the code, here is the JavaScript used to scrape tweets off of a timeline and output the results to the clipboard in JSON format:
-
Construct the Twitter search query with from:TWITTERHANDLE since:DATE until:DATE, where each DATE is written as YYYY-MM-DD.
-
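In the browser console, set up automatic scrolling so the whole timeline loads (stop it with clearInterval(scroller) once no new tweets appear):
var scroller = setInterval(function(){ scrollTo(0, document.body.scrollHeight) }, 2500)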
-
Scrape the resulting timeline with:
var allTweets = [];
var tweetElements = document.querySelectorAll('li.stream-item');
for (var i = 0; i < tweetElements.length; i++) {
  try {
    var el = tweetElements[i];
    var text = el.querySelector('.tweet-text').textContent;
    allTweets.push({
      id: el.getAttribute('data-item-id'),
      date: el.querySelector('.time a').textContent,
      text: text,
      link: el.querySelector('div.tweet').getAttribute('data-permalink-path'),
      // Flag text that starts with "@ and contains a colon as a quoted-style retweet
      retweet: text.indexOf('"@') === 0 && text.includes(':')
    });
  } catch (err) {
    // Skip stream items that lack the expected tweet markup
  }
}
// copy() is a browser console utility that puts the array on the clipboard as JSON
copy(allTweets);
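The query from the first step can also be assembled in code, which helps when scraping several accounts or date ranges. This is only a sketch: buildSearchQuery and buildSearchUrl are hypothetical names, the dates are assumed to be YYYY-MM-DD strings, and the search URL pattern is an assumption that should be checked against the live site.

```javascript
// Hypothetical helper: build the search query for one account and date range.
// Dates are assumed to be YYYY-MM-DD strings.
function buildSearchQuery(handle, since, until) {
  var name = handle.replace(/^@/, ''); // accept "@NASA" or "NASA"
  return 'from:' + name + ' since:' + since + ' until:' + until;
}

// Hypothetical helper: wrap the query in a search URL.
// The URL pattern is an assumption, not something the original text specifies.
function buildSearchUrl(handle, since, until) {
  return 'https://twitter.com/search?q=' +
    encodeURIComponent(buildSearchQuery(handle, since, until));
}

console.log(buildSearchQuery('@NASA', '2016-01-01', '2016-10-15'));
console.log(buildSearchUrl('NASA', '2016-01-01', '2016-10-15'));
```

Splitting a long history into several shorter date ranges keeps each timeline small enough to scroll through in a reasonable time.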
Have fun!