Twitter Scraping and API

There are a few of us who would like to collect a list of retweeters and the URLs to their pages, for the purpose of sampling the population for shillbots.

I've had limited success working through a browser and just scrolling down, but that's replies, not retweets, and it caps out at a very limited number, just over 200 results.

Is there anyone here who can help get what we need? I don't know what an API is, never mind how to use it.

stackoverflow.com/questions/6316899/how-to-get-a-list-of-all-retweeters-in-twitter

Other urls found in this thread:

dev.twitter.com/rest/reference/get/statuses/retweets/:id
inventwithpython.com/hacking/
truthy.indiana.edu/botornot/
github.com/truthy/botornot-python
truthy.indiana.edu/botornot/http-api.html
twitter.com/NSFWRedditVideo

Install Gentoo

That's not funny, that's how my wife's son died.

go back to /g/

INSTALL GENTOO MOTHERFUCKER

It's looking likely that there's no conveniently accessible list of all retweets. If you want the data over time, you have to collect it yourself, apparently with a "streaming API"?


I think the best I can do is use search with the phrase, then scroll the "Live" tab as far as it will go.

~650 results doing it like this

I've used Twitter's API with tweepy, and it looks like you could probably do it with this:

dev.twitter.com/rest/reference/get/statuses/retweets/:id

And I know that says it's limited to 100, but I've been able to get thousands of results from other API calls that claim to be limited to n results. I think there's a way around it by sending some parameter.
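To make that concrete, here's a minimal sketch of pulling retweeters for one tweet with tweepy, assuming tweepy 3.x and filled-in credentials; `statuses/retweets/:id` is exposed as `api.retweets()` there, and a single call tops out at 100 retweet objects. The tweet ID and credential values are placeholders.

```python
# Sketch: list retweeters of one tweet plus the URLs to their pages.
# Assumes tweepy 3.x ('pip install tweepy') and real credentials.

def profile_url(screen_name):
    """Build the URL to a user's Twitter page from their screen name."""
    return 'https://twitter.com/' + screen_name

def fetch_retweeters(tweet_id, consumer_key, consumer_secret,
                     access_token, access_token_secret):
    import tweepy  # third-party; imported here so the helper above works without it
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    # statuses/retweets/:id returns at most 100 retweet objects per call
    for status in api.retweets(tweet_id, count=100):
        print(status.user.screen_name, profile_url(status.user.screen_name))
```

Each returned retweet object carries the retweeting user, which is where the screen name and profile URL come from.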

Thanks, I know nothing about APIs though. How do you even run that code?

I'm surprised there's an open API. Twitter is known for having one at the start, then killing it and restricting access to the most successful apps.


Define precisely how you would determine someone to be a shillbot. Percentage of retweets? Text analysis? Some crazy AI shit?

import os
import tweepy
import json
import time
import requests
import re
import sys
import urllib

g_count = 0

consumer_key = 'yourconsumerkey'
consumer_secret = 'yourconsumersecret'
access_token = 'youraccesstoken'
access_token_secret = 'youraccesstokensecret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)


def get_users_friends():
    user = api.get_user('twitter')
    print(user.screen_name)
    print(user.followers_count)
    for friend in user.friends():
        print(friend.screen_name)


def print_public_tweets():
    public_tweets = api.home_timeline()
    for tweet in public_tweets:
        print(tweet)


def search():
    result = api.search(input('Query: '), count=2500)
    # 'out' was undefined in the original paste; write results to a file
    with open('results.txt', 'w') as out:
        for status in result:
            out.write(str(status) + '\n\n')
    for status in result:
        print('Text: ' + str(status.text) + '\n')


def get_rate_limit_status():
    print(api.rate_limit_status())


# Streaming API listener
class StdOutListener(tweepy.streaming.StreamListener):
    """A listener handles tweets that are received from the stream.

    This is a basic listener that parses inbound tweets (JSON strings).
    """

    def on_data(self, data):
        global g_count
        if g_count < 1000:  # tweet batch size, adjust as necessary
            parse(data)
        else:
            self.clear_tweet_buffer()
        return True

    def on_error(self, status):
        print(status)

    def clear_tweet_buffer(self):
        global g_count
        g_count = 0


def parse(data):
    global g_count
    g_count += 1
    parsed_json = json.loads(data)
    if parsed_json.get('text'):
        # do parsing work
        pass


def streamloop():
    # Streaming API example
    global listen, stream
    listen = StdOutListener()
    stream = tweepy.Stream(auth, listen, gzip=True)
    # the keyword was `async` in older tweepy; newer releases renamed it
    # to is_async because async became a reserved word in Python 3.7
    stream.filter(track=['hillary'], is_async=True)


def main():
    # Get a user's friends
    get_users_friends()
    # Search
    search()
    # Get rate limit information
    get_rate_limit_status()
    # Streaming API example
    streamloop()


if __name__ == '__main__':
    main()

Right now it's purely subjective and reliant on humans to place votes. The goal is to compare results against TwitterAudit, who place Hillary's followers at 90% human. They're automated over a much larger sample, but I'm not sure their algorithms will be as good as human judgement.

A smaller sample with more accurate judgement should easily be as good as something like TwitterAudit.

But how do you do anything with that?

Print it out and post it to Twitter HQ?

Do whatever you want with it. Filter the stream for keywords then collect tweet data. Or implement other API functions. Literally anything. This is just an example of how to get started with the API.

I briefly tried installing Python 2.7.11 yesterday for an unrelated project, but I wasn't able to do anything useful, so I moved on.

I can copy that into Notepad and change the variables around, but I don't know what to save it as or how to run it.
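For the "what do I save it as" question: save the code in a plain-text file ending in .py, then run it with the Python interpreter from a command prompt. A minimal sketch (the filename scraper.py and the one-line program are just placeholders):

```shell
# Write a one-line placeholder program to a .py file...
cat > scraper.py <<'EOF'
print("it runs")
EOF
# ...then run it with the interpreter
python3 scraper.py
```

On Windows, make sure Notepad doesn't tack .txt onto the name (use "All files" in the save dialog), and the command is typically `python scraper.py`.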

Well then, start with some Python books. Don't give up so easily; there are tons of resources out there.

This isn't related to Twitter at all, but it's one of my favorites: inventwithpython.com/hacking/

Thanks. Looks good.

Twitter wants an account with a mobile number. This could take days before I get a code.

In the meantime I can use Python to chop up a URL.
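Chopping up a URL is a good first exercise because the standard library does it for you. A small sketch using Python 3's `urllib.parse` (in Python 2 the same functions live in a module called `urlparse`); the search URL is just an example:

```python
# Split a URL into its parts and decode the query string (stdlib only).
from urllib.parse import urlparse, parse_qs

url = 'https://twitter.com/search?q=hillary&f=tweets'
parts = urlparse(url)
print(parts.netloc)           # host: twitter.com
print(parts.path)             # path: /search
print(parse_qs(parts.query))  # query params: {'q': ['hillary'], 'f': ['tweets']}
```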

Can Python be used for anything in Excel alongside some VBA?

Yeah, forgot to mention that. I got mine a few years ago and just use it to mess around with their API.

Nice start. Also look into regular expressions (don't get too caught up in them, but learn what they are and how they're used to search text).
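As a taste of what regexes are for here: pulling @mentions out of tweet text with the standard-library `re` module. The sample text is made up:

```python
# Find every @mention in a piece of tweet text with one regex.
import re

text = 'RT @alice: @bob check this out'
mentions = re.findall(r'@(\w+)', text)  # \w+ = run of word characters after '@'
print(mentions)  # ['alice', 'bob']
```

The parentheses capture just the name, so `findall` returns the screen names without the `@`.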

Probably, but other databases are better: SQLite, Postgres, etc.
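SQLite in particular ships with Python, so there's nothing extra to install. A sketch of stashing collected tweets in it instead of Excel (table layout and sample row are made up for illustration):

```python
# Store tweets in SQLite using the standard-library sqlite3 module.
import sqlite3

conn = sqlite3.connect(':memory:')  # use a filename like 'tweets.db' to persist
conn.execute('CREATE TABLE tweets (id INTEGER PRIMARY KEY, user TEXT, text TEXT)')
conn.execute('INSERT INTO tweets (user, text) VALUES (?, ?)',
             ('example_user', 'example tweet text'))
rows = conn.execute('SELECT user, text FROM tweets').fetchall()
print(rows)  # [('example_user', 'example tweet text')]
conn.close()
```

A .db file can still be exported to CSV for Excel later if that's the end goal.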

>>>Holla Forums

Why waste time on a controlled botnet for advertising and brainwashing?

There is no hope in this toy

Finally got a code to access the api so I've started looking at it again.

Progress so far: "Fuck you Python, you frustrating turd."

I've been looking at this "BotOrNot" thing first; I'll try it soon.

I've finally got Python and pip installed. I think I've got BotOrNot installed; in doing so it installed tweepy and something else. So far everything seems unintuitive, and readmes/instructions tend to expect you to know a lot of random things.

Wew. I think being a software developer would make me want to jump in front of a train.

truthy.indiana.edu/botornot/
github.com/truthy/botornot-python
truthy.indiana.edu/botornot/http-api.html

FINALLY

Now I just have to figure out how to run it as a standalone script and output results to a nice Excel file, looping through a list of names.
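The loop-and-export part can be sketched with the standard-library `csv` module, which writes a file Excel opens directly. Note that `bot_score()` below is a hypothetical placeholder, not the real botornot-python API; swap in an actual call to the client or the HTTP API:

```python
# Sketch: run a list of screen names through a checker and write scores to CSV.
import csv

def bot_score(screen_name):
    # HYPOTHETICAL stand-in for a real BotOrNot lookup
    return 0.5

names = ['user_one', 'user_two']  # placeholder screen names
with open('scores.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['screen_name', 'score'])  # header row
    for name in names:
        writer.writerow([name, bot_score(name)])
```

Opening scores.csv in Excel gives one row per checked account.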

The problem/solution was

Tried running your script but got a bunch of errors.

Is Python 3 correct?

I doubt it; Python 2 is still pervasive. Also, you'll need to install the libraries that are being used:
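If you're unsure which version your script is actually running under, the interpreter will tell you:

```python
# Print the major version of the running Python interpreter (2 or 3).
import sys

print(sys.version_info.major)
```

Of the imports below, only tweepy and requests are third-party (installed with pip); the rest ship with Python.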

import os
import tweepy
import json
import time
import requests
import re
import sys
import urllib