Twitter Scraping and API

There are a few of us who would like to collect a list of retweeters and the URLs to their pages, for the purpose of sampling the population for shillbots.

I've had limited success working through a browser and just scrolling down, but that's replies, not retweets, and it caps out at a very limited number, just over 200 results.

Is there anyone here who can help get what we need? I don't know what an API is, never mind how to use it.

stackoverflow.com/questions/6316899/how-to-get-a-list-of-all-retweeters-in-twitter

Other urls found in this thread:

dev.twitter.com/rest/reference/get/statuses/retweets/:id
inventwithpython.com/hacking/
truthy.indiana.edu/botornot/
github.com/truthy/botornot-python
truthy.indiana.edu/botornot/http-api.html
twitter.com/NSFWRedditVideo

Install Gentoo

That's not funny, that's how my wife's son died.

go back to /g/

INSTALL GENTOO MOTHERFUCKER

It's looking likely that there's no conveniently accessible list of all retweets. If you want the data over time, you have to collect it yourself, apparently with a "streaming API"?


I think the best I can do is use search with the phrase, then scroll the "Live" tab as far as it will go.

~650 results doing it like this

I've used Twitter's API with tweepy, and it looks like you could probably do it with this:

dev.twitter.com/rest/reference/get/statuses/retweets/:id

And I know that says it's limited to 100, but I've been able to get thousands of results from other API calls that claim to be limited to n results. I think there's a way around it by sending some parameter.
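To make that concrete, here's a minimal sketch of pulling retweeters for one tweet with tweepy, assuming tweepy 3.x and filled-in credentials; `statuses/retweets/:id` is exposed as `api.retweets()` there, and a single call tops out at 100 retweet objects. The tweet ID and credential values are placeholders.

```python
# Sketch: list retweeters of one tweet plus the URLs to their pages.
# Assumes tweepy 3.x ('pip install tweepy') and real credentials.

def profile_url(screen_name):
    """Build the URL to a user's Twitter page from their screen name."""
    return 'https://twitter.com/' + screen_name

def fetch_retweeters(tweet_id, consumer_key, consumer_secret,
                     access_token, access_token_secret):
    import tweepy  # third-party; imported here so the helper above works without it
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    # statuses/retweets/:id returns at most 100 retweet objects per call
    for status in api.retweets(tweet_id, count=100):
        print(status.user.screen_name, profile_url(status.user.screen_name))
```

Each returned retweet object carries the retweeting user, which is where the screen name and profile URL come from.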

Thanks, I know nothing about APIs though. How do you even run that code?

I'm surprised there's an open API. Twitter is known for having one at the start, then killing it and restricting access to the most successful apps.


Define precisely how you would determine someone to be a shillbot. Percentage of retweets? Text analysis? Some crazy AI shit?

import os
import tweepy
import json
import time
import requests
import re
import sys
import urllib

g_count = 0

consumer_key = 'yourconsumerkey'
consumer_secret = 'yourconsumersecret'
access_token = 'youraccesstoken'
access_token_secret = 'youraccesstokensecret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)


def get_users_friends():
    user = api.get_user('twitter')
    print(user.screen_name)
    print(user.followers_count)
    for friend in user.friends():
        print(friend.screen_name)


def print_public_tweets():
    public_tweets = api.home_timeline()
    for tweet in public_tweets:
        print(tweet)


def search():
    result = api.search(input('Query: '), count=2500)
    # 'out' was undefined in the original paste; write results to a file
    with open('results.txt', 'w') as out:
        for status in result:
            out.write(str(status) + '\n\n')
    for status in result:
        print('Text: ' + str(status.text) + '\n')


def get_rate_limit_status():
    print(api.rate_limit_status())


# Streaming API listener
class StdOutListener(tweepy.streaming.StreamListener):
    """A listener handles tweets that are received from the stream.

    This is a basic listener that parses inbound tweets (JSON strings).
    """

    def on_data(self, data):
        global g_count
        if g_count < 1000:  # tweet batch size, adjust as necessary
            parse(data)
        else:
            self.clear_tweet_buffer()
        return True

    def on_error(self, status):
        print(status)

    def clear_tweet_buffer(self):
        global g_count
        g_count = 0


def parse(data):
    global g_count
    g_count += 1
    parsed_json = json.loads(data)
    if parsed_json.get('text'):
        # do parsing work
        pass


def streamloop():
    # Streaming API example
    global listen, stream
    listen = StdOutListener()
    stream = tweepy.Stream(auth, listen, gzip=True)
    # the keyword was `async` in older tweepy; newer releases renamed it
    # to is_async because async became a reserved word in Python 3.7
    stream.filter(track=['hillary'], is_async=True)


def main():
    # Get a user's friends
    get_users_friends()
    # Search
    search()
    # Get rate limit information
    get_rate_limit_status()
    # Streaming API example
    streamloop()


if __name__ == '__main__':
    main()

Right now it's purely subjective and reliant on humans to place votes. The goal is to compare results against TwitterAudit, who place Hillary's followers at 90% human. They're automated over a much larger sample, but I'm not sure their algorithms will be as good as human judgement.

A smaller sample with more accurate judgement should easily be as good as something like TwitterAudit.

But how do you do anything with that?

Print it out and post it to Twitter HQ?

Do whatever you want with it. Filter the stream for keywords then collect tweet data. Or implement other API functions. Literally anything. This is just an example of how to get started with the API.

I briefly tried installing Python 2.7.11 yesterday for an unrelated project, but I wasn't able to do anything useful, so I moved on.

I can copy that into Notepad and change the variables around, but I don't know what to save it as or how to run it.
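For the "what do I save it as" question: save the code in a plain-text file ending in .py, then run it with the Python interpreter from a command prompt. A minimal sketch (the filename scraper.py and the one-line program are just placeholders):

```shell
# Write a one-line placeholder program to a .py file...
cat > scraper.py <<'EOF'
print("it runs")
EOF
# ...then run it with the interpreter
python3 scraper.py
```

On Windows, make sure Notepad doesn't tack .txt onto the name (use "All files" in the save dialog), and the command is typically `python scraper.py`.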

Well then, start with some Python books. Don't give up so easily; there are tons of resources out there.

This isn't related to Twitter at all, but it's one of my favorites: inventwithpython.com/hacking/

Thanks. Looks good.

Twitter wants an account with a mobile number. This could take days before I get a code.

In the meantime I can use Python to chop up a URL.
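Chopping up a URL is a good first exercise because the standard library does it for you. A small sketch using Python 3's `urllib.parse` (in Python 2 the same functions live in a module called `urlparse`); the search URL is just an example:

```python
# Split a URL into its parts and decode the query string (stdlib only).
from urllib.parse import urlparse, parse_qs

url = 'https://twitter.com/search?q=hillary&f=tweets'
parts = urlparse(url)
print(parts.netloc)           # host: twitter.com
print(parts.path)             # path: /search
print(parse_qs(parts.query))  # query params: {'q': ['hillary'], 'f': ['tweets']}
```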

Can Python be used for anything in Excel alongside some VBA?

Yeah, forgot to mention that. I got mine a few years ago and just use it to mess around with their API.

Nice start. Also look into regular expressions (don't get too caught up in them, but learn what they are and how they're used to search text).
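As a taste of what regexes are for here: pulling @mentions out of tweet text with the standard-library `re` module. The sample text is made up:

```python
# Find every @mention in a piece of tweet text with one regex.
import re

text = 'RT @alice: @bob check this out'
mentions = re.findall(r'@(\w+)', text)  # \w+ = run of word characters after '@'
print(mentions)  # ['alice', 'bob']
```

The parentheses capture just the name, so `findall` returns the screen names without the `@`.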

Probably, but other databases are better: SQLite, Postgres, etc.
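SQLite in particular ships with Python, so there's nothing extra to install. A sketch of stashing collected tweets in it instead of Excel (table layout and sample row are made up for illustration):

```python
# Store tweets in SQLite using the standard-library sqlite3 module.
import sqlite3

conn = sqlite3.connect(':memory:')  # use a filename like 'tweets.db' to persist
conn.execute('CREATE TABLE tweets (id INTEGER PRIMARY KEY, user TEXT, text TEXT)')
conn.execute('INSERT INTO tweets (user, text) VALUES (?, ?)',
             ('example_user', 'example tweet text'))
rows = conn.execute('SELECT user, text FROM tweets').fetchall()
print(rows)  # [('example_user', 'example tweet text')]
conn.close()
```

A .db file can still be exported to CSV for Excel later if that's the end goal.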

>>>Holla Forums

Why waste time on a controlled botnet for advertising and brainwashing?

There is no hope in this toy

Finally got a code to access the api so I've started looking at it again.

Progress so far: "Fuck you Python, you frustrating turd."

I've been looking at this "BotOrNot" thing first; I'll try it soon.

I've finally got Python and pip installed. I think I've got BotOrNot installed; in doing so it installed tweepy and something else. So far everything seems unintuitive, and readmes/instructions tend to expect you to know a lot of random things.

Wew. I think being a software developer would make me want to jump in front of a train.

truthy.indiana.edu/botornot/
github.com/truthy/botornot-python
truthy.indiana.edu/botornot/http-api.html

FINALLY

Now I just have to figure out how to run it as a standalone script and output results to a nice Excel file, looping through a list of names.
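The loop-and-export part can be sketched with the standard-library `csv` module, which writes a file Excel opens directly. Note that `bot_score()` below is a hypothetical placeholder, not the real botornot-python API; swap in an actual call to the client or the HTTP API:

```python
# Sketch: run a list of screen names through a checker and write scores to CSV.
import csv

def bot_score(screen_name):
    # HYPOTHETICAL stand-in for a real BotOrNot lookup
    return 0.5

names = ['user_one', 'user_two']  # placeholder screen names
with open('scores.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['screen_name', 'score'])  # header row
    for name in names:
        writer.writerow([name, bot_score(name)])
```

Opening scores.csv in Excel gives one row per checked account.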

The problem/solution was

Tried running your script but got a bunch of errors.

Is Python 3 correct?

I doubt it; Python 2 is still pervasive. Also, you'll need to install the libraries that are being used:
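If you're unsure which version your script is actually running under, the interpreter will tell you:

```python
# Print the major version of the running Python interpreter (2 or 3).
import sys

print(sys.version_info.major)
```

Of the imports below, only tweepy and requests are third-party (installed with pip); the rest ship with Python.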

import os
import tweepy
import json
import time
import requests
import re
import sys
import urllib