Potential antifa detector

Oliver Walker

Well this is just GREAT.
There is Twitter "Russia"/"Alt-right" detector using Neural Networks on the loose…
But we can take the source code, dump our own "Good/Bad" dataset,
host the API in a server for people to use (add a password for extra security),
and we can completely turn things around.

(((tester))) site: makeadverbsgreatagain.us/mlp/
Article reference: medium.com/@conspirator0/identifying-political-bot-troll-social-media-activity-using-machine-learning-20dcd56e961a
Source code: makeadverbsgreatagain.us/twitter-ml-2017-04-29.zip
The (((dev))): twitter.com/conspirator0

January 4, 2018 - 06:54

Josiah Richardson

is that the fucking AI that got me banned in early '17?

January 4, 2018 - 07:07

Dylan Jones

Always a good idea

January 4, 2018 - 07:21

Thomas Powell

bump. would be really useful if /baph/ ran with this idea.

January 4, 2018 - 07:26

Christopher Wright

Antifa are easier to identify than "russia"/"altright" because they're very vocal about their membership. That being said you could probably use well known antifa to create the database. Could be an interesting idea tbh

January 4, 2018 - 07:26

Lucas Gonzalez

Look for anyways to alter the AI. Trick it
anyways, how are we going to reverse the censorship or stop the excess censorship?

January 4, 2018 - 07:44

Ryder Sullivan

Use AI to subvert. You can train these same networks to shitpost.

January 4, 2018 - 07:48

Jackson Evans

Extend that to finding liberals in 15 seconds, and you will be amazed how much more you can do with this

This categorizes things; it's not a chatbot.

January 4, 2018 - 08:11

Jackson Campbell

some stuff I found a while back

January 4, 2018 - 08:21

Xavier Martinez

So you found they guy? Thats good

Anyway, we need to make this NN by gathering:
1. List of 3K an-ti-fa/Democrat/liberal/progressive Twitter Usernames
2. List of 7~10K Twitter Username that doesn't fit 1's criteria
3. A large computer to run the source code

January 4, 2018 - 09:48

Eli Sanders

>>>/out/

January 4, 2018 - 10:02

Aaron Edwards

Maybe we can get Holla Forums to make the software faster and more efficient, since it is originally written in Java

January 4, 2018 - 10:18

Adam Bell

For making a simple jewdar
github.com/HectorAnadon/Face-expression-and-ethnic-recognition
For making a tranny spotter (or ultra jewdar)
github.com/NadineAB/GenderClassifcationOfHumanFaces

January 4, 2018 - 10:34

Grayson Flores

Forgot about github.com/jaronson/py-racedetect

January 4, 2018 - 10:41

John Williams

If there's a logo that needs to be made Holla Forums will surely but helpful.

January 4, 2018 - 11:14

Andrew Wilson

be*

January 4, 2018 - 11:15

Oliver Morris

This is what I don't understand about these people. Why tell everyone about this? Why give out your code? Why not just keep this secret?
Do they need their ego stroked this badly? I guess this is all centered around Twitter.

January 4, 2018 - 11:24

Hunter Sullivan

Holla Forums tradition of open source GPL software.
Read the license. "copyright" trashed.

January 4, 2018 - 11:35

Eli Brown

It's our culture, read The Cathedral and the Bazaar by Eric S Raymond
'our' meaning Holla Forums and beyond

January 4, 2018 - 11:49

Andrew Morris

Somehow funny, today i thought about some nice phrases that match your book without having ever heard of it.

Do you also like poems about mountains?

January 4, 2018 - 12:21

Luke Cooper

Please explain to

TLDR, even if it is ESR.

Essentially, unless you are (((Intel, Nvidia, Google, Facebook, Microsoft, apple))), you are expected to show proof of work i.e. source code.
Without source code, it is harder to find malware, and easier to check if they are (((messing))) with you, see: NSA.
FSF, EFF and other liberal infestation are by policy required to show their source code, so we can fuck them over that way.
If they break their own license we can by law sue the shit out of them, and no (((Code of Conduct))) can stop that. It never works.

January 4, 2018 - 12:41

Isaiah Lopez

my.mixtape.moe/dfbper.zip

January 4, 2018 - 12:43

Lincoln Hughes

I dont think fef, eff and all the greasy fat nerds are really a problem, i think their very own problem is that their psychological makeup is easy to exploit for everyone that has an "do gooder" agenda outside of their fat nerd thing which means antifa, feminists and diehard ideological clowns etc. i guess some agencies could easily exploit that.
Something tells me that the former you mentioned designe those that to go to the latter in some weird, mystical way.

January 4, 2018 - 12:55

Jose Green

(((NSA))) vs (((FSF/EFF)))
When nothing happens, the former surveils and build backdoors while the latter "promote net neutrality" and "protect internet privacy"
When they have a common enemy they fight together to crush the right in every single way
But if we exploit the latter group we can definitely get an advantage over the former.

January 4, 2018 - 13:07

Jonathan Lewis

I dont think you really know what the NSA does or you are baiting me.

If the NSA is smart, which i assume then they keep things in a sort of equilibrium, therefore the somehow similar fat greesy nerds (and overbreed dogs) push whatever side they want with the right information (cosupplied by a different breed of naive nerds called private sector omegas) to control the goyim. But do you know what is also a logical consequence of the open source goyim, they come from the center, and once the middle class gets reduced, your "grass routes" shit is fucked thanks to feminism and other crap, i guess the do gooders are zombiefyed in a way, weil ihr einfach schwuchteln seid und schon immer wart.

January 4, 2018 - 13:22

Joseph Foster

archive.fo/kHsCK
I also find it very interesting that some presumably fbi or lawyer cunt advised the frogs not to be in image board threads when betting on trump etc. when article related happened.
I wonder how much of a disgustingly brittle dick you have to have to be such a piece of shit.

January 4, 2018 - 13:44

Luis Morgan

That is exact thing that I worry about, of how NSA pulls the strings of the SJWs.
But as long as we make SJWs eat their words regarding who can use the software,
And as long as we create a "third position" in the tech world, we will not be destroyed.
Eric S Raymond himself is anti-SJW, hates commies, fags and Muslims… Just looks at this
rationalwiki.org/wiki/Eric_S._Raymond
We need more of THIS

January 4, 2018 - 18:18

Wyatt Hernandez

Well you want this?
github.com/armbues/deep_cyber

January 5, 2018 - 00:01

Luke Murphy

...

January 5, 2018 - 00:09

Eli Bell

For those who want to check sentiment (positivey/negative reaction):
github.com/danielegrattarola/twitter-sentiment-cnn
github.com/xiaohan2012/twitter-sent-dnn

How we can upgrade the Neural Networks:
1. Find keywords in the training data (see OP)
2. From the training data, distinguish positive keywords ("I love being a cuck") vs negative keywords ("Bash the fash") (see links)
3. Reverse everything, and find out potential supporters with anticom/right
4. Create a keyword cluster from #2, and apply it to #3 and find out who the cuckservatives and TRSodimites are
5. Continue searching for false positives, adjust the AI, and then feed it back to #1

Shia, it never gets old.

January 5, 2018 - 00:23

Carson Bennett

Well while we are on the subject, here is a slightly more detailed plan of pulling this off
>>>Holla Forums848319
This is Holla Forums Op after all, might as well

January 5, 2018 - 01:19

Juan Gray

Consider >>>Holla Forums848397 >>>Holla Forums848513

January 5, 2018 - 11:06

Jack Myers

Ops >>>Holla Forums848513

January 5, 2018 - 11:06

Isaiah Williams

Do you have an archive?

January 6, 2018 - 00:13

Gabriel Scott

Goal: To create a Neural Network that identifies friends and foes
Stage 1: Initial data collection
- Collect list of ~3K antifa Twitter accounts
- collect list of ~10K non-antifa Twitter accounts
Stage 2: Dump that into a antifa classification network
- Possibly use makeadverbsgreatagain.us/twitter-ml-2017-04-29.zip
- Feed in the data to a very powerful computer for training
- The code will scrape twitter accounts to see what they say
Stage 3: Finding of false positives and false negatives
- Gather a new set of antifa vs non-antifa accounts
- Input it into the machine to see what will happen
- Get the false positives/negatives to retrain the network
Stage 4: Antifa keyword sentiment analysis
- Using a sentiment classification network like the ones below
- github.com/danielegrattarola/twitter-sentiment-cnn
- github.com/xiaohan2012/twitter-sent-dnn
- See what keywords are positive and negative to them
Stage 5: Non-antifa group sentiment clustering
- Check the non-antifa list to see their sentiment to the keywords
- Use clustering algorithms to see which people have similar sentiments
- Interpret the clusters and add applicable labels to it
Stage 6: New group data classification and correction
- Manually check the result cluster for false positives/negatives
- Retrain the network for the clustering false positives/negatives
- Simply apply the result of 4~6, and apply the techniques in 1~3
- More indexes for discovering friend and foe

January 6, 2018 - 00:23

Mason Morgan

>>>Holla Forums848928
Might as well start another one.

Also
See

January 6, 2018 - 06:27

Camden Hill

Would it be possible to teach it to identify antifa based only on what memes they post and other complex inputs?

January 6, 2018 - 07:50

Christian Williams

Text memes can be easily parsed (see all the githubs)
Images will need complex Neural Networks to do it.
Here is what a simple Neural Network can do:
1. Predict whether a person is in a group or not (text classification)
2. Find keywords that are associated with a group (keyword analysis)
3. See how a person reacts to certain keywords (sentiment prediction)
Here is what a simple Neural Network CAN'T do:
Guess what an image is supposed to be (image classification)

January 6, 2018 - 08:21

Landon Rivera

You can probably get a pretty decent classifier just using Naive Bayes classification and a large word dictionary. There are some very uncommon words ("shibboleth" comes to mind) that pretty much only lefties use.

Docs: scikit-learn.org/stable/modules/naive_bayes.html
Tutorial: scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

I don't think you really need that much processing power for this kind of problem, since its text that's being processed. I trained a pretty decent race/gender classifier (75% accuracy on 8 classes) on only 8K images (faces only) by taking features from VGGface and training 3 fully connected layers on top. I think it took about 3-5 seconds per epoch on a GeForce GTX 750 Ti. CPU training was about 10x as long. I'm definitely not using expensive hardware, though.

Even if you did need a rig with a lot of ass behind it, Amazon GPU clusters can be rented for pretty cheap ($0.5 - $15/hr).

If I were to do this myself, I'd probably use a small set of handpicked lefties, train a Naive Bayes classifier on that. If that's better than 85% on data in the wild, I might stop there. Otherwise, I'd use that to select a much larger number of samples (10K or more), grab a few beers and divvy up the profiles between a few friends, weed out the false positives, and then train on what's left.

CNN example: blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html

I hope that's helpful.

January 6, 2018 - 09:52

Xavier Cruz

So basically a spam filter? Is it that easy? I don't know.
GloVe's Twitter Dataset is 2 years old, and a lot has changed.
How can we actually feed in new words to update the dataset?
The closest thing to 2016/7 is SB, but it has less data
spinningbytes.com/resources/embeddings/
Should I actually create my own pre-trained dataset?

January 6, 2018 - 11:49

Wyatt Bennett

github.com/pinkeshbadjatiya/twitter-hatespeech
github.com/t-davidson/hate-speech-and-offensive-language

January 6, 2018 - 13:26

Julian Lee

...

January 6, 2018 - 14:00

James Wood

Well the antifa sticky is more accurate than that piece of work.
That means we could be better than them, digital warfare wise.

January 6, 2018 - 14:20

Nathan Perry

Yeah, Naive Bayes is favored in spam filters because it produces pretty good results without the need for a large dataset or a GPU to be viable. You could. I guess the question is do you want to be 80% accurate and be done this weekend or 95% accurate and be done in some number of months? I'd say the time cost of NB is low enough to be worth trying.

I think if you want to update GloVe, you might be able to load the model and existing weights, train on new data to shift and save the resulting weights. There's nothing stopping you from using the same model and training from the ground up on new data; I'm not sure how much time you want to put into something like that, though.

Ha! So all this machine learning magic is smoke and mirrors to reinforce the nonsense that shitlibs already believe? Sounds about right.

January 6, 2018 - 14:44

Wyatt Butler

As I discovered in another thread, those scary niggas show up every now and then here.

January 6, 2018 - 15:10

Zachary Powell

Secondary killed Holla Forums thread archives
archive.fo/TcTwY

Naive Bayes it is. ASAP

January 6, 2018 - 20:31

Connor Ward

Technical problem:
If I were to train the Bayes tweet-wise, and I only know which accounts are antifa,
that means many tweets that are not related to antifa e.g. casual chat would build false positives.
Could the solution be that we concatenate all tweet into a single line and parse that instead?
It might be slower to train, but the accuracy sure will be higher.
And applying cross-account metrics with tweets would double the accuracy.
The library I will be using is github.com/perkinslr/antispam for the Bayes,
and github.com/haccer/tweep for the Twitter scraping

January 7, 2018 - 00:24

Easton Gray

my.mixtape.moe/dueokr.py

January 7, 2018 - 03:06

Brayden Nguyen

or phrases too. do some markov chain garbage to make things fun.
trawl the links on >>>Holla Forums518058 for other connected communities maybe?
also, we could trawl leftist and liberal subrebbits (it's hard to tell when it's just liberals larping as leftists). Also, raddle.me and the chapofags.

January 7, 2018 - 15:23

Xavier Cooper

Good idea, take a look at for the source code, and see if you can modify it for Holla Forums and reddit as well.
Sample function: github.com/bluquar/reddit_scraper

January 7, 2018 - 20:55

Joseph Jenkins

Dumb estimations:
Every tweet is about 50~60 words long (assuming the current 280 character limit),
How many tweets do I have to capture to be assured that someone is an antifa?
What I might do is to concatenate the tweets into one string as training data.

January 8, 2018 - 10:12

Xavier Perez

A Markov chain counts as a neural network?

January 9, 2018 - 00:47

Hudson Diaz

Yes, but a really complex and computationally hard one. What do you want that for?
If you want to generate fake antifa tweets, it might work… but it wastes a lot of energy.

January 9, 2018 - 01:21

Owen Perez

Fucking normies don't even know what it means anymore. Normies ruined internet, they ruin every place if allowed in.

January 9, 2018 - 02:57

Angel Phillips

No, it's more important to filter out anti white propagandists. This includes whites themselves as well.

There's much hype about AI, I think the following classification algorithms would work way better if you can make them see things as an ontology. This way you can run a reasoner which can show a lot of inferences and make things a lot better overall (and reduce false positves / negatives).

If you make an anti white 'database' an ontology would be preferable because you often want to hold the opponent accountable for something 'anti white' and that is where an ontology rather than a formal database might be much better for. It's certainly difficult and it depends a lot on the design choices you make but I believe it would be very beneficial for the right wing to be able to attack like this.

There's so much research that can be done in this area that hasn't really been explored yet. Data scientists are just starting with it.

January 9, 2018 - 03:01

David Perry

Let me get this straight, you want me to scan a twitter account to see how they came to be?
That is very hard to do, if you understand how neural networks function.
You take a set of data, divide it into subsets, and you want the net to classify subsets properly.
It uses a lot of data filters and interconnected neuron layers to do it.
You can't just dump data and expect them to understand "CONTEXT".
Could you help me decompose the problem into sub-problems that we can tackle?

January 9, 2018 - 07:16

Landon Ward

Made some more mock-up,
No more Bayes, more Bag-of-Words
pastebin.com/XqDtyQUD

January 9, 2018 - 11:33

Jace Ramirez

List of problems we need to deal with:
Original plan
Sauces
History size
Website additions
Holla Forums
Needed sources for
For jewdar

January 10, 2018 - 03:27

Robert Martin

Where is this from?

January 15, 2018 - 17:40

Joseph Jackson

They all gnu what they were getting into when they used free code.

January 15, 2018 - 18:37

Brandon Clark

I learned they hard way they support (((free speech))) and not free speech.

January 15, 2018 - 18:39

Isaiah Jackson

But does it work

January 15, 2018 - 20:48

Adam Clark

Why would the Jewish owner and the Muslim mod of Holla Forums put a swastika in the page?

Why do White Nationalists want to kill White people who wish to meet up without them?

8ch.net/polmeta/res/28806.html

Why would Holla Forums ban a White meet up?

Why are Whites the only ethnic group that do not meet up and network?

Jewish networking

Black networking

Hispanic Networking

Asian Networking

Indian Networking

Muslim Networking

vs.

White Networking

archive.is/LEhxc

STOP-POL-CENSORSHIP

Remove the Holla Forums mod

Censorship on/pol/ is worse than any of the social media sites.

FISH BOL

There is a war against Whites being currently waged. It's time to start fighting back.

White Nationalism is the cause of "White Genocide".kjg hg

January 15, 2018 - 21:04

Hunter Flores

This guy gets it.

How?

This is not a one-man project, the mock-up code is done, all we need is for someone to try it on GPUs and twitter userlists.

January 15, 2018 - 21:09

Christian Kelly

Say no more fam

archive.is/UuJHI
archive.is/A7nmw
archive.is/5X0Pl

Total ~200K people, no list is perfect, there are some Trump supporters on here but that is part of any good algorithm.

archive.is/XJMCb

January 16, 2018 - 03:16

Lucas Barnes

whoops, replied to wrong person

January 16, 2018 - 03:18

Brody Turner

Mind me if I ask, what about a list of /ourguys/? (10k users per 3k antifa)
e.g.1 justpaste.it/1epdt_
e.g.2 blocktogether.org/show-blocks/V3Zsg8lpBfUFrSLZjjI9o9ypW9SYb_pd4enH_GxC

January 16, 2018 - 04:47

Kevin Mitchell

8ch.net/pol/res/11158236.html#q11158562

January 16, 2018 - 21:15

Jaxson Anderson

**V3Zsg8lpBfUFrSLZjjI9o9ypW9SYb_pd4enH_GxC ("Gamergate" by @MyActivism)
TJl_eDUojyl7wgjZOaRDTBqQTgamC4_Z3PJo1ONW ("nazi" by @NaziBlocker)
G30Q3zrAw02LNkHH5vuMBWR1LtR_fIVCV1kT9XMT ("alt-right" by @BlockAltRight)
UQ_ZPDyCHCygI-EUU_6xLY23sewTWFbPA8k7cCdz ("fake antifa" by @AntifaChecker)
gzr1bGk_c7c1vROD93ASM4zPPdOpqfUiZBkhEs9R ("alt-furry" by @AltFurryBlocker)
18uLPN0sBzVsRIeHRxCCTcI2diW6urzbnNNQniNx ("Mr. Robot cameo" by @th3j35t3r)**
Use blocktogether.org/show-blocks/${id} for the blocklist codes

A list of antifas are in

January 21, 2018 - 03:16

Jordan Perry

They used Bayesian backpropagation?

January 21, 2018 - 03:45

Noah Sanchez

What's the goal of this?

January 21, 2018 - 04:24

Daniel Jenkins

I assume finding antifa faggots by identifying their texts. When it gets good enough, you can ie feed it campus students and it will spit out who is an who is not antifa there. For instance

January 21, 2018 - 04:31

Elijah Flores

what should spook you a bit, is when you do the reverse of this. That is, you find a person's online texts (twitter, facebook, blogs, shitposts, papers written).

You train another AI to identify the person's text, running it now against collected antifa data to unveil ie the person's antifa accounts on twitter or whatever.

I don't got experience with texts, I deal discrete numbers

January 21, 2018 - 04:35

Ryan Garcia

What you are referring to is called "stylometry" en.wikipedia.org/wiki/Stylometry
It takes a lot of data and database to pull off basic writing style fingerprinting.
Most Python libraries for such a task stopping developing since 2015.
Also this exists github.com/psal/anonymouth

January 21, 2018 - 19:23

Kevin Ramirez

Blogs that tackles the issues
aicbt.com/authorship-attribution/
dariah-de.github.io/DARIAH-DKPro-Wrapper/tutorial.html
Other resources
github.com/jpotts18/stylometry
github.com/worldwise001/stylometry
github.com/pagea/unstyle
github.com/usc-isi-i2/dig-stylometry
github.com/paper82/pylometry

January 21, 2018 - 19:34

Blake Clark

More Resources:
github.com/mikekestemont/pystyl
kaggle.com/christopher22/stylometry-identify-authors-by-sentence-structure

January 21, 2018 - 19:47

Elijah Wood

The original direct naive bayes and Bag-of-words model

Using neural network with Bag-of-words

I would not describe it like that. Most non-NN solutions use Word Vectors, Decision Trees or Direct Naive Bayes.

January 29, 2018 - 23:12
