Autopurger

I wrote a shell script for automatically deleting CP spam when it's reported. Before I start using it, I want to check if anybody can find a way to make it misbehave.

It works like this:
- Look through the report queue for a banned regex (typically the URL shortener they use)
- If it's there, use the JSON interface to check the OPs of all reported threads for the regex
- If a thread has the regex in its OP, delete it
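The three steps above boil down to roughly this decision logic. This is a minimal sketch, not the actual autopurger script. It assumes (hypothetically) that the report queue has already been dumped to a text file and that each reported thread's OP text has been fetched (e.g. with curl and jq) into `<thread number>.txt` in one directory. It prints the threads it would delete instead of deleting them.

```shell
#!/bin/sh
# Sketch of the autopurger decision logic, NOT the real script.
# Hypothetical inputs: a report queue dump file, and a directory holding
# the OP text of each reported thread as <thread number>.txt.

# Step 1: does the report queue mention the banned pattern at all?
queue_matches() {
    # $1 = grep basic regex, $2 = report queue dump
    grep -q -- "$1" "$2"
}

# Steps 2-3: print the numbers of threads whose OP matches the pattern.
threads_to_delete() {
    # $1 = grep basic regex, $2 = report queue dump, $3 = directory of OP texts
    queue_matches "$1" "$2" || return 0
    for op in "$3"/*.txt; do
        [ -e "$op" ] || continue        # empty directory: glob left unexpanded
        if grep -q -- "$1" "$op"; then
            basename "$op" .txt         # a real run would delete the thread here
        fi
    done
}
```

Because the OP itself has to match, a false report on an unrelated thread produces no output.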

gitgud.io/ring/infinity-mod-cli/raw/master/autopurger

ring ring

Holy shit that image made me crack up.

What's to stop me from just reporting a thread I don't like as CP?

What's the plan for when they just use a different shortener? Add them to the banlist one by one?

It checks the OP's text for the regex too, not just the report.

Fuck, I almost reported this thread

What are you planning on using for said regex?

How do I add this on Holla Forums?

For now, tr.im/. I can add other domains using \| as the separator, in grep basic regex syntax.
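If the list of shorteners grows, the pattern could be generated instead of edited by hand. A sketch, assuming GNU grep (where `\|` is alternation in basic regex); `build_pattern` is a hypothetical helper. It also escapes the dots, since an unescaped `.` in "tr.im" matches any character.

```shell
#!/bin/sh
# Hypothetical helper: join shortener domains into one grep basic regex
# separated by \| (a GNU grep extension), escaping literal dots.
build_pattern() {
    pattern=''
    for domain in "$@"; do
        escaped=$(printf '%s' "$domain" | sed 's/\./\\./g')
        if [ -z "$pattern" ]; then
            pattern=$escaped
        else
            pattern="$pattern\\|$escaped"
        fi
    done
    printf '%s\n' "$pattern"
}
```

For example, `build_pattern tr.im/ bit.ly/` prints `tr\.im/\|bit\.ly/`, which can be passed straight to `grep -q`.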

If they stop using URLs in the post message (they did that on Lynxchan), I'll experiment with command-line OCR tools. OCR got rid of them on Lynxhub and Endchan.


You run it with a cron job on a server. If you can get me a volunteer account for Holla Forums, I can run it there once it's tested.

I've been meaning to implement something like this to offer up for months, but was too lazy to get to work on it. Thank you so much for this.

The problem with this is when they don't post any link shorteners in the text, only in the image. That happens a lot, if I recall correctly.

Also, this solution only works around the fact that the site is so fucking broken that you can't moderate it properly, and they apparently have no interest in fixing it, even going so far as to implement an overboard outside the site's scope.

You wouldn't be the first one trying to crash this picture.

What a hot head.
BLO BLO BLO BLO BLO BLO BLO BLO

I noticed /just/ often has the spam links broken. You might want to get in touch with its BO for some common link shorteners.

That's what OCR is for. Text recognition in images. For example:
$ tesseract topbane.jpeg stdout
Topbane . ruPSHC,PSHC-Big Guys Videos4U Yo - Mosqui‘ro Men

With JSON:
curl -s 8ch.net/tech/res/580899.json | jq -j '"media.8ch.net/tech/src/",.posts[0].tim,.posts[0].ext' | xargs curl -s | tesseract stdin stdout
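The one-liner above can be wrapped into a reusable check. A sketch, assuming curl, jq and tesseract are installed and that images live at media.8ch.net/<board>/src/<tim><ext>, exactly as in the one-liner. The function names are hypothetical, and matching is case-insensitive because OCR output tends to mangle case.

```shell
#!/bin/sh
# Sketch: OCR a thread's OP image and test the text against the banned
# pattern. URL layout copied from the one-liner; function names are
# hypothetical.

# Build the image URL from the board and the OP's tim/ext fields.
image_url() {
    printf 'media.8ch.net/%s/src/%s%s\n' "$1" "$2" "$3"
}

# Read OCR text on stdin; succeed if it contains the pattern (any case).
ocr_text_matches() {
    grep -qi -- "$1"
}

# Full pipeline for one thread (needs network access).
op_image_matches() {
    # $1 = board, $2 = thread number, $3 = grep basic regex
    json=$(curl -s "8ch.net/$1/res/$2.json") || return 1
    tim=$(printf '%s' "$json" | jq -r '.posts[0].tim // empty')
    ext=$(printf '%s' "$json" | jq -r '.posts[0].ext // empty')
    [ -n "$tim" ] || return 1           # OP has no file attached
    curl -s "$(image_url "$1" "$tim" "$ext")" |
        tesseract stdin stdout 2>/dev/null |
        ocr_text_matches "$3"
}
```

Usage would be something like `op_image_matches tech 580899 'topbane'`, with the exit status telling you whether to delete.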


Thank you. I'll get them from here:
8ch.net/settings.php?board=just

...

You're also relying on people to report every single image because the site is so fucking broken.

...

I don't think they'd put that much effort in; after a while they'd just give up.

I could detect it without reports, but then the worst-case scenario for a bug would be deleting every single thread in the catalog, and it would save less than five minutes on average.
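One way to bound that worst case, whether or not reports are used, is to refuse any run that wants to delete suspiciously many threads. A sketch: `MAX_DELETIONS` and `cap_deletions` are hypothetical and not part of the posted script, and the default of 10 is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical safety valve: pass candidate thread numbers through only
# if there are at most MAX_DELETIONS of them, so a bad regex can't wipe
# the catalog. The default limit is an arbitrary assumption.
MAX_DELETIONS=${MAX_DELETIONS:-10}

# stdin: candidate thread numbers, one per line.
cap_deletions() {
    candidates=$(cat)
    count=$(printf '%s' "$candidates" | grep -c . || true)
    if [ "$count" -gt "$MAX_DELETIONS" ]; then
        echo "refusing to delete $count threads (limit $MAX_DELETIONS)" >&2
        return 1
    fi
    if [ -n "$candidates" ]; then
        printf '%s\n' "$candidates"
    fi
    return 0
}
```

Piping the candidate list through `cap_deletions` before the delete loop turns a runaway match into an error message instead of an empty catalog.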

It's no effort really, it's called ImageMagick.

No, the proper way to totally nigger rig this as a gvol is to check reports, see if they match, find all posts by global ID (optionally check if those match too) and delete them all one by one.
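The "find all posts by ID" step could look like this. A sketch under a loud assumption: it reads a hypothetical dump with one `post_number poster_id` pair per line (how that dump is produced, and whether the poster ID is the global ID mentioned above, is not covered here), and it only lists the posts rather than deleting them.

```shell
#!/bin/sh
# Sketch of "find every post by the same poster". Assumes a hypothetical
# dump file with one "post_number poster_id" pair per line.

# Print the poster ID of a given post number.
poster_of() {
    # $1 = post number, $2 = dump file
    awk -v n="$1" '$1 == n { print $2; exit }' "$2"
}

# Print every post number sharing the reported post's poster ID.
posts_by_same_poster() {
    # $1 = reported post number, $2 = dump file
    id=$(poster_of "$1" "$2")
    [ -n "$id" ] || return 1            # reported post not in the dump
    awk -v id="$id" '$2 == id { print $1 }' "$2"
}
```

Each printed post number would then go through the same regex check before being deleted one by one.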

Why the fuck not, we're already paving over every other deep flaw in this fucking software, might as well work around this because hotwheels can't have 2ch's database help with a big table migration.

database guys*

I wrote an OCR checker that checks all new threads. I haven't tested it on real spam images yet because I don't have any, but it works with the topbane fake.

gitgud.io/ring/infinity-mod-cli/raw/master/ocr-purger
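One simple way to only OCR new threads is to remember the highest thread number already checked. This is a hypothetical sketch, not how ocr-purger necessarily does it: `new_threads` reads thread numbers (e.g. from the catalog) on stdin, prints the ones newer than the saved number, and updates the state file.

```shell
#!/bin/sh
# Hypothetical sketch: track the highest thread number already checked
# in a state file and emit only newer thread numbers.

# $1 = state file; thread numbers (one per line) on stdin.
new_threads() {
    last=0
    if [ -s "$1" ]; then
        last=$(cat "$1")
    fi
    fresh=$(sort -n | awk -v last="$last" '$1 + 0 > last + 0')
    if [ -n "$fresh" ]; then
        printf '%s\n' "$fresh"
        printf '%s\n' "$fresh" | tail -n 1 > "$1"   # remember the newest one
    fi
}
```

Run from cron, each pass then only spends OCR time on threads created since the previous pass.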

Will you be rolling this out to the entire site or just this board?

Just this board. To roll it out to the entire site I'd need a global volunteer account. If this works, I'll ask about that.

The OCR bot is now live.