My country has an online phone book

God damn it, I am fucking tired of fucking around with this shit today.
I know this prehistoric captcha type was already cracked by Google itself with just clever algorithms and no neural net, but come on, just let a guy live, for Christ's sake

Will probably continue tomorrow

What did you do today?

btw, these are the kind of captchas that turn up if anyone is wondering

...

Hire some pajeets to enter captcha for you or try changing IPs

add sum proxies :^)

anyway why are you trying to build this database

How many captchas do you need to solve to complete the scan? If it's only like 1k or something, that's only a couple bux, just pay for it.

It is probably triggered by you sending too many requests in short timespan. Try adding some delay between your requests.
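A minimal sketch of what "adding some delay" could look like, assuming the scraper calls some per-number lookup function. The names `polite_delay`, `run_lookups` and `lookup` are hypothetical, and the 2 second base delay is a guess, not something the site documents.

```python
import random
import time


def polite_delay(base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Sleep for base_seconds plus a random extra of up to jitter_seconds.

    Returns the number of seconds slept so the caller can log it.
    The default values are assumptions, not known limits of any site.
    """
    wait = base_seconds + random.uniform(0.0, jitter_seconds)
    sleep(wait)
    return wait


def run_lookups(numbers, lookup, delay=polite_delay):
    """Call lookup(number) for each number, pausing between calls."""
    results = []
    for i, number in enumerate(numbers):
        if i > 0:
            delay()  # pause before every request except the first
        results.append(lookup(number))
    return results
```

Passing `sleep` and `delay` in as parameters keeps the pacing logic easy to try out without actually waiting.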

lmao thats nothing there's freely available python code that bypasses just this

you fucking moron why do you there is captcha in the first place ?

10 million possible numbers on just one extension, and there are like 20 or 30 extensions for all the different regions for landlines and cellphone providers
I could easily wardrive the landlines and then look them up? But paying seems like an absurdly big waste of money.

Not to mention that this is just a coding challenge per se.

yea, probably thought something like that might help.
That would be easiest solution.
I read a paper today on how these captchas were broken, at least partially or with small percentage of success
papers.nips.cc/paper/2571-using-machine-learning-to-break-visual-human-interaction-proofs-hips.pdf

Although I believe that they did this hastily and that it could be done better.

Another method is implementing a deep convolutional neural net, which I have a hard-on for doing.
(I think I read a paper a long time ago on how Google succeeded in solving captchas without segmenting symbols or using image manipulation, just with a pure neural net)


I actually thought that the captchas would be just a one-off thing, but it seems like once you trigger them they continue to pop up.
Moreover, it isn't triggered by too many requests in a short timespan but by too many requests in general. A normal person wouldn't look up 100 numbers one after another, no matter how much time it takes them.
I guess I will start with changing my IP every few dozen tries and then fool around with a neural net on the side.


sure thing buddy


I think you're missing a verb there


Not really, just disappointed at the amount of payoff after fixing one bug after another for 3 or 4 hours, all ending with a cockblock.
Also, public info so it isn't technically illegal ;)

but I sincerely wanted to hear what you guys have been doing

Why did you use selenium? Seems overkill/the wrong tool for the job.
Anyways i wrote a python script to download images from nhentai only to discover my old perl script is 1000x faster than this snakeshit

I didn't want to get down into the nitty-gritty of low-level communication with the site over a socket, and honestly I don't know what other methods are out there.

Speed isn't really my goal at the moment (I just want to see some results), but I will gladly take suggestions on how to speed it up.

What did you use, user?

for the python shit? just bs/requests but i dont understand how its so much slower than perl

sudo nano /etc/proxychains.conf

comment out strict_chain

uncomment random_chain

scrape tons of proxies

save

proxychains python yourshit.py

keep us posted. if you need a ton of proxies let me know. if you want to do it yourself look up 'proxybroker' on github or just get a proxy scraper.

8ch.net/tech/res/795341.html

fuck off frogposter

what am i reading?

obviously you can't read

did you just call me a cuck?

10 million possible numbers (under one extension) divided by the 10 numbers (I counted today) I get from each IP before the captcha kicks in gives about 1 million IPs I would need to cycle through to cover all 10 million numbers.
An absurdly big number.
However, I have seen that the captcha goes away after 30 minutes. That gives me the idea that I could cycle through a few thousand IPs, and by the time I get back to the first IP its captcha cooldown would be over, meaning I could reuse the same list of IPs again and again?
Seems like the best option atm.
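The arithmetic above can be written out as a small sketch. The three constants come from the post; the 3 seconds per lookup in the usage note is purely an assumed figure for illustration, and the function names are made up here.

```python
import math

TOTAL_NUMBERS = 10_000_000   # from the post: numbers under one extension
LOOKUPS_PER_IP = 10          # from the post: lookups before the captcha appears
COOLDOWN_SECONDS = 30 * 60   # from the post: captcha clears after about 30 minutes


def ips_without_reuse(total=TOTAL_NUMBERS, per_ip=LOOKUPS_PER_IP):
    """IPs needed if every IP is used once and never again."""
    return math.ceil(total / per_ip)


def pool_size_with_reuse(seconds_per_lookup, per_ip=LOOKUPS_PER_IP,
                         cooldown=COOLDOWN_SECONDS):
    """IPs needed so the first IP has cooled down when the cycle wraps around.

    While the first IP rests, the other (pool - 1) IPs must fill at least
    `cooldown` seconds, hence the +1.
    """
    seconds_per_ip = seconds_per_lookup * per_ip
    return math.ceil(cooldown / seconds_per_ip) + 1


def total_days(seconds_per_lookup, total=TOTAL_NUMBERS):
    """Wall-clock days for a single sequential worker to cover every number."""
    return total * seconds_per_lookup / 86_400
```

With the assumed 3 seconds per lookup, `ips_without_reuse()` gives 1,000,000 and `pool_size_with_reuse(3.0)` gives 61, so the pool needed depends on how fast the lookups go rather than on how many numbers there are. `total_days(3.0)` comes out at roughly 347 days for one sequential worker, so total time is the bigger constraint.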

I will keep you guys posted, this really is an interesting project.


Next time I will write my shit in requests or beautifulsoup it seems. Thanks user very much.

just a quick question
does anyone know what this js means?
$(function(){ $('#kaptchaImage').click(function () { $(this).attr('src', 'kaptcha.jpg?' + Math.floor(Math.random()*1000)); }) });

Does that mean that there are only 1000 captchas at their disposal?
Does that mean that this js gives me ability to choose what captcha I solve?

Install Greasemonkey, and write a script to set the captcha image source to a specific number in that range. If you find that it indeed loads the same image every time, you're golden.

tried it, nah, it seems like kaptcha.jpg?+number is just some kind of internal code for "give me a new captcha"
It spits out a new captcha even when the request is the same
Although, found a source code for this captcha on some shitty chink site so I will look into how it works and try proxychain stuff with dynamic_chain instead of random one
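For reference, the click handler quoted earlier only rebuilds the image URL with a random integer from 0 to 999 appended. A plausible reading, consistent with the test result above but still an assumption about this site, is that the number is a cache-buster that makes the browser refetch the image, not an index into 1000 stored captchas. A Python equivalent of the URL construction (the function name is made up here):

```python
import random


def refreshed_captcha_src(base="kaptcha.jpg"):
    """Python equivalent of 'kaptcha.jpg?' + Math.floor(Math.random()*1000).

    The suffix is an integer in 0..999. Whether the server looks at it at
    all is an assumption; the test in the thread suggests it does not.
    """
    return f"{base}?{random.randrange(1000)}"
```

If the server generates a fresh image on every request, picking a specific suffix would not let you choose which captcha you get.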

Have you tried disabling JS in browser while scraping?

This JS turns on only when you click the captcha image, so that you get a new captcha.

Also, disabling JS on the site will not allow me to search for numbers :/

Are you fucking retarded? If it's using JS to search for numbers, and is also probably using JS to enforce rate limiting, then just find the endpoint it's hitting, you fucking moron.
God damn, no wonder you're a third world nigger frog poster.

the only other JS documents in the POST are adex.dotmetrics and DeviceInfo.dotmetrics, both of which use a device id and session id to identify me.
I guess you meant that? Doesn't make any sense, since my did and sid are both tethered to my IP, right?

no need to call each other names

Man, it would be a lot easier for us to help you if you just posted link to website that you are trying to scrape instead of leaving us to guess where problems might be.

Try using googlebot as user agent.

a quick update
I have downloaded a couple of proxy txt files containing thousands of proxies that don't work, and made a quick python script to test whether any of them are working
by my calculations 60 proxies should be enough for the 30 minute cycle to work, although more than 60 would be highly desirable

polite sage cuz this is a shit update

oh geez, this is going to be fucking fun

Of all the proxies I wasted my time on, none work.
The ones that "do" work, I tried using, and they don't return the site I am trying to reach.
When I looked into it more (just the first few proxies), they either redirected me to facebook pages, webmail pages, a "This page is controlled by TeamViewer" page or even this (translated from German):
This new domain was registered on behalf of a customer. Why is this page displayed? This page was generated automatically. It is set up for every new domain and shows that the new domain is reachable. Without this placeholder page, visitors would get an error message. As a united-domains customer, you can configure this domain yourself online at any time in your domain portfolio (e.g. web forwarding, e-mail settings, adding webspace, changing DNS records). united-domains - The best addresses for the web. Register more domains cheaply. Pre-order new domain endings. Imprint. © united-domains AG. All rights reserved.

Which I assume doesn't smell too good.
Anywho, does anyone know where I can get 100 reliable proxies? Don't mind if you guys do a Mitm attack on me, just want this project to be over tbh

It's a placeholder page.

Did you even try changing your useragent? Why are you so hell bent on this proxy scheme?

Post the shitty website if you want advice, user. If you were doing this shit from your actual IP, they already know. Or at least dump the form you think is requesting numbers.

bumping the thread, coming back to my coding
still have to get something like 100 proxies for my plan to succeed
Feeling a little like Sisyphus, here's a song to accompany me in these dark moments:

youtube.com/watch?v=NNlfmLt1j7g