I am building a search engine

Located at wiby.me. I made a thread a couple of months ago; this will be my final update. I want to rebuild the web to be like the 90's / early 00's, where pages weren't so bloated and were based more on a subject of interest rather than making money. I am trying to gather as many of these kinds of pages as I can, and if you know any, please submit them.

It used to be called wibr, but I have changed it to wiby because it's easier to sound out. The search engine will now crawl indexed pages on a weekly basis to keep them updated.

Some server info: This is a LAMP server hosted on a VPS with an SSD. I wrote the crawler and update scheduler in C. That's about 60 KB compiled and does the job well. The site is usually pretty snappy, but there aren't many users either.
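The weekly recrawl described above can be pictured with a small sketch. This is not OP's C scheduler, which has not been published; it is a Python illustration that assumes each indexed URL has a last-crawled timestamp. The names `RECRAWL_INTERVAL` and `pages_due` are made up for this example.

```python
from datetime import datetime, timedelta

# Assumed interval: the thread only says "weekly".
RECRAWL_INTERVAL = timedelta(days=7)

def pages_due(last_crawled, now):
    """Return URLs whose last crawl is at least one interval old, oldest first.

    last_crawled maps URL -> datetime of the most recent crawl (hypothetical layout).
    """
    due = [(ts, url) for url, ts in last_crawled.items()
           if now - ts >= RECRAWL_INTERVAL]
    return [url for ts, url in sorted(due)]
```

A scheduler along these lines would call `pages_due` periodically and hand the result to the crawler, so the stalest pages get refreshed first.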

Anyhoo I will keep trying to find old-school style pages to keep it growing even if there aren't many submissions, although I've had a good number of contributions.


Can't you make a ranking system based on how little a site relies on JS, how many KB the site is, etc.? Google does that to influence search results to favor sites that use alt. text on images and stuff like that. Could help to distinguish your search engine to those who prefer sites to just do their fucking job of displaying formatted text and small images.
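One way to picture the suggestion above is a score that drops as a page gets heavier and leans on more scripts. The following is only a sketch in Python with the standard library; the function name `lightness_score`, the 100 KB ceiling, and the per-script penalty are all assumptions chosen for illustration, not anything wiby or Google is known to use.

```python
from html.parser import HTMLParser

class ScriptCounter(HTMLParser):
    """Counts <script> start tags in a page."""
    def __init__(self):
        super().__init__()
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1

def lightness_score(html, max_kb=100.0, script_penalty=0.2):
    """Score in [0, 1]: higher means smaller and less script-dependent.

    max_kb and script_penalty are arbitrary illustrative thresholds.
    """
    counter = ScriptCounter()
    counter.feed(html)
    size_kb = len(html.encode("utf-8")) / 1024
    size_part = max(0.0, 1.0 - size_kb / max_kb)
    script_part = max(0.0, 1.0 - script_penalty * counter.scripts)
    return round(size_part * script_part, 3)
```

With these defaults a page with five or more script tags scores zero, which is close to the "don't index it" end of the scale.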

I do have a way of checking for HTML 1.0 style pages, not a ranking system but it works OK. The other thing is that it only crawls the page you submit. The advantage is people are likely to submit their best pages rather than some irrelevant sub-page. Google tries to crawl everything possible and so they have to come up with a really complicated way of sifting through all that and bring forth the "good" pages. I think they are failing at that, but I will say they are unmatched at finding any answer to a technical question you may search for.

If you search for 'dog' on Google, you'll probably end up getting a wiki article, some mainstream news articles, some SPCA pages, and some ad-filled cancerous pages. On my search engine, when you search 'dog' it will mainly show personal pages written by hobbyists, which is more reminiscent of the earlier internet.

And admittedly I don't have many pages about 'dogs', but you get what I mean.

OK OP, that's a pretty awesome thing you did there.


Things I'd like to advise:
-Use Arial instead of Times New Roman, since it helps readability. Or hell, move everything to Fira Sans, but that'd be extra bloat, I think.
-Give us a repo with the site's code
-Use some sort of feature similar to DDG's bangs. I'm not kidding, they might even help your crawler get more info, and heck, it's nifty for us mere mortals as well.
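For anyone who hasn't used bangs: a query that starts with a short `!keyword` is redirected to another site's search instead of being answered locally. A minimal Python sketch of that lookup follows; the `BANGS` table and `resolve_bang` are hypothetical names, the two entries are just examples, and only a leading bang is handled, which is a simplification of what DDG does.

```python
from urllib.parse import quote_plus

# Illustrative table: bang keyword -> search URL template.
BANGS = {
    "!w": "https://en.wikipedia.org/w/index.php?search={}",
    "!ddg": "https://duckduckgo.com/?q={}",
}

def resolve_bang(query):
    """Return a redirect URL if the query starts with a known bang, else None."""
    parts = query.strip().split(maxsplit=1)
    if not parts or parts[0] not in BANGS:
        return None  # no bang: fall through to the normal local search
    terms = parts[1] if len(parts) > 1 else ""
    return BANGS[parts[0]].format(quote_plus(terms))
```

The search handler would check `resolve_bang` first and issue an HTTP redirect when it returns a URL.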

THANK YOU

It really needs current results, as well. Tails, which is a wonderfully considered HTML OS document, is not a result and it should be.

tails.boum.org
Shit name.
Fake words are bullshit.

Well done, sir.

It needs a cached page link for every site

I'll play around with the fonts. Yeah, Arial is prob best. As far as giving a repo... I'm on the fence about it. If I lost interest I'd definitely give it up, but it's nothing too fancy either. It uses libcurl to download the page, then a parser I wrote handles the HTML and throws it into the SQL database.
I do like the ddg bang idea. Had to look that up.
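The download, parse, store pipeline OP describes can be sketched roughly as follows. This is not OP's code: the real thing is C with libcurl, a hand-written parser, and a SQL database, none of which has been published. Here Python's `html.parser` and an in-memory SQLite database stand in, the download step is left out (the HTML is passed in as a string), and the `pages` table layout and every function name are assumptions.

```python
import sqlite3
from html.parser import HTMLParser

class PageText(HTMLParser):
    """Collects the <title> and visible text, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []
        self._in_title = False
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text.append(data.strip())

def open_index():
    """Create an in-memory index with a hypothetical one-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)")
    return db

def index_page(db, url, html):
    """Parse one submitted page and store (or refresh) its row."""
    parser = PageText()
    parser.feed(html)
    db.execute(
        "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, parser.title.strip(), " ".join(parser.text)),
    )
    db.commit()

def search(db, term):
    """Naive substring search over title and body."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT url FROM pages WHERE body LIKE ? OR title LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Because only the submitted URL is indexed, one row per page is enough here; a real engine would want a proper full-text index rather than `LIKE`.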


I'll check out this 'tails.boum.org'.
wibr/y is the best I can do because all the damn words are taken by domain hijackers who want like $2000. At least it's short and easy.
Yes I agree a cached link of some sort (even just raw text) would be nice.

It's 2:20am so going to bed now.

Try Liberation Serif or Mono, dude

Don't index any page that requires javascript

I feel your pain, user.
but in the future it is already called by its true name

bastian

Speaking of fonts, could you make the rank algorithm punish sites that use Comic Sans (also: the blink tag, cookies)? Also this:

How about providing access using a Tor hidden service?

Hello Windows fag!


Please don't. Only define the generic family:
font-family: sans-serif
font-family: serif
font-family: monospace
Let the users set their beloved font in their browser.

ok boss

By the way, thanks to whoever was nice enough to submit all those pages.


It's at least got RSA encryption so your employer can't see what you're searching for. I'd have to learn more about Tor first.

I just made a search engine for Japanese and it was a fuckton of work and took 6 months. Finally finished and the crawler is almost ready to go.
From one search engine user to another, what was your favorite part of writing the search engine?
For me, I enjoyed writing the indexer the most. It felt like black magic when the data structures and algorithms lined up and started working.

Well you got the 90s part right at least.

My favorite part was probably the work on the crawler, and of course seeing everything come together like you said. I got a lot better at writing in C and learned all about handling memory properly. And I hadn't done much with MySQL or Linux before this, so it was a great way to learn. What is your search engine called?

...

I somehow just lost more faith in humanity after being on Holla Forums for far too long.

...

Turn off adult content filtering then. I'm not sure there is furry porn though.

My bad, Sans Serif font for everything. I actually run Linux full time.

This search engine is incredible... Do you plan to release your source?

Thank you for making this. It really does feel like I'm using an older search engine.

noice!

If it's not comic sans then you can gtfo

Thanks user! I don't plan on releasing my source but have no problem answering questions about what's going on in the back end.

...

I'm lovin' it, OP. Just one note: the 'thanks for submitting' page has a link which still says 'wibr' instead of 'wiby'. Maybe grep through your source files if you want to carry through the name change.

Bless you, user. It's like surfing in the early 2000's again. Your search engine doesn't have a lot of results, but the results that do come up are a goldmine.

>>>mdpub.com/

stop, please

Why

Aw, and just when i thought we could be friends.

the code follows the GPL, so it's not proprietary, LARPer. This is exactly the situation that AGPL was supposed to prevent, though.

Nice catch.


What are the benefits of releasing the source? I can see that it will help people find bugs, and help others make their own search engines I guess....

fuck off Grsecurity scum

install gentoo

User confidence. After duckduckgo was cucked I don't trust any search engine now not to fuck me.
My question is are you using linux or *bsd for backend? Where is the server physically located(i.e can *insert tla here fuck it up easily*). Finally who are you and why should we trust you?

Fair argument. But I don't see how that makes it any more trustworthy. All a search engine (or any website) would need to do is supply the surveilling 3rd party an SSH key and access to /var/log/apache2/access.log. That's it.


It's a LAMP server. I'll be migrating from Apache to nginx soon. Heard it's a lot faster with its microcaching feature.


I'm a leaf. Please spare me on the day of the rake. Why should you trust me? You should only trust your instincts, based on what I tell you and what I deliver you in results.


VPS is located in America. Though I am not American, I prefer to keep the server there where the strongest respect for free speech is legally upheld.

Answer me this and I will trust you. What is markmonitor LLC?

Thanks, OP!

What is with the (((nok sek agency's))) obsession with that man pictured in OP? It is in all their datamining threads too. Is it like a forced meme of a co or ex co-worker or something? A callsign for being faggots? idk

Never heard of it

I just thought the pic was funny

I'll link you on gopher.su if you don't mind OP.