Saving and Verifying the Internet

Unless I'm wrong, trying to use SSL/TLS to verify this archive data, assuming it was saved encrypted, is also a dead end. You could record the client's pre-master or session keys, but it's pointless: in the classic RSA key exchange the client generates the pre-master secret and sends it to the server encrypted with the server's public cert, and both sides derive the same symmetric session keys from it. Since the keys are shared, there's no way to prove the encrypted data actually came from the server, only that it was encrypted under those keys; the client could have forged it just as easily.
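A minimal sketch of why a captured session key proves nothing about origin. The key, the record, and the MAC construction here are all made up for illustration; the point is just that a symmetric MAC can be produced by either endpoint:

import hmac, hashlib, os

# Hypothetical shared session secret (in real TLS it's derived from the
# pre-master secret that the *client* generated).
mac_key = os.urandom(32)

record = b"HTTP/1.1 200 OK\r\n\r\nsupposedly archived page"

# The "server" authenticates a record with the shared MAC key...
server_tag = hmac.new(mac_key, record, hashlib.sha256).digest()

# ...but the client holds the exact same key, so it can mint an identical
# tag over any bytes it likes. A key log can't prove which endpoint
# actually produced the plaintext.
client_forged_tag = hmac.new(mac_key, record, hashlib.sha256).digest()

assert server_tag == client_forged_tag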

Attached: deadend.png (993x1294, 168.63K)

Content-based addressing is an old idea and already solves this. The IETF has been working on an official, sanitized version of it for years, and eventually it will replace the web.
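For what it's worth, the core of content addressing is trivially small. A minimal sketch, assuming the address is just a hash of the bytes (the naming scheme here is made up, not any particular IETF spec):

import hashlib

def content_address(data: bytes) -> str:
    # The name of the content *is* its hash, so anyone holding the bytes
    # can check them against the address without trusting any server.
    return "sha256-" + hashlib.sha256(data).hexdigest()

page = b"<html>archived page body</html>"
addr = content_address(page)

# Verification later: re-hash whatever blob you were handed.
assert content_address(page) == addr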

no, blockchain isn't even applicable

This is an interesting side discussion, but it doesn't strictly invalidate the utility of OP's clever idea, even if it means you'd only be able to hash a given archive once.

For instance, this would allow something like archive.is to exist too, by keeping the hash DB separate from the DB of full archives; the latter (depending on takedowns) could be anything from a normal website like archive.is to a cloud of torrents or a blockchain.
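A rough sketch of that split, with made-up names and in-memory dicts standing in for real storage; the only point is that the verification records and the bulky archives live in separate stores that can survive (or get taken down) independently:

import hashlib, time

# Store 1: tiny and hard to DMCA -- just (url, timestamp) -> hash records.
hash_db: dict[tuple[str, float], str] = {}

# Store 2: the heavyweight archives, wherever they end up living
# (a normal website, torrents, whatever). Keyed by the same hash.
archive_store: dict[str, bytes] = {}

def record_archive(url: str, archive_bytes: bytes) -> str:
    digest = hashlib.sha256(archive_bytes).hexdigest()
    hash_db[(url, time.time())] = digest
    archive_store[digest] = archive_bytes  # this half is optional/replaceable
    return digest

def verify(archive_bytes: bytes, claimed_digest: str) -> bool:
    # Verification only ever needs the hash DB, not the archive store.
    return hashlib.sha256(archive_bytes).hexdigest() == claimed_digest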

Attached: le reddit spacing meme.png (652x2245, 420.45K)

I think I'm going to write some proof-of-concept code for the simpler alternative method. None of the code is revolutionary here; it'll probably literally use wget to pull the site archive.
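Something like this is what I have in mind for the proof of concept; just a sketch, assuming wget is on the PATH and that a plain tar of the mirror directory is what gets hashed:

import hashlib, subprocess, tarfile
from pathlib import Path

def archive_and_hash(url: str, out_dir: str = "archive") -> str:
    # Pull the page plus the assets it needs; wget does the heavy lifting.
    subprocess.run(
        ["wget", "--page-requisites", "--convert-links",
         "--no-parent", "-P", out_dir, url],
        check=True,
    )

    # Pack the mirror into one file so there's a single thing to hash.
    tar_path = Path(out_dir + ".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(out_dir)

    return hashlib.sha256(tar_path.read_bytes()).hexdigest()

print(archive_and_hash("https://example.com/"))

One wrinkle: for two machines to ever produce matching hashes, the tar would have to be built deterministically (sorted file order, fixed mtimes and owners), which a naive tar.add() doesn't guarantee.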

I came up with a few extra options today:
-Instead of providing the archive, provide diff files (sketch after this list):
----Client hashes all binary files and sends the file listing and hashes to the server (this will require a custom client).
----Client sends the full text/html files to the server.
----Server checks the binary hashes for mismatches, missing files, extra files, etc.
----Server runs diff on the text/html files and generates diff/patch files.
----Server sends back the binary hash mismatch data along with the text/html diff/patch files.
----Client creates a single tar archive from its original archive + the mismatch file + the diff/patch files, and hashes it.
----Server does the same; the hashes should match, since it now has the same data as the client.
----Server stores this hash for verification.
Advantages:
-The client can either apply the patches to its text/html (producing the server's possibly mangled version) or leave its copy alone; the archive should verify either way.
-The server doesn't provide illegal content to the client in case the client is an asshole, which is going to happen.
-The server isn't providing the website files directly, which seems less DMCA-able.
Disadvantages:
-Even if the hashes aren't, the diffs of the text/html source are probably a derivative work. The server only provides them during the exchange, so afterwards there's nothing to DMCA but the hashes, but during the exchange someone might be able to claim copyright infringement is happening.
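Rough sketch of that exchange, faked in one process with no network and made-up file contents. Since the server never holds the client's binary bytes, I'm representing binaries inside the final bundle by their client-reported hashes, so both sides really can build byte-identical tars:

import difflib, hashlib, io, json, tarfile

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# --- client side (what it saved locally) ---------------------------------
client_files = {
    "index.html": b"<p>client's saved copy</p>\n",
    "logo.png":   b"\x89PNG fake client bytes",
}
client_binary_hashes = {"logo.png": sha256(client_files["logo.png"])}
client_texts = {"index.html": client_files["index.html"]}

# --- server side (its own fetch of the same URL) --------------------------
server_files = {
    "index.html": b"<p>what the server saw</p>\n",
    "logo.png":   b"\x89PNG fake server bytes",
}

# Binary mismatch report: the server only ever sees hashes, never bytes.
mismatches = sorted(
    name for name, h in client_binary_hashes.items()
    if sha256(server_files[name]) != h
)

# Unified diffs going from the client's text/html to the server's version.
diffs = {}
for name, body in client_texts.items():
    patch = difflib.unified_diff(
        body.decode().splitlines(keepends=True),
        server_files[name].decode().splitlines(keepends=True),
        fromfile="client/" + name, tofile="server/" + name,
    )
    diffs[name + ".diff"] = "".join(patch).encode()

# --- both sides build the same deterministic bundle and hash it -----------
def bundle_hash(texts, binary_hashes, mismatch_list, patches) -> str:
    members = dict(texts)
    members["binary_hashes.json"] = json.dumps(binary_hashes, sort_keys=True).encode()
    members["mismatches.json"] = json.dumps(mismatch_list).encode()
    members.update(patches)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(members):       # fixed order and zero mtimes so
            info = tarfile.TarInfo(name)   # both sides get identical bytes
            info.size = len(members[name])
            tar.addfile(info, io.BytesIO(members[name]))
    return sha256(buf.getvalue())

client_hash = bundle_hash(client_texts, client_binary_hashes, mismatches, diffs)
server_hash = bundle_hash(client_texts, client_binary_hashes, mismatches, diffs)
assert client_hash == server_hash
print(client_hash)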

-Instead of providing hashes, sign the archive with GPG (sketch after this list):
Advantages:
-There are no hashes to DMCA; they don't exist. Verification isn't tied to the site itself, it's done against a third-party keyserver.
-If the site goes down, the archives can still be verified.
Disadvantages:
-The server must provide the full archive, signed. If the diff method above is used, it would have to return the full archive to the client instead of just the diffs/mismatch file.
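A sketch of the GPG variant using the stock gpg CLI, assuming a signing key already exists under a made-up "archive-server" identity and that archive.tar is the bundle being returned:

import subprocess

archive = "archive.tar"

# Server signs the archive it hands back; the detached signature travels
# alongside it.
subprocess.run(
    ["gpg", "--local-user", "archive-server",
     "--armor", "--detach-sign", "--output", archive + ".asc", archive],
    check=True,
)

# Anyone can verify later against the server's public key, even if the
# server itself is long gone, as long as the key is still published.
subprocess.run(["gpg", "--verify", archive + ".asc", archive], check=True)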

How could copyright violation be said to occur during the window in which a server-side archiver would provide data? That would also be during the time when whatever's being archived is still publicly accessible (otherwise the server obviously couldn't see it). Has a proxy server, for instance, ever been prosecuted for something like this?

Also, I'm not seeing how a diff would be useful for the primary purpose of verifying that a shady random archive file is an authentic copy of something that a certain URL served at some point in time.

Tangential, but Mozilla is going to delete all non-Quantum add-ons soon; maybe we can fix that somehow.

I'm not sure. I'm assuming the lawyers could come up with practically anything, and think of all the sites with a ToS 30 pages long. The archive sites and Google have lawyers who can smash BS immediately; whoever hosts this system likely will not. That's not even taking fair use into account, or the lawyer-speak Google and the archive sites use: "direct agent of a human user".

The goal of this system should be that, even if it's forced to comply with a DMCA, archives can still be verified. Complying could mean blocking future access to whatever content, hopefully not removing the hash, and GPG would solve that problem anyway. The other goal is minimizing, as much as possible, what might be considered copyrighted in the first place: diffs and hash mismatches are less than a full archive of the site coming from the server.

I'm really not confident, though, that the extra back and forth between client and server to generate these diffs/binary hash mismatches would even be worth it. An archive the server sends out is clearly copyrighted content, but the diffs/binary hashes could definitely be considered a derivative work, which is the exact same problem if anyone gets pissy. The big benefit, I think, is illegal content: it seems easier to defend the system if a user uploaded the illegal content themselves rather than the site reaching out and grabbing it on the user's behalf. Then again, the server would be fetching it anyway to generate the diffs; it just wouldn't be handing its own copy back to the user, only the user's copy. Maybe that distinction is pointless, I don't know.

The diff by itself wouldn't be; the diff plus the user's original archive, combined into a single file and then hashed or signed with GPG, would be. The purpose of the diff + binary mismatch information is to avoid the site having to provide the archive at all; the users themselves submit it. The server would still generate its own copy, but the user would only get back diffs and mismatches, and with GPG signing, a copy of the archive they sent to the server.

The html/txt diffs and/or patch files, combined with the client's original html/txt, would reproduce the server's version. Binary files (images/videos) simply wouldn't be provided; it would just be indicated that those files could not be verified due to a hash mismatch.
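To make that concrete, a sketch of the client-side reconstruction, assuming a unified diff like the one in the earlier sketch and that GNU patch is installed (the filenames are made up):

import subprocess

# The client's own saved copy plus the server's diff...
subprocess.run(
    ["patch", "--output", "reconstructed_index.html",
     "client_index.html", "index.html.diff"],
    check=True,
)

# ...yields the server's version of the page. Mismatched binaries are only
# ever listed in the mismatch file; their bytes never travel either way.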

Yeah, I think this is beyond just another add-on. Either it works with no client side at all (the case where the server provides the full signed/hashed archive), or it requires a dedicated client, because it needs more permissions than WebExtensions allow; I don't think a WebExtension can modify files. You might be able to do it with native messaging, but at that point you still have to install a binary on the client's system, with the additional requirement of the WebExtension being certified kosher by Mozilla/Google. That seems stupid: if the client side needs an installed binary anyway, it shouldn't require a browser WebExtension front-end at all, it should just operate independently.
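For reference, this is roughly what that extra installed binary would look like. A minimal native-messaging host sketch (the length-prefixed JSON over stdin/stdout is the real protocol; everything else is made up), which mostly just underlines that you've already left WebExtension territory:

import json, struct, sys

def read_message():
    # Native messaging: 4-byte native-endian length, then UTF-8 JSON.
    raw_len = sys.stdin.buffer.read(4)
    if not raw_len:
        return None
    length = struct.unpack("=I", raw_len)[0]
    return json.loads(sys.stdin.buffer.read(length))

def send_message(msg):
    data = json.dumps(msg).encode()
    sys.stdout.buffer.write(struct.pack("=I", len(data)) + data)
    sys.stdout.buffer.flush()

while (msg := read_message()) is not None:
    # A real host would shell out to wget/tar/gpg here; this one just echoes.
    send_message({"ok": True, "echo": msg})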

Again, this should be legally identical to a proxy, and I'm not aware of proxies ever having been held liable for the actions of their users.
I don't think so. This isn't even like a magnet URI, which can be used to find a copy of the file; its only possible use is to verify a file's provenance.
Eh, it might be useful for some reason or other, like server bandwidth savings.
Ah, got it.