Compression

Time for a compression thread. What do you use, for what scenarios, and why? I'll start with some benchmarks:
du -h linux-4.4.6-gentoo.tar
619M linux-4.4.6-gentoo.tar

zip linux.zip linux-4.4.6-gentoo.tar  24.49s user 0.18s system 99% cpu 24.678 total
131M linux.zip

gzip -k linux.gz linux-4.4.6-gentoo.tar  24.56s user 0.18s system 99% cpu 24.733 total
131M linux-4.4.6-gentoo.tar.gz

bzip2 -k linux-4.4.6-gentoo.tar  56.86s user 0.21s system 99% cpu 57.077 total
102M linux-4.4.6-gentoo.tar.bz2

xz -k -T 0 linux-4.4.6-gentoo.tar  309.17s user 0.89s system 709% cpu 43.676 total
88M linux-4.4.6-gentoo.tar.xz

zpaq add linux-4.4.6-gentoo.tar.zpaq linux-4.4.6-gentoo.tar -threads 8 -method 2  229.20s user 2.42s system 556% cpu 41.606 total
108M linux-4.4.6-gentoo.tar.zpaq

zpaq add linux-4.4.6-gentoo.tar.zpaq linux-4.4.6-gentoo.tar -threads 8 -method 3  336.18s user 2.42s system 551% cpu 1:01.39 total
88M linux-4.4.6-gentoo.tar.zpaq

7z a linux-4.4.6-gentoo.tar.7z linux-4.4.6-gentoo.tar  285.35s user 1.85s system 498% cpu 57.619 total
88M linux-4.4.6-gentoo.tar.7z

lrzip linux-4.4.6-gentoo.tar  304.75s user 3.85s system 566% cpu 54.503 total
89M linux-4.4.6-gentoo.tar.lrz

lrzip -g linux-4.4.6-gentoo.tar  36.43s user 2.08s system 284% cpu 13.546 total
121M linux-4.4.6-gentoo.tar.lrz

lrzip -z linux-4.4.6-gentoo.tar  643.04s user 3.35s system 640% cpu 1:40.98 total
70M linux-4.4.6-gentoo.tar.lrz

cmix/cmix -c linux-4.4.6-gentoo.tar linux-4.4.6-gentoo.tar.cmix  77.65s user 6.39s system 99% cpu 1:24.06 total
I tried cmix, but Linux OOM-killed it. I have 16GB of RAM.
Made on tmpfs, with an 8350 and Gentoo (so everything is compiled with -O2 -march=native).
mattmahoney.net/dc/text.html (the compression times listed there should be ignored, since they weren't measured on a single fixed machine)

My go-tos are usually lrzip -z for archiving/backups and xz for everything else (even Windows babbies can decompress it, since 7-Zip handles it).

zip or gzip for compatibility, xz for everything else. If I ever need really fast compression (to save bandwidth or something) I'll use lzop.

Well, I'd suggest lrzip -l or -g for the sanic cases.

Doesn't that just use lzop?

By the way, is anyone else surprised that, for once, free software doesn't seem to win at all?

Yes, but with a very fast multithreaded implementation and deduplication.

Debian's lrzip package depends on the same lzo library as the lzop package.

Does it do deduplication when you're looking for extreme speed?

en.wikipedia.org/wiki/Rzip
Lrzip is just a better rzip (LZMA rzip).
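
If it helps to picture what the rzip stage buys you, here's a toy Python sketch of the general long-range deduplication idea (fixed-size chunks, a table of chunks already seen, tiny references for repeats). The chunk size and function name are arbitrary, and this is not rzip's or lrzip's actual algorithm (they use a rolling hash over a much larger window); it's just the concept of catching repeats that sit too far apart for gzip/bzip2-sized windows before handing what's left to the back-end compressor.

# Toy illustration of long-range deduplication, the kind of preprocessing
# rzip/lrzip do before the real compressor runs. Not their actual algorithm;
# just the concept: repeated chunks anywhere in the file become short references.
import hashlib

CHUNK = 4096  # arbitrary fixed-size chunks; real tools match variable-length runs

def dedup(data: bytes):
    seen = {}   # chunk hash -> index of first occurrence
    out = []    # list of ("raw", chunk_bytes) or ("ref", chunk_index)
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha1(chunk).digest()
        if h in seen:
            out.append(("ref", seen[h]))   # repeat: emit a tiny reference
        else:
            seen[h] = i // CHUNK
            out.append(("raw", chunk))     # first occurrence: emit the literal chunk
    return out

data = (b"A" * 100_000 + b"B" * 100_000) * 4   # lots of far-apart repetition
tokens = dedup(data)
raw = sum(len(c) for kind, c in tokens if kind == "raw")
print(f"{len(data)} bytes in, {raw} literal bytes left for the back-end compressor")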

Do any of you use FLIF for your images? I'd like to give it a shot. How does it compare to maximally compressed farbfelds?

πfs of course, the only 100% density compression for any size.
github.com/philipl/pifs

I really love concepts like these. It's all mathematically sound too, though it's probably the slowest decompressor of all time.

Mathematically sound, but it doesn't actually compress your data.

Is it possible that there are better numbers than π for this?

Yes.

Guaranteed compression is impossible (I can post a proof if you want), and π isn't "optimized" so that digit sequences which are common in real-world data show up more often in its expansion. It also doesn't avoid repetition.

But pifs only stores single bytes, so unless you change that, it doesn't matter.

You get the same result as xz, just slower. Why use it then?

ZPAQ, of course.

But it's slow and not that good compared to LZMA2.

For networking I use LZ4. I used to use Snappy but that code is a shitpile. No one at Google knows what they're doing anymore except the Chrome team.

pigz

when i run out of space i just buy another hard drive. for everything else there's dtrx

It's just an extractor, m8. Do you mean you don't compress anything? Some stuff like ISOs can really benefit from good compression.

compression is for poorfags who can't afford more storage space. prove me wrong

Storing files in π is simple. Just find a sequence of digits that represents your file and record its location. Sure, the location data means it's not literally 100% density, but it's pretty close.

This is based on the assumption that pi is a normal number. That has yet to be proven.

This is revolutionary.

ZPAQ for everything, especially backups since it's journaled

-m 4 for general data since it's powerful and ignores already compressed data

-m 5 for best compression if I have lots of time

On average, the location data will be as large as what you store, or maybe a bit more. That's easy to prove.
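
To make that concrete, here's a quick Python sketch using mpmath for the digits (the 100k-digit cutoff and the example needles are arbitrary choices of mine): it finds a short digit string in π and compares the size of the data against the size of the offset needed to point at it.

# Hedged sketch of the point above: the offset where you find your data in pi's
# digits is itself about as many digits long as the data.
from mpmath import mp

mp.dps = 100_000                           # ~100k decimal digits of pi
digits = str(mp.pi).replace(".", "")       # "31415926535..."

for needle in ("42", "420", "4207"):       # arbitrary 2-, 3-, 4-digit "files"
    pos = digits.find(needle)
    if pos == -1:
        print(f"{needle}: not in the first {mp.dps} digits")
    else:
        print(f"{needle}: {len(needle)} digits of data, first found at offset {pos}, "
              f"which takes {len(str(pos))} digits to write down")
# A random n-digit string typically first appears around offset 10**n,
# so the "address" costs roughly as much as the data it points to.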

A compression method can be seen as a mathematical function. You put a number in (representing a string of bits) and you get another number out (representing a different string of bits) that can be decompressed to get the original number.

Imagine we have a guaranteed compression algorithm. It doesn't matter what you put in, you always get something smaller back. So if you compress a file of 1000 bits or less, you get a file of 999 bits or less.

There are 2^1001 - 1 files of 1000 bits or less. There are 2^1000 - 1 files of 999 bits or less. So if you compress all files of 1000 bits or less, you must get some duplicates. You can't properly decompress those duplicates, because there are multiple files that they could have been the result of. This means that guaranteed compression is impossible.
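
Spelled out in a few lines of Python, with arbitrary bit lengths:

# The counting argument above: there are always more strings of length <= n
# than strings of length <= n-1, so a compressor that shrinks *every* input
# must map two inputs to the same output, which can't be decompressed
# unambiguously.
def strings_up_to(bits: int) -> int:
    # 2^0 + 2^1 + ... + 2^bits = 2^(bits+1) - 1 (including the empty string)
    return 2 ** (bits + 1) - 1

for n in (8, 100, 1000):
    inputs = strings_up_to(n)       # every possible input of n bits or less
    outputs = strings_up_to(n - 1)  # every possible "compressed" output
    print(f"n = {n}: {inputs - outputs} more inputs than possible outputs")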

Real compression algorithms manage to compress almost everything you put in because most real files have redundancy. For example, most characters in a plain text file are letters or punctuation, and the same words are repeated multiple times. Compression algorithms know how to spot patterns and exploit them. But π doesn't know. π doesn't have any repetition that's useful for this. Random data will do just as badly in πfs as real data.

This is how gzip handles random data:
$ dd if=/dev/urandom bs=128 count=128 of=rand-data
128+0 records in
128+0 records out
16384 bytes (16 kB, 16 KiB) copied, 0.00219353 s, 7.5 MB/s
$ wc -c rand-data
16384 rand-data
$ gzip rand-data
$ wc -c rand-data.gz
16417 rand-data.gz

This just inspired me to make something like this based on AES-256 in CTR mode. Since there's lots of hardware acceleration for that out there, it's probably the fastest way to calculate cryptographically pseudorandom numbers, which is all we really need to make use of the Infinite Monkey Theorem.

My idea is to speed things up by using as much processing power as the system has, so I'm going to use OpenCL, CUDA, and Beignet to run on the GPU's SMs as well as normal CPU threads.

Each thread will start with a seed value unique among the threads and then compare the pseudorandom bytes against the uncompressed file. Once a match is found, the seed/counter index itself gets compressed until an absolute minimum is reached; the depth of that seed/counter compression is accounted for as well.
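
For what it's worth, here's a single-threaded Python sketch of that core loop using the pyca/cryptography package's AES-256-CTR (the chunk size, search limit, and function name are my own placeholders, and the GPU/multi-seed part is left out). It also shows where this hits the same wall as π: the expected offset for an n-byte match is around 256^n, so the stored index ends up about as big as the data.

# Hedged sketch: scan an AES-256-CTR keystream for the file's bytes and return
# the offset where they first appear. Single-threaded; names and limits are
# illustrative, not the actual design described above.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def find_in_keystream(data: bytes, key: bytes, nonce: bytes, limit: int = 1 << 24):
    """Return the byte offset of `data` in the keystream, or None within `limit` bytes."""
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    stream = b""
    offset = 0                      # absolute position of stream[0]
    while offset + len(stream) < limit:
        # Encrypting zeros yields the raw keystream.
        stream += encryptor.update(b"\x00" * 65536)
        pos = stream.find(data)
        if pos != -1:
            return offset + pos
        # Keep a small tail so matches spanning chunk boundaries aren't missed.
        keep = len(data) - 1
        offset += len(stream) - keep
        stream = stream[-keep:] if keep else b""
    return None

key, nonce = os.urandom(32), os.urandom(16)     # seed for this "thread"
needle = b"\xde\xad"                            # a 2-byte match takes ~2^16 bytes on average
print(find_in_keystream(needle, key, nonce))
# Same catch as with pi: for an n-byte needle the expected offset is around
# 256**n, so writing the offset down takes roughly n bytes anyway.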

poorfag

some people don't live in cities or anywhere near a backbone.

also nice devil trips.

Baitu desu ne