Compression

Time for a compression thread. What do you use, for what scenarios, and why? I'll start with some benchmarks:
du -h linux-4.4.6-gentoo.tar
619M linux-4.4.6-gentoo.tar

zip linux.zip linux-4.4.6-gentoo.tar  24.49s user 0.18s system 99% cpu 24.678 total
131M linux.zip

gzip -k linux.gz linux-4.4.6-gentoo.tar  24.56s user 0.18s system 99% cpu 24.733 total
131M linux-4.4.6-gentoo.tar.gz

bzip2 -k linux-4.4.6-gentoo.tar  56.86s user 0.21s system 99% cpu 57.077 total
102M linux-4.4.6-gentoo.tar.bz2

xz -k -T 0 linux-4.4.6-gentoo.tar  309.17s user 0.89s system 709% cpu 43.676 total
88M linux-4.4.6-gentoo.tar.xz

zpaq add linux-4.4.6-gentoo.tar.zpaq linux-4.4.6-gentoo.tar -threads 8 -method 2  229.20s user 2.42s system 556% cpu 41.606 total
108M linux-4.4.6-gentoo.tar.zpaq

zpaq add linux-4.4.6-gentoo.tar.zpaq linux-4.4.6-gentoo.tar -threads 8 -method 3  336.18s user 2.42s system 551% cpu 1:01.39 total
88M linux-4.4.6-gentoo.tar.zpaq

7z a linux-4.4.6-gentoo.tar.7z linux-4.4.6-gentoo.tar  285.35s user 1.85s system 498% cpu 57.619 total
88M linux-4.4.6-gentoo.tar.7z

lrzip linux-4.4.6-gentoo.tar  304.75s user 3.85s system 566% cpu 54.503 total
89M linux-4.4.6-gentoo.tar.lrz

lrzip -g linux-4.4.6-gentoo.tar  36.43s user 2.08s system 284% cpu 13.546 total
121M linux-4.4.6-gentoo.tar.lrz

lrzip -z linux-4.4.6-gentoo.tar  643.04s user 3.35s system 640% cpu 1:40.98 total
70M linux-4.4.6-gentoo.tar.lrz

cmix/cmix -c linux-4.4.6-gentoo.tar linux-4.4.6-gentoo.tar.cmix  77.65s user 6.39s system 99% cpu 1:24.06 total
I tried cmix, but Linux OOM-killed it. I have 16GB of RAM.
Made on tmpfs, with an 8350 and Gentoo (so everything is compiled with -O2 -march=native).
mattmahoney.net/dc/text.html (the compression times listed there should be ignored, since they weren't measured on a single fixed machine)

My go-tos are usually lrzip -z for archiving/backups and xz for everything else (even Windows babbies can decompress it, since 7-Zip handles it).

zip or gzip for compatibility, xz for everything else. If I ever need really fast compression (to save bandwidth or something) I'll use lzop.

Well, I'd suggest lrzip -l or -g for the sanic cases.

Doesn't that just use lzop?

By the way, is anyone else surprised that, for once, free software doesn't seem to win at all?

Yes, but with a very fast multithreaded implementation and deduplication.

Debian's lrzip package depends on the same lzo library as the lzop package.

Does it do deduplication when you're looking for extreme speed?

en.wikipedia.org/wiki/Rzip
Lrzip is just a better rzip (LZMA rzip).
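
If it helps to picture what the rzip stage buys you, here's a toy Python sketch of the general long-range deduplication idea (fixed-size chunks, a table of chunks already seen, tiny references for repeats). The chunk size and function name are arbitrary, and this is not rzip's or lrzip's actual algorithm (they use a rolling hash over a much larger window); it's just the concept of catching repeats that sit too far apart for gzip/bzip2-sized windows before handing what's left to the back-end compressor.

# Toy illustration of long-range deduplication, the kind of preprocessing
# rzip/lrzip do before the real compressor runs. Not their actual algorithm;
# just the concept: repeated chunks anywhere in the file become short references.
import hashlib

CHUNK = 4096  # arbitrary fixed-size chunks; real tools match variable-length runs

def dedup(data: bytes):
    seen = {}   # chunk hash -> index of first occurrence
    out = []    # list of ("raw", chunk_bytes) or ("ref", chunk_index)
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha1(chunk).digest()
        if h in seen:
            out.append(("ref", seen[h]))   # repeat: emit a tiny reference
        else:
            seen[h] = i // CHUNK
            out.append(("raw", chunk))     # first occurrence: emit the literal chunk
    return out

data = (b"A" * 100_000 + b"B" * 100_000) * 4   # lots of far-apart repetition
tokens = dedup(data)
raw = sum(len(c) for kind, c in tokens if kind == "raw")
print(f"{len(data)} bytes in, {raw} literal bytes left for the back-end compressor")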

Do any of you use FLIF for your images? I'd like to give it a shot. How does it compare to maximally compressed farbfelds?

πfs of course, the only 100% density compression for any size.
github.com/philipl/pifs

I really love concepts like these. It's all mathematically sound too, though it's probably the slowest decompressor of all time.

Mathematically sound, but it doesn't actually compress your data.

Is it possible that there are better numbers than π for this?

Yes.

Guaranteed compression is impossible (I can post a proof if you want), and π isn't "optimized" so that digit sequences which are common in real-world data show up more often in its expansion. It also doesn't avoid repetition.

But pifs only stores single bytes, so unless you change that, it doesn't matter.

You get the same result as xz, just slower. Why use it then?

ZPAQ, of course.

But it's slow and not that good compared to LZMA2.

For networking I use LZ4. I used to use Snappy but that code is a shitpile. No one at Google knows what they're doing anymore except the Chrome team.

pigz

when i run out of space i just buy another hard drive. for everything else there's dtrx

It's just an extractor, m8. Do you mean you don't compress anything? Some stuff like ISOs can really benefit from good compression.

compression is for poorfags who can't afford more storage space. prove me wrong

Storing files in π is simple. Just find a sequence of digits that represents your file and record its location. Sure, the location data means it's not literally 100% density, but it's pretty close.

This is based on the assumption that pi is a normal number. That has yet to be proven.

This is revolutionary.

ZPAQ for everything, especially backups since it's journaled

-m 4 for general data since it's powerful and ignores already compressed data

-m 5 for best compression if I have lots of time

On average, the location data will be as large as what you store, or maybe a bit more. That's easy to prove.
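
To make that concrete, here's a quick Python sketch using mpmath for the digits (the 100k-digit cutoff and the example needles are arbitrary choices of mine): it finds a short digit string in π and compares the size of the data against the size of the offset needed to point at it.

# Hedged sketch of the point above: the offset where you find your data in pi's
# digits is itself about as many digits long as the data.
from mpmath import mp

mp.dps = 100_000                           # ~100k decimal digits of pi
digits = str(mp.pi).replace(".", "")       # "31415926535..."

for needle in ("42", "420", "4207"):       # arbitrary 2-, 3-, 4-digit "files"
    pos = digits.find(needle)
    if pos == -1:
        print(f"{needle}: not in the first {mp.dps} digits")
    else:
        print(f"{needle}: {len(needle)} digits of data, first found at offset {pos}, "
              f"which takes {len(str(pos))} digits to write down")
# A random n-digit string typically first appears around offset 10**n,
# so the "address" costs roughly as much as the data it points to.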

A compression method can be seen as a mathematical function. You put a number in (representing a string of bits) and you get another number out (representing a different string of bits) that can be decompressed to get the original number.

Imagine we have a guaranteed compression algorithm. It doesn't matter what you put in, you always get something smaller back. So if you compress a file of 1000 bits or less, you get a file of 999 bits or less.

There are 2^1001 - 1 files of 1000 bits or less. There are 2^1000 - 1 files of 999 bits or less. So if you compress all files of 1000 bits or less, you must get some duplicates. You can't properly decompress those duplicates, because there are multiple files that they could have been the result of. This means that guaranteed compression is impossible.
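
Spelled out in a few lines of Python, with arbitrary bit lengths:

# The counting argument above: there are always more strings of length <= n
# than strings of length <= n-1, so a compressor that shrinks *every* input
# must map two inputs to the same output, which can't be decompressed
# unambiguously.
def strings_up_to(bits: int) -> int:
    # 2^0 + 2^1 + ... + 2^bits = 2^(bits+1) - 1 (including the empty string)
    return 2 ** (bits + 1) - 1

for n in (8, 100, 1000):
    inputs = strings_up_to(n)       # every possible input of n bits or less
    outputs = strings_up_to(n - 1)  # every possible "compressed" output
    print(f"n = {n}: {inputs - outputs} more inputs than possible outputs")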

Real compression algorithms manage to compress almost everything you put in because most real files have redundancy. For example, most characters in a plain text file are letters or punctuation, and the same words are repeated multiple times. Compression algorithms know how to spot patterns and exploit them. But π doesn't know. π doesn't have any repetition that's useful for this. Random data will do just as badly in πfs as real data.

This is how gzip handles random data:
$ dd if=/dev/urandom bs=128 count=128 of=rand-data
128+0 records in
128+0 records out
16384 bytes (16 kB, 16 KiB) copied, 0.00219353 s, 7.5 MB/s
$ wc -c rand-data
16384 rand-data
$ gzip rand-data
$ wc -c rand-data.gz
16417 rand-data.gz

This just inspired me to make something like this based on AES-256 in CTR mode. Since there's lots of hardware acceleration for that out there, it's probably the fastest way to calculate cryptographically pseudorandom numbers, which is all we really need to make use of the Infinite Monkey Theorem.

My idea is to speed things up by using as much processing power as the system has, so I'm going to use OpenCL, CUDA, and Beignet to run on the GPU's SMs as well as normal CPU threads.

Each thread will start with a seed value unique among the threads and then compare the pseudorandom bytes against the uncompressed file. Once a match is found, the seed/counter index itself gets compressed until an absolute minimum is reached; the depth of that seed/counter compression is accounted for as well.
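
For what it's worth, here's a single-threaded Python sketch of that core loop using the pyca/cryptography package's AES-256-CTR (the chunk size, search limit, and function name are my own placeholders, and the GPU/multi-seed part is left out). It also shows where this hits the same wall as π: the expected offset for an n-byte match is around 256^n, so the stored index ends up about as big as the data.

# Hedged sketch: scan an AES-256-CTR keystream for the file's bytes and return
# the offset where they first appear. Single-threaded; names and limits are
# illustrative, not the actual design described above.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def find_in_keystream(data: bytes, key: bytes, nonce: bytes, limit: int = 1 << 24):
    """Return the byte offset of `data` in the keystream, or None within `limit` bytes."""
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    stream = b""
    offset = 0                      # absolute position of stream[0]
    while offset + len(stream) < limit:
        # Encrypting zeros yields the raw keystream.
        stream += encryptor.update(b"\x00" * 65536)
        pos = stream.find(data)
        if pos != -1:
            return offset + pos
        # Keep a small tail so matches spanning chunk boundaries aren't missed.
        keep = len(data) - 1
        offset += len(stream) - keep
        stream = stream[-keep:] if keep else b""
    return None

key, nonce = os.urandom(32), os.urandom(16)     # seed for this "thread"
needle = b"\xde\xad"                            # a 2-byte match takes ~2^16 bytes on average
print(find_in_keystream(needle, key, nonce))
# Same catch as with pi: for an n-byte needle the expected offset is around
# 256**n, so writing the offset down takes roughly n bytes anyway.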

poorfag

some people don't live in cities or anywhere near a backbone.

also nice devil trips.

Baitu desu ne