Data Compression

What are the best ways to compress/decompress data on consumer hardware? For example, if you had to compress a large amount of video (say, 10TB of pirated movies), what tools would you use, and how well and how quickly would they perform?

When is lossy compression preferred, if ever?

Other URLs found in this thread:

youtube.com/watch?v=icruGcSsPp0
git.savannah.gnu.org/cgit/coreutils.git/tree/src/true.c
cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/true/true.c?rev=1.1&content-type=text/x-cvsweb-markup
youtube.com/watch?v=lX_pF03vCSU
man.openbsd.org/tar
netbsd.gw.com/cgi-bin/man-cgi?tar NetBSD-current
leaf.dragonflybsd.org/cgi/web-man?command=tar&section=ANY
freebsd.org/cgi/man.cgi?query=tar&apropos=0&sektion=0&manpath=FreeBSD 11.0-RELEASE and Ports&arch=default&format=html
pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html
twitter.com/AnonBabble

Lossy is fine for smaller clips, YouTube/Vine stuff. Watching movies that have been re-encoded into blurry screen salsa is suffering. Just use any common compression software.

why do people still use tar for compression when tar sucks

If you're just looking to compress a bunch of videos into a zip/rar archive, you won't see much benefit in file size no matter what compression method you use. At least in my experience that is true.

If you want to save space.

tar doesn't do compression

subtle

tar + gzip or bzip2, whatever. I consider it all to be tar. Regardless, they still suck.

I recommend tar+xz on Linux and other unixoids, and 7-Zip on Windows. Both use the same algorithm (LZMA). I think lrzip compresses slightly better but much more slowly, so it might be preferable for long-term archiving.

But they're no good for compressing movies. If you need to make those smaller you have to re-encode them, at the cost of quality and a lot of time.
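If you go the tar+xz route, it's roughly this (filenames and paths are placeholders; -T0 multithreading needs xz 5.2 or newer):

# create a compressed archive; -J pipes through xz, XZ_OPT passes options to it
XZ_OPT='-6 -T0' tar -cJf backup.tar.xz /path/to/files

# unpack it again
tar -xJf backup.tar.xz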

Almost everyone here is retarded. I'll rate the retardation.


10/10, movies are already compressed (see CABAC).

4/10, lossy doesn't mean non-transparent.

10/10, you've already been told: tar stands for Tape Archiver, not compressor.

Still 10/10, at least you're consistent in your stupidity, man. bzip2 is already better than gzip/zip (DEFLATE) and only a little worse than xz/7z (LZMA) while being pretty fast.
Use xz or lrzip if you want maximum compression.
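For the maximum-compression case, something like this (filenames are placeholders; the lrzip flag is from memory, check its man page):

xz -9e big.tar        # highest preset plus extreme mode, produces big.tar.xz
lrzip -z big.tar      # -z selects the ZPAQ backend: best ratio, extremely slow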

Just implement your own; you'll learn a lot in the process.

Zpaq for long term, Bzip2 for short term

lrzip defaults to lzma yes, but also has zpaq if you absolutely have to win a compression pissing contest.

this is horseshit
because it's not better in all cases; I've seen a lot of counterexamples in real life (not specially crafted anti-compression inputs, but real files)

also, it's slow, not really much faster than LZMA, so there is zero point in using bzip2 over LZMA

Most software is shit.
You should use at least x264 in 10-bit mode, encode in CRF mode, and set the -preset and -tune parameters correctly.
Also, x265 is marginally better but orders of magnitude slower.

Also, this is only reasonable if your sources are maximum quality uber HD.
Otherwise the quality will go to shit, because of huge generation loss.
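If you do re-encode, a starting point with ffmpeg would look something like this (input/output names, the CRF value and the tune are all things to adjust for your material, and 10-bit output needs an x264 build with 10-bit support):

ffmpeg -i input.mkv -c:v libx264 -pix_fmt yuv420p10le -crf 18 -preset slow -tune film -c:a copy output.mkv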

Example, please.

It's actually not better in the transparent scenario (i.e. CRF 17-19 in x264), because of its aggressive intra smoothing, SAO and strong deblocking. x265 is made for Netflix and corporations like that, which don't target transparency at all.
t. doom9 user

I don't remember; maybe some DXT images or some other crap.
I'm not going to test it all again just because you asked.

yeah maybe this is true.
then go for x264 10-bit

that chart is worthless. They used compression settings that need 10GB+ of RAM, and the test data was a human genome dump.


fuck transparency
at lower bitrates, x265 destroys x264

1. Who are you quoting?
2. Read up on the UNIX philosophy and educate yourself.

the unix philosophy is to make things user-hostile, over-complicated, and without integrity

fuck unix, fuck stallman and torvalds
fuck linux users

The UNIX philosophy is not that great, and it's a good thing GNU and Linux don't follow it very closely.

In this particular case it means that extracting a single specific file from a compressed tar archive requires decompressing the entire archive up to that point, while extracting a single file from a zip archive is efficient. .zip's way has disadvantages as well, but it shows that good reasons to drop modularity exist.
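Concretely (archive and member names are made up): both commands below extract one file, but the zip one jumps straight to that member via the central directory, while the tar.xz one has to decompress the stream from the beginning until it reaches the file.

unzip archive.zip docs/readme.txt
tar -xJf archive.tar.xz docs/readme.txt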

GNU takes things to the opposite extreme. And it's too dangerous now to play that game. Code has to be kept small and distinct so it can be audited better.

Yes, it's a crap philosophy. And Linux having roughly 0% desktop market share while being free clearly shows that.

Shit, that's fucking stupid.
What's the point of separating tar and gz (archiving and compressing) when in 99.9% of cases people use both anyway? And if you don't want compression, you can just tell zip/rar/7zip not to compress. Also, rar/7zip let you encrypt your archive.

linux is garbage and windows destroys it yet again

But you don't always want both. It's stupid to gzip a tarball of already-compressed files (images, MP3s and the like).

That's when you would disable compression for that particular zip file, like he said.

Then just run tar without the z or j flags, and it won't invoke gzip or bzip2.
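E.g. (directory names are placeholders):

tar -cf photos.tar photos/       # plain archive, no compression at all
tar -czf logs.tar.gz logs/       # same archive, piped through gzip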

I didn't say that it was a tar flaw, I said that it wasn't a zip flaw.

It is a flaw, because you made your tool more complicated than it needs to be. This is the exact same thinking that brought about the shitty world of modern software, the web, and exploits galore.
You should watch some Terry Davis videos and repent.

It barely makes the format more complicated. It's not ideal, but it's an option that only removes other complexity, so I think it's acceptable. Merely removing the option for no compression wouldn't make the format any less complex, because it's just one of a larger set of compression methods supported by the format.

I am listed in the TempleOS credits.

Yea? I'm the one who told him to build it.

For archiving and intermediate steps while editing something, always use lossless. Re-encoding something over and over again with a lossy algorithm will fuck it up, as shown here: youtube.com/watch?v=icruGcSsPp0

Use lossy for your final published version, but always pay attention to how much quality you lose and try to find the best balance. Whenever I make WebMs I think long and hard about whether HD is necessary and encode them at least twice with different bitrates. For JPEGs I usually look for the quality level below which artifacts become clearly visible and pick something a little above that.
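For the WebM case, a two-pass encode at a fixed bitrate looks roughly like this (the 1M bitrate and the filenames are just examples to tweak per clip):

ffmpeg -i clip.mkv -c:v libvpx-vp9 -b:v 1M -pass 1 -an -f null /dev/null
ffmpeg -i clip.mkv -c:v libvpx-vp9 -b:v 1M -pass 2 -c:a libopus out.webm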


Holy shit! So much this!
The solution to all the botnet problems is minimalist, auditable, formally proven programs. Not FSF-compliant licenses slapped on a do-nothing program whose source code exceeds 80 lines (git.savannah.gnu.org/cgit/coreutils.git/tree/src/true.c).

OpenBSD version:
cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/true/true.c?rev=1.1&content-type=text/x-cvsweb-markup

lmao

This is hard stuff to get right.

"Look at my smart-hammer-screwdriver-jumping_rope, obviously better than your normal tools".
I guess you think Swiss knives are also better than regular knives.

Epic bait

Please fucking read it before you comment on it; your ignorance is showing.
You see, sometimes you don't want to compress. Sometimes you want to use a different compression program. Sometimes you just want to append new files to the archive without extracting it (e.g. you can simply concatenate bzip2 streams). Duplicating archival code across 9000 compression programs is pointless. The point of the UNIX philosophy is that every program does one thing and does it well. This lets people maintain their own program and make sure it does its job best, without worrying about compatibility with the rest of a monolith. It also buys you polymorphism: gzip doesn't care where the input comes from as long as it's data. You can compress a single file, a tar archive, /dev/sda or a TCP connection.
pic related
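That polymorphism in practice: the same gzip never knows or cares what's feeding it (device and host names below are obviously placeholders).

gzip -c report.txt > report.txt.gz                               # a single file
tar -cf - project/ | gzip > project.tar.gz                       # a tar stream
dd if=/dev/sda | gzip > disk.img.gz                              # a whole block device (as root)
tar -cf - project/ | gzip | ssh user@host 'cat > backup.tar.gz'  # straight over the network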

what about fuck you instead?
I'd prefer good quality, because I can afford the disk space.

nobody said you must use tar for this scenario.
Obviously each tool/format has its specific use case. (Would you drive nails with a microscope?)

First, and I can't stress this enough, don't listen to anyone talking about zip, gz, xz or any other general-purpose archive/compression format when the job is compressing video like in your example. Those are general-purpose formats, and text is where they perform best. Zipping video files is a complete and utter waste of time; it takes ages and gives you basically no return in size. Any codec worth its salt has already reduced the opportunities for general-purpose compression to basically zero.

If you've pirated movies, they're almost certainly already compressed, and video compression is almost always lossy. H.264, MPEG-4, VP8, whatever: if you haven't encoded it in HuffYUV or something similar, it is lossy. Lossy doesn't necessarily mean bad quality, but you won't gain anything by re-encoding from one lossy format to another; it will just look worse, since different codecs use different strategies for throwing information away.

That said, your video files will contain audio streams that you might be able to shrink. One thing you can do is reduce the number of audio channels, if the encoder hasn't already done so and the source had surround sound.

To do this, you would definitely use ffmpeg. As always, RTFM but have a hint: the -ac option sets the number of audio channels.
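Something along these lines (the AAC codec and bitrate are just one reasonable choice; the important part is -c:v copy so the video stream is left untouched):

ffmpeg -i movie.mkv -c:v copy -c:a aac -b:a 128k -ac 2 movie_stereo.mkv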

An example of when it returns EXIT_FAILURE is when you ask it to write version information to a bad file descriptor. Try "/bin/true --version >&-".

I think that's reasonable.

but my true implementation in asm isn't as bloated as GNU's. Checkmate.

No, and System V Release 4 is not as bloated as Linux and the BSDs. Do you use it?

No, you're the one who's wrong. xz's compression ratio at standard settings beats bzip2 by about the same margin that bzip2 beats gzip -9. That's hardly "a little bit". It also decompresses much faster than bzip2; it's only compression that is slow.
Also, at very low presets (-1 or -2), xz STILL (slightly) beats bzip2's ratio while using less memory and compressing significantly faster.
Use xz always. Change the preset depending on whether you need speed or maximum compression.
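E.g. (filenames are placeholders; -T0 multithreading needs xz 5.2+ and can cost a little ratio):

xz -1 -T0 huge.tar     # fast preset: still competitive with bzip2, much quicker
xz -9e huge.tar        # archival preset: best ratio, slow to compress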

Kek, the most GNU/freetard answer.


You, sir, are a faggot.

Keeping tar and the compressors separate has let us reuse the mature, time-tested tar archiver for the last three decades, while replacing the compressors as technology improves with minimal effort, disruption and compatibility problems. How many archive formats have you gatesbots gone through in that time? Do you still sometimes feel forced to use the ancient and utterly obsolete .zip format for compatibility reasons?


Nobody is stopping you from using .xz.tar instead of .tar.xz. People usually opt for the latter because of the better compression ratio.

b-but muh disk space! As if a couple of TB of storage were expensive these days.

For the people freaking out about tar...

Tar was invented for a very specific purpose: to create a 'fake filesystem' suitable for tape backup machines.

When you compress, you can get a better ratio by compressing one big file rather than many files separately. This is why people 'tar' stuff first: they turn lots of files into a single file and then compress that.

.rar, for example, has a separate setting to simulate this (WinRAR calls it a "solid archive").

Example: files A and B are identical, and our compressor shrinks each of them to 50% of its size.

If we compress A and B separately, we reduce the total size by 50%.

If we tar them first (or use a .rar solid archive), the compressor notices that both files are completely identical and effectively stores only one compressed copy, so the total comes out around 25% of the original (instead of 50%).

This is useful, for example, if you are compressing a huge number of HTML files (since they'll probably have similar headers).
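You can watch the effect yourself with two identical files. I'm using incompressible random data so the only saving left is the duplication itself; exact sizes will differ a little because of header overhead, and the 1M size suffix is GNU head syntax.

head -c 1M /dev/urandom > A      # 1 MB of random data, incompressible on its own
cp A B                           # B is an exact duplicate of A
xz -k -9 A B                     # compressed separately: ~2 MB total (A.xz + B.xz)
tar -cf AB.tar A B
xz -9 AB.tar                     # 'solid': B falls inside xz's dictionary, ~1 MB total
ls -l A.xz B.xz AB.tar.xz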

Your dumb-hammer-screwdriverless-non-rope isn't worth shit without a wall, a nail and someone to swing the hammer.


I want to safely store my large, compressible files for the long term. I need one program for archiving, one for compressing, one for encrypting, one for creating a recovery record and one for splitting the archive. Oh, and probably one more to add some notes or metadata.
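Spelled out, that's something like this; every tool and filename here is just one arbitrary pick (gpg for encryption, split for chunking, par2 as yet another tool for the recovery record):

tar -cf - important/ | xz -9 | gpg -c -o backup.txz.gpg   # archive + compress + encrypt
split -b 1G backup.txz.gpg backup.part.                   # split into 1 GB pieces
par2 create backup.par2 backup.part.*                     # recovery record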




From the user perspective:
A design like this should not exist at all.


Thank you very much for explaining the purpose of tar and of solid archives; I didn't know, I had never seen a use for it. But it is reassuring that it has my back if I ever want to store multiple copies of the same files for some reason.
I just want to add one note: tar was designed at a time when people did version control by copy-pasting project folders and appending a number to the folder name. Today it should be obsolete. DAR (Disk ARchive) allows incremental backups, and WinRAR 5 has options to "Store symbolic links as links" and "Store hard links as links". Those are the features that should be used for de-duplication, not solid archives.

Backward compatibility is fine, but it should not be the reason to keep using failed designs. It is very disappointing that FreeArc is a one-man project because everyone else is crying about 'muh tar', and it seems no one outside forensics circles has ever heard of DAR.
And that is why I still use WinRAR 5. youtube.com/watch?v=lX_pF03vCSU

no user it's hilarious
pure perfection

What's nice is that we have all of them at our fingertips.
That's where sh comes in handy.
GNU tar can do this.

Actually, no. Every tar can do this. See
man.openbsd.org/tar
netbsd.gw.com/cgi-bin/man-cgi?tar NetBSD-current
leaf.dragonflybsd.org/cgi/web-man?command=tar&section=ANY
freebsd.org/cgi/man.cgi?query=tar&apropos=0&sektion=0&manpath=FreeBSD 11.0-RELEASE and Ports&arch=default&format=html

Also, there's a POSIX successor to tar called pax that can do this too. I don't know why it never really caught on:
pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html

Because by that time GNU tar was a near-standard on all unices, and its extended format already supports the pax feature set (and then some). Also, GNU tar supports all common tar variants, including pax.

pax is standard on *BSD, and I even found it on various Redhat Linux systems.

I meant accepted by everyday users. I myself had barely heard of it, while tar and, to some extent, cpio are well known.
I may switch to it one day; the syntax does seem nicer.

From the user perspective:
>_______
I'll let you fill that line in.

You two just got baited (sorry). You are perfect examples of what I was talking about. After being presented with a much better alternative, you still go out of your way to defend the old file format, or a software feature that helps keep that format around.

freenode is garbage

fuck off, POSIX is the best

And that is a good thing, because it prevents feature creep and allows developers to focus on one thing at a time.
How did you pull that one out of your ass? You can extract specific files with tar, even with compression.

get lost

This guy said it:
This guy explained it:
And my experience dealing with large archives confirms it.

Maybe I can. But you can't, because you just used a different program to split the archive up. :)