Optimal usage of storage space

How do you use your external hard drives optimally, squeezing out every last bit of space they have?

Right now, I have a single 500GB hard drive, split into two partitions. I used to use it for the Wii, but never filled more than about 20GB. To keep it usable with the Wii, I formatted it as FAT32. When I wanted to store a 4GB+ file on it, I split it into two equally sized partitions: one FAT32 and one ext4.
Now, a few years later, with all my warez and anime, I'm starting to worry I'll run out of space. Today I started moving some more anime over and the drive is filling up fast.

What are the best practices for squeezing out as much space as I can before I have to buy a new drive? Should I compress each and every file individually, or as tar archives? (I know I have some bloated files with a shitton of zeroes, like 3DS ROMs. I bet I can save a few GB per ROM; rough example at the end of this post.) What compression format and tools should I use?

Also, is this partitioning setup really a good idea? I haven't used my Wii in a while, so I guess I can nuke the FAT32 partition and make the whole disk a single partition. I also know I don't even have to partition the disk at all, since I can put a single filesystem on the whole device and save a MB or so at the start of the disk. What filesystem is good for optimal long-term storage?
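For the zero-padded ROMs, this is roughly what I was picturing (filename is made up), unless there's a better tool:

# see how much a zero-padded ROM actually shrinks (filename is just an example)
du -h some_game.3ds
xz -k -9 -T0 some_game.3ds    # keeps the original, writes some_game.3ds.xz
du -h some_game.3ds.xz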

They sell 8gb drives as standard these days.

Most everything you download from warez sites is already as compressed as it's ever going to be; some of it (yiffy) more than it should be.

NTFS is the best filesystem, if you aren't a Linux cuck.

I'm sure you mean terabytes. And no, I'm not looking to buy a ton of drives; I'm looking to squeeze as much as I can out of my current one.

I'm looking for a filesystem that can squeeze in as much data as possible without needing to be fast. That isn't NTFS or EXT4, that's for sure.

Opinion ultimately discarded.

Btrfs supports transparent compression.
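Roughly like this, if you go that route (device and mount point are examples; zlib/lzo are the long-available options):

# mount a btrfs volume with transparent compression (device/mountpoint are examples)
mount -o compress=zlib /dev/sdX1 /mnt/storage
# recompress the data that's already on the volume
btrfs filesystem defragment -r -czlib /mnt/storage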

LVM

Probably why they're trying to write a replacement, right?

ZFS or BTRFS with deduplication can help. In the end, though, using a good compressor like lrzip on a "normal" FS like ext4 or XFS will give better results. But with ZFS/BTRFS you get bitrot/corruption protection on top.
tl;dr compress the shit that can be compressed, and bundle up directories full of < 4K files before compressing them
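e.g. something roughly like this for the piles of tiny files (paths are made up):

# bundle a directory full of tiny files into one tar, then compress the bundle
tar -cf small_stuff.tar ./directory_full_of_small_files/
lrzip small_stuff.tar    # writes small_stuff.tar.lrz, keeps the original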

The vaporware one they've been failing at since Vista?

Thanks for the informed opinion.
What are the advantages of BTRFS over XFS here? Transparent compression sounds interesting, since it would let me read my files in place instead of having to copy them over to my computer and decompress them there.
Also, thanks for the lrzip recommendation. It looks like it compresses much faster than xz (which I tried first, but it took too damn long), with comparable ratios.
That's the only thing I liked about gzip. I guess I'll have to do find magic then.
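Probably something along these lines (path and size cutoff are just guesses):

# compress every file bigger than ~100MB, one at a time (path/threshold are examples)
find /mnt/storage -type f -size +100M -exec lrzip {} \;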

I guess I'll start compressing everything with lrzip, and maybe later try to move over to ZFS/BTRFS.

...

If I were trying to get the maximum amount of storage out of my drives, I would run the videos through XMedia Recode to make them smaller first, and THEN compress the files with 7-Zip at maximum settings.
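Something like this for the 7-Zip part (archive and directory names are placeholders):

# 7z at maximum settings with a solid archive (names are placeholders)
7z a -t7z -m0=lzma2 -mx=9 -ms=on archive.7z ./stuff/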

NTFS is a disaster; even the Microsoft cucks admit that. They keep trying and failing to replace it (WinFS, for example), but it's very hard to change anything on Windows because of their ecosystem.

btrfs is full of hopes and dreams, you don't get that with xfs. xfs just works, how boring.

So... in terms of features they're pretty much equal, or what? How are their tools? Any personal experiences you can tell me about?

Most of the features in btrfs are pointless reimplementations of LVM functionality. I've tried to use it before as a way of shipping our embedded firmware (basically a small custom Debian) with filesystem compression, so I could use it in compressed form if necessary (currently I just xz it). But it compresses files instead of blocks, so the image winds up being double the size. Disgusting. I'd recommend sticking with ext4 + lvm2.
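If anyone wants the boring setup, it's roughly this (device name is an example):

# plain ext4 on top of LVM2 (device name is an example)
pvcreate /dev/sdX1
vgcreate storage /dev/sdX1
lvcreate -n media -l 100%FREE storage
mkfs.ext4 /dev/storage/media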

I believe certain files may increase in size as a result of compression, like binaries. I'm not sure exactly which types of files behave that way, so just be wary. And don't double-compress.


This is a pretty good idea. If you wanted to really squeeze every last bit out of the drive, you could make a batch or bash script to encode your Chinese cartoons to be as small as possible before perceivable quality degradation sets in.
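A rough sketch of that script (paths, codec and CRF are just an example; re-encoding is lossy, so test it on one episode first):

# re-encode every mkv in a directory to HEVC (paths/CRF are examples; this is lossy)
for f in /mnt/storage/anime/*.mkv; do
  ffmpeg -i "$f" -c:v libx265 -crf 24 -preset slow -c:a copy "${f%.mkv}.x265.mkv"
done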


Don't you just love it when their anti-competitive business model comes back to bite them in the ass in the long run?

lrzip -r < lrztar

Same reason nobody uses gzip -r instead of tar | gzip; because gzip doesn't have solid compression.
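i.e. the difference between these two (directory name is an example):

# solid: one compression stream over the whole tree
tar -cf - ./some_dir | gzip -9 > some_dir.tar.gz
# not solid: every file compressed separately, no matching across files
gzip -r -9 ./some_dir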

Store everything in π.

github.com/philipl/pifs

Thanks for the link. It gave me a good idea on how to obtain a proprietary encryption key without having to hard-code it in one of my programs or make the user provide it.

But, I wanted to add, it's useless for any practical purposes.

You might as well flip all its bits or something. No need to do something that tedious.

What fucking year is it?

gee, with your choices of FAT32 and NTFS you're pretty fucked, MScockluver

mkfs.ext4 -T largefile4 -m 0 /dev/wherever

Low on inodes, zilch on root reserve. Saves many, many gigabytes.
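If you don't want to reformat, the root reserve at least can be dropped on an existing ext4 in place; something like (device is an example):

# free the ~5% root reserve on an existing ext4 partition (device is an example)
tune2fs -m 0 /dev/sdX2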

...

The year Samsung never released the 1TB 950 EVO.

It's a very old hard drive. I've never needed much storage space; I just recently started storing all my animu after one I wanted to rewatch was nowhere to be found.

Use in-place conversion to turn the ext4 partition into a btrfs partition, and use zpaq -m 5 on everything you need to compress to save space.
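Roughly like this (unmount and back up first; device and file names are examples):

# in-place ext4 -> btrfs conversion (run on an unmounted device; device is an example)
btrfs-convert /dev/sdX2
# zpaq at its heaviest setting (archive/file names are placeholders)
zpaq add archive.zpaq some_big_file.mkv -method 5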

zpaq < lrzip
du -h linux-4.4.6-gentoo.tar
619M  linux-4.4.6-gentoo.tar
zip linux.zip linux-4.4.6-gentoo.tar  24.49s user 0.18s system 99% cpu 24.678 total
131M  linux.zip
gzip -k linux.gz linux-4.4.6-gentoo.tar  24.56s user 0.18s system 99% cpu 24.733 total
131M  linux-4.4.6-gentoo.tar.gz
lrzip -g linux-4.4.6-gentoo.tar  36.43s user 2.08s system 284% cpu 13.546 total
121M  linux-4.4.6-gentoo.tar.lrz
zpaq add linux-4.4.6-gentoo.tar.zpaq linux-4.4.6-gentoo.tar -threads 8 -method 2  229.20s user 2.42s system 556% cpu 41.606 total
108M  linux-4.4.6-gentoo.tar.zpaq
bzip2 -k linux-4.4.6-gentoo.tar  56.86s user 0.21s system 99% cpu 57.077 total
102M  linux-4.4.6-gentoo.tar.bz2
lrzip linux-4.4.6-gentoo.tar  304.75s user 3.85s system 566% cpu 54.503 total
89M  linux-4.4.6-gentoo.tar.lrz
xz -k -T 0 linux-4.4.6-gentoo.tar  309.17s user 0.89s system 709% cpu 43.676 total
88M  linux-4.4.6-gentoo.tar.xz
zpaq add linux-4.4.6-gentoo.tar.zpaq linux-4.4.6-gentoo.tar -threads 8 -method 3  336.18s user 2.42s system 551% cpu 1:01.39 total
88M  linux-4.4.6-gentoo.tar.zpaq
7z a linux-4.4.6-gentoo.tar.7z linux-4.4.6-gentoo.tar  285.35s user 1.85s system 498% cpu 57.619 total
88M  linux-4.4.6-gentoo.tar.7z
lrzip -z linux-4.4.6-gentoo.tar  643.04s user 3.35s system 640% cpu 1:40.98 total
70M  linux-4.4.6-gentoo.tar.lrz

-m 5

Then it's slower and not better than lrzip -z (which uses zpaq).

it uses libzpaq?

total 44628
-rw-r--r-- 1 redacted users 4516958 Jul 15 16:04 beyond_the_network.it
-rw-r--r-- 1 redacted users 3786137 Jul 15 16:05 beyond_the_network.it.7z
-rw-r--r-- 1 redacted users 3696604 Jul 15 16:03 beyond_the_network.it.bz2
-rw-r--r-- 1 redacted users 4085277 Jul 15 16:00 beyond_the_network.it.gz
-rw-r--r-- 1 redacted users 3386548 Jul 15 16:08 beyond_the_network.it.lrz
-rw-r--r-- 1 redacted users 3786460 Jul 15 16:03 beyond_the_network.it.xz
-rw-r--r-- 1 redacted users 4085429 Jul 15 16:01 beyond_the_network.it.zip
-rw-r--r-- 1 redacted users 4191916 Jul 15 16:05 m1.zpaq
-rw-r--r-- 1 redacted users 4188644 Jul 15 16:06 m2.zpaq
-rw-r--r-- 1 redacted users 3514957 Jul 15 16:06 m3.zpaq
-rw-r--r-- 1 redacted users 3380567 Jul 15 16:06 m4.zpaq
-rw-r--r-- 1 redacted users 3054769 Jul 15 16:07 m5.zpaq
You might be wrong.

I was wondering how long it would take someone to link this fundamentally flawed idea.

What's the problem? Sounds like the best "compression" if you don't mind waiting a long time.

In the current implementation, it only stores blocks of one byte, and the data type needed to store position in pi is bigger than one byte.

I see, I didn't read the whole thing. I assumed it worked by splitting the file into chunks of bytes, finding the offset of each chunk in pi, and then saving the collection of offsets to a file. It would be super slow, but good for compression. It also illustrates the stupidity of trying to own numbers, but I know most people are too stupid to realise why you can't own any part of pi.

Theoretically, you could implement it sensibly by using blocks of at most 255 bytes and at least 9 bytes, with a 64-bit position and an 8-bit length counter. That means that, at best, you could compress a file to 9/255ths of its original size.

This is flawed, though: getting that ratio requires the data to match one of the blocks that a 64-bit position and 8-bit length can actually describe. A fixed 255-byte block can take 256^255 possible values, while a 64-bit position can only distinguish 2^64 of them, so for most files the 9/255 ratio is impossible.

In reality, the position of an arbitrary 255-byte block in pi takes on the order of 255 bytes to write down, plus a 256th byte for the length, so you always need about as much state as data. That means storing things in pi gives no compression at all.
Also, I made an error: with an 8-bit number you can use blocks of 256 bytes.

This should be the same for any implementation of pifs.

tl;dr compression using pi is mathematically flawed

Actually, there might be a flaw in my proof: it might mean you would need 256 bytes to represent the position, assuming blocks of 256 bytes. So pifs might be even more flawed than I thought.

Small files don't make for good compression benchmarks (I've had bzip2 beat xz/7z on a tar made from code and PGM files).

Anyway, nothing stops you from using lrzip -n (rzip pre-processing only, no back-end compression) and then running your favourite compressor on the result.
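e.g. (filenames are just an example):

# rzip pre-processing only, then whatever compressor you like on top
lrzip -n big.tar      # produces big.tar.lrz with no back-end compression
xz -9 big.tar.lrz     # compress the pre-processed file with your favourite tool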

That's why I said "might be".

Generally, you don't worry about space when your files are 4MB. Either you have lots of 4MB files, or you have big files (or you have a 4MB disk).