Pettering does it again!

Jaxon Campbell

0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html

Introducing casync

In the past months I have been working on a new project: casync. casync takes inspiration from the popular rsync file synchronization tool as well as the probably even more popular git revision control system. It combines the idea of the rsync algorithm with the idea of git-style content-addressable file systems, and creates a new system for efficiently storing and delivering file system images, optimized for high-frequency update cycles over the Internet. Its current focus is on delivering IoT, container, VM, application, portable service or OS images, but I hope to extend it later in a generic fashion to become useful for backups and home directory synchronization as well (but more about that later).

The basic technological building blocks casync is built from are neither new nor particularly innovative (at least not anymore). However, the way casync combines them is different from existing tools, and that's what makes it useful for a variety of use cases that other tools can't cover that well.
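
To make the combination described above more concrete, here is a minimal Python sketch of the general idea: split a file at content-defined boundaries using a rolling checksum (the rsync-like part) and store each chunk under the hash of its contents (the git-like part). This is an illustration under stated assumptions, not casync's actual code. The names split_chunks, ChunkStore, make_index, reassemble and chunks_to_fetch are hypothetical, the window and size constants are arbitrary, and the rolling byte sum is a toy stand-in for the stronger rolling hash a real tool would use.

```python
import hashlib

# All constants below are illustrative assumptions, not casync defaults.
WINDOW = 16          # number of trailing bytes covered by the rolling sum
MASK = (1 << 6) - 1  # cut when the low 6 bits of the sum are zero
MIN_SIZE = 16        # never emit a chunk shorter than this (except the last)


def split_chunks(data):
    """Split data at content-defined boundaries using a toy rolling sum."""
    chunks = []
    start = 0      # offset where the current chunk begins
    rolling = 0    # sum of the last WINDOW bytes of the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if i - start + 1 >= MIN_SIZE and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])      # trailing remainder
    return chunks


class ChunkStore:
    """In-memory content-addressable store: SHA-256 digest -> chunk bytes."""

    def __init__(self):
        self.chunks = {}

    def put(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(digest, chunk)  # identical chunks stored once
        return digest

    def get(self, digest):
        return self.chunks[digest]


def make_index(data, store):
    """Store every chunk of data and return the ordered list of digests."""
    return [store.put(chunk) for chunk in split_chunks(data)]


def reassemble(index, store):
    """Rebuild the original bytes from an index and a chunk store."""
    return b"".join(store.get(digest) for digest in index)


def chunks_to_fetch(index, local_store):
    """Digests named by the index that the local store does not hold yet."""
    seen = set()
    missing = []
    for digest in index:
        if digest not in local_store.chunks and digest not in seen:
            seen.add(digest)
            missing.append(digest)
    return missing
```

In this sketch an update client would download the new index, call chunks_to_fetch against the chunks it already holds, and request only the missing ones. Because cut points depend on the nearby bytes rather than on fixed offsets, two versions of an image that differ only locally will tend to share most of their chunks.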

Why?

I created casync after studying how today's popular tools store and deliver file system images. To name a few, briefly and far from comprehensively: Docker has a layered tarball approach, OSTree serves the individual files directly via HTTP and maintains packed deltas to speed up updates, while other systems operate on the block layer and place raw squashfs images (or other archival file systems, such as ISO9660) for download on HTTP shares (in the better cases combined with zsync data).

None of these approaches appeared fully convincing to me when used in high-frequency update cycle systems. In such systems, it is important to optimize towards a couple of goals:

Most importantly, make updates cheap traffic-wise (for this most tools use image deltas of some form)
Put boundaries on disk space usage on servers (keeping deltas between all version combinations clients might want to run updates between would suggest keeping an exponentially growing number of deltas on servers)
Put boundaries on disk space usage on clients
Be friendly to Content Delivery Networks (CDNs), i.e. serve neither too many small nor too many overly large files, and only require the most basic form of HTTP. Provide the repository administrator with high-level knobs to tune the average file size delivered.
Be simple to use for users, repository administrators and developers

I don't think any of the tools mentioned above are really good on more than a small subset of these points.

Specifically: Docker's layered tarball approach drops the "delta" question at the feet of the image creators: the best way to make your image downloads minimal is to base your work on an existing image clients might already have and inherit its resources, maintaining full history. Here, revision control (a tool for the developer) is intermingled with update management (a concept for optimizing production delivery). As container histories grow, individual deltas are likely to stay small, but on the other hand a brand-new deployment usually requires downloading the full history onto the deployment system, even though there's no use for it there, and likely requires substantially more disk space and larger downloads.

OSTree's serving of individual files is unfriendly to CDNs (as many small files in file trees cause an explosion of HTTP GET requests). To counter that, OSTree supports placing pre-calculated delta images between selected revisions on the delivery servers, which means a certain amount of revision management that leaks into the clients.

Delivering direct squashfs (or other file system) images is almost beautifully simple, but of course means every update requires a full download of the newest image, which is bad for both disk usage and generated traffic. Enhancing it with zsync makes this a much better option, as it can reduce generated traffic substantially at very little cost in history/metadata (no explicit deltas between a large number of versions need to be prepared server side). On the other hand, server requirements in disk space and functionality (HTTP Range requests) are minus points for the use case I am interested in.

(Note: all the mentioned systems have great properties, and it's not my intention to badmouth them. The only point I am trying to make is that for the use case I care about, file system image delivery with high-frequency update cycles, each system comes with certain drawbacks.)

June 20, 2017 - 12:19

Dominic Hill

So how many millions of lines of code does it have so far?

June 20, 2017 - 12:45

Brayden Long

gadammit

June 20, 2017 - 12:53

Liam Cooper

Poettering, no. Poettering, please, no. Poettering, for pete's sake, NO
I can't wait to see images bricking computers because the EFI partition of another computer is flashed over their EFI partitions.

June 20, 2017 - 13:10

Carter Smith

Unix garbage can't be fixed.

June 20, 2017 - 13:23

Oliver Gutierrez

A new filesystem by the founder of open source, Poettering? I say we test it in production by making it a hard dependency of every single package in all main distros.

June 20, 2017 - 13:39

Noah Johnson

Wait, pottering is German?
They do have a knack for producing both crazy and insane people and geniuses.
You know what category he is in.

June 20, 2017 - 13:45

Carter Roberts

Looks pretty good, similar to borg backup in implementation.

June 20, 2017 - 21:18

Juan Howard

Images? Is that like smalltalk images?

June 20, 2017 - 21:45

Grayson Sanchez

wtf is he killing squashFS next????????????????????????????????
what the fuck is wrong with this guy

June 20, 2017 - 23:57

Camden Flores

Nice rusing - I'm not sure what category he's in, but I don't mind systemd, so maybe I'm an outlier.

June 21, 2017 - 02:27

Logan Ramirez

NSA has full access to all services on computer (systemd)
NSA has full access to EFI (systemd)
NSA has full access to network stack (systemd)
NSA has full access to storage devices (casync)
NSA has full access to sound system (pulseaudio)

Just another thing to actively avoid.

June 21, 2017 - 04:20

Jackson Richardson

(checked)
/thread

June 21, 2017 - 05:26

Charles Clark

Borg can prune. He didn't mention pruning so I'm not sure how viable it is as a backup solution.

June 21, 2017 - 05:26

Christian Taylor

At this point you are better off installing Windows 10 and capping it.

June 21, 2017 - 05:32

Lucas Ramirez

witnessed

polite sage

June 21, 2017 - 05:43

Jeremiah Ortiz

This tool supports dm-crypt filesystems
Try not shitposting so hard next time

June 21, 2017 - 06:31

Isaac Thomas

LENNART

June 21, 2017 - 08:27

Camden Reed

So how long until RedHat starts raping Linux again so that 99% of systems have to install this to work around a manufactured problem? I give it 3 weeks.

June 21, 2017 - 09:39

Alexander Edwards

NSA has full access to every single thing happening on your computer (linux)
Seriously, why do people think the NSA compromised systemd but not Linux?

June 21, 2017 - 12:49

Samuel Baker

Just wait until M$ adopts systemd.

It will be called iSystemWoW32d

June 21, 2017 - 17:50

Kevin Clark

dm-crypt is shit
the code is hosted on google lel

June 21, 2017 - 18:14

Matthew Sanders

Not going to trust my data with Poetterware. And judging from history, it's only a matter of time before it depends on systemd.

Of course he's trying to Windowsify freenix, who'd think anything else.

Criminally underrated gif! Generalizes over many House episodes.

June 21, 2017 - 19:24

Leo Green

Why would MS adopt systemd? They designed svchost right the first time.

June 21, 2017 - 19:36

Sebastian Robinson

look how fast it got a wikipedia article

en.wikipedia.org/wiki/Casync

June 22, 2017 - 01:14

Josiah Price

Not to mention GPT and SSD which also got popular the same time as all these crap

June 22, 2017 - 03:58
