xz vs rzip

As the kernel archive debates replacing .bz2 files with .xz, I took a brief glance at xz. My test was to take a tarball of the linux kernel source (made from a recent git tree, but excluding the .git directory):

     linux.2.6.tar 395M

For a comparison, bzip2 -9, rzip -9 (which uses bzip2 after finding distant matches), and xz:

     linux.2.6.tar.bz2 67M
     linux.2.6.tar.rz 65M
     linux.2.6.tar.xz 55M

So, I hacked rzip with a -R option to output non-bzip'd blocks:

     linux.2.6.tar.rawrz 269M

Xz on this file simulates what would happen if rzip used xz instead of libbz2:

     linux.2.5.tar.rawrz.xz 57M

Hmm, it makes xz worse! OK, what if we rev up the conservative rzip to use 1G of memory rather than 128M max? And the xz that?

     linux.2.6.tar.rawrz 220M
     linux.2.6.tar.rawrz.xz 58M

It actually gets worse as rzip does more work, implying xz is finding quite long-distance matches (bzip2 won't find matches over more than 900k). So, rzip could only have benefit over xz on really huge files: but note that current rzip is limited on filesize to 4G so it's a pretty small useful window.