As the kernel archive debates replacing .bz2 files with .xz, I took a brief look at xz. My test was a tarball of the Linux kernel source, made from a recent git tree but excluding the .git directory.
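The exact command used isn’t shown. As a minimal sketch, one way to build an equivalent uncompressed tarball is Python’s `tarfile` module, passing a `filter` to `TarFile.add()` that drops `.git`; the function name `tar_without_git` and the example paths are hypothetical.

```python
import os
import tarfile


def tar_without_git(src_dir, out_path):
    """Write an uncompressed tarball of src_dir, omitting any .git directory."""
    top = os.path.basename(os.path.normpath(src_dir))

    def skip_git(info):
        # Returning None tells TarFile.add() to leave this member out;
        # for a directory it also stops add() from descending into it.
        if ".git" in info.name.split("/"):
            return None
        return info

    # "w" means no compression, so each compressor can be run on the result.
    with tarfile.open(out_path, "w") as tar:
        tar.add(src_dir, arcname=top, filter=skip_git)
```

For example, `tar_without_git("linux-2.6", "linux.2.6.tar")` produces a plain tar that can then be fed to bzip2, rzip, or xz.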
For comparison, here are bzip2 -9, rzip -9 (which uses bzip2 after finding distant matches), and xz:
linux.2.6.tar.bz2 67M
linux.2.6.tar.rz 65M
linux.2.6.tar.xz 55M
So I hacked rzip to add a -R option that outputs its blocks without the bzip2 stage.
Running xz on this file simulates what would happen if rzip used xz instead of libbz2.
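To make the two-stage idea concrete, here is a toy Python model, not rzip’s actual algorithm: a first stage replaces an aligned chunk with a back-reference when an identical chunk appeared within `window` bytes (real rzip finds matches at arbitrary offsets), and the result is then compressed with either bzip2 or xz. `window` is only loosely analogous to rzip’s memory limit. `CHUNK`, `dedup_stage`, `undo_dedup`, `compress_two_stage` and the record format are all hypothetical.

```python
import bz2
import hashlib
import lzma
import os

# Hypothetical chunk size; real rzip matches at arbitrary byte offsets.
CHUNK = 4096


def dedup_stage(data, window):
    """Toy first stage: emit each aligned chunk either as a literal
    (b"L" + 4-byte length + bytes) or, if an identical chunk appeared
    no more than `window` bytes earlier, as a reference
    (b"R" + 8-byte offset + 4-byte length)."""
    seen = {}  # chunk digest -> offset of its most recent occurrence
    out = bytearray()
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        key = hashlib.sha1(chunk).digest()
        prev = seen.get(key)
        # Compare bytes too, so a hash collision can never corrupt output.
        if (prev is not None and off - prev <= window
                and data[prev:prev + len(chunk)] == chunk):
            out += b"R" + prev.to_bytes(8, "little") + len(chunk).to_bytes(4, "little")
        else:
            out += b"L" + len(chunk).to_bytes(4, "little") + chunk
        seen[key] = off
    return bytes(out)


def undo_dedup(blob):
    """Invert dedup_stage by replaying literals and copying references."""
    out = bytearray()
    i = 0
    while i < len(blob):
        tag = blob[i]
        i += 1
        if tag == ord("L"):
            n = int.from_bytes(blob[i:i + 4], "little")
            i += 4
            out += blob[i:i + n]
            i += n
        else:
            prev = int.from_bytes(blob[i:i + 8], "little")
            n = int.from_bytes(blob[i + 8:i + 12], "little")
            i += 12
            out += out[prev:prev + n]  # earlier output is already rebuilt
    return bytes(out)


def compress_two_stage(data, window, backend="bz2"):
    """Toy first stage, then bzip2 -9 or xz at preset 6 (the xz default)."""
    raw = dedup_stage(data, window)
    if backend == "bz2":
        return bz2.compress(raw, compresslevel=9)
    if backend == "xz":
        return lzma.compress(raw, preset=6)
    raise ValueError(f"unknown backend: {backend!r}")


if __name__ == "__main__":
    # 256 KiB of random bytes repeated once: the copy is 256 KiB back.
    data = os.urandom(64 * CHUNK) * 2
    for window in (16 * CHUNK, 1 << 20):
        print(window, len(dedup_stage(data, window)),
              len(compress_two_stage(data, window, "bz2")),
              len(compress_two_stage(data, window, "xz")))
```

In this small input the repeat is within reach of both bzip2’s 900k block and xz’s dictionary, so it only demonstrates the mechanism. Spacing the copies more than 900k apart would make the first stage help bzip2, while xz would still find the match on its own until the distance exceeded its dictionary size.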
Hmm, it makes xz worse! OK, what if we rev up the conservative rzip to use 1G of memory rather than its 128M maximum, and then xz that?
linux.2.6.tar.rawrz 220M
linux.2.6.tar.rawrz.xz 58M
It actually gets worse as rzip does more work, implying that xz is already finding quite long-distance matches on its own (bzip2 can’t find matches more than 900k apart, its maximum block size). So rzip could only beat xz on really huge files, and since current rzip can’t handle files over 4G, that’s a pretty small useful window.
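The 900k limit is easy to see on synthetic data (not the kernel tarball). This sketch compresses 1,000,000 random bytes followed by an exact copy: the only redundancy is a match 1,000,000 bytes back, beyond any bzip2 -9 block but well inside the 8 MiB dictionary of xz’s default preset 6 (used here instead of -9 to keep memory use modest).

```python
import bz2
import lzma
import os

# Synthetic input: 1,000,000 random bytes followed by an exact copy.
# The only redundancy is a match 1,000,000 bytes back.
half = os.urandom(1_000_000)
data = half + half

# bzip2 compresses each ~900k block independently, so it never sees the copy.
bz2_size = len(bz2.compress(data, compresslevel=9))
# xz preset 6 has an 8 MiB dictionary, so the copy is within reach.
xz_size = len(lzma.compress(data, preset=6))

print(f"input {len(data)}  bzip2 -9 {bz2_size}  xz -6 {xz_size}")
```

Expect the bzip2 output to be roughly the size of the input and the xz output roughly half of it.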