<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Followup: lrzip</title>
	<atom:link href="http://rusty.ozlabs.org/?feed=rss2&#038;p=81" rel="self" type="application/rss+xml" />
	<link>http://rusty.ozlabs.org/?p=81</link>
	<description>Stealing From Smart People</description>
	<lastBuildDate>Thu, 19 Aug 2010 18:12:38 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Adrian</title>
		<link>http://rusty.ozlabs.org/?p=81&#038;cpage=1#comment-190</link>
		<dc:creator>Adrian</dc:creator>
		<pubDate>Thu, 11 Mar 2010 18:31:50 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=81#comment-190</guid>
		<description>I have just installed and tried lrzip and I am impressed. It gives much better results than the other compressors I tried on large files that are difficult to compress because they already use some form of compression. For example:

Uncompressed tar file with AVI files: 3206 MB
gzip -9: 3179 MB
bzip2 -9: 3178 MB
xz: 3179 MB
xz -9: 3175 MB
7z -mx=9: 3180 MB
lrzip -M: 2829 MB
lrzip -l: 2840 MB
lrzip -n: 2852 MB

The times were over 40 min. for bzip2, xz and 7z, 22 min. for lrzip -M, less than 5 min. for lrzip -l and less than 2 min. for lrzip -n.</description>
		<content:encoded><![CDATA[<p>I have just installed and tried lrzip and I am impressed. It gives much better results than the other compressors I tried on large files that are difficult to compress because they already use some form of compression. For example:</p>
<p>Uncompressed tar file with AVI files: 3206 MB<br />
gzip -9: 3179 MB<br />
bzip2 -9: 3178 MB<br />
xz: 3179 MB<br />
xz -9: 3175 MB<br />
7z -mx=9: 3180 MB<br />
lrzip -M: 2829 MB<br />
lrzip -l: 2840 MB<br />
lrzip -n: 2852 MB</p>
<p>The times were over 40 min. for bzip2, xz and 7z, 22 min. for lrzip -M, less than 5 min. for lrzip -l and less than 2 min. for lrzip -n.</p>
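<p>The gap is easier to see as space saved relative to the uncompressed tar. Here is a short Python sketch of that arithmetic. It uses only the sizes listed above, which are all in the same unit, so the percentages do not depend on the unit.</p>
<pre>
```python
# Sizes as listed above (all in the same unit).
ORIGINAL = 3206
RESULTS = {
    "gzip -9": 3179,
    "bzip2 -9": 3178,
    "xz": 3179,
    "xz -9": 3175,
    "7z -mx=9": 3180,
    "lrzip -M": 2829,
    "lrzip -l": 2840,
    "lrzip -n": 2852,
}


def percent_saved(compressed: int, original: int = ORIGINAL) -> float:
    """Space saved relative to the uncompressed tar, as a percentage."""
    return 100.0 * (original - compressed) / original


# Print smallest archive first.
for name, size in sorted(RESULTS.items(), key=lambda item: item[1]):
    print(f"{name:10s} {size:5d}  saved {percent_saved(size):5.2f}%")
```
</pre>
<p>This puts gzip, bzip2, xz and 7z at roughly 0.8% to 1.0% saved, and the three lrzip modes at about 11% to 12%.</p>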
]]></content:encoded>
	</item>
	<item>
		<title>By: rusty</title>
		<link>http://rusty.ozlabs.org/?p=81&#038;cpage=1#comment-178</link>
		<dc:creator>rusty</dc:creator>
		<pubDate>Sat, 20 Feb 2010 18:04:15 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=81#comment-178</guid>
		<description>Using back-to-back kernel sources isn&#039;t a fair test, though.  Perhaps try the 100M bz2 of the openoffice.org sources?

rzip&#039;s 32-bit limitation and its 900MB cap simply reflect the machines available when Tridge was doing his thesis.  64-bit is the obvious thing to do these days, as is a much larger window.  If I get cycles (ha!) I&#039;d like to revisit it entirely: the table of backrefs and literal sizes should be compressed separately from the data itself, and rzip should never emit matches that are within the backend&#039;s window.  Also, LZMA2 should be used (that&#039;s what xz uses, I&#039;m told).

For the current kernel sources, xz wins, making it a good choice for the kernel.org mirrors.  But an rzip variant should still be able to do at least as well, and better as inputs get larger.</description>
		<content:encoded><![CDATA[<p>Using back-to-back kernel sources isn&#8217;t a fair test, though.  Perhaps try the 100M bz2 of the openoffice.org sources?</p>
<p>rzip&#8217;s 32-bit limitation and its 900MB cap simply reflect the machines available when Tridge was doing his thesis.  64-bit is the obvious thing to do these days, as is a much larger window.  If I get cycles (ha!) I&#8217;d like to revisit it entirely: the table of backrefs and literal sizes should be compressed separately from the data itself, and rzip should never emit matches that are within the backend&#8217;s window.  Also, LZMA2 should be used (that&#8217;s what xz uses, I&#8217;m told).</p>
<p>For the current kernel sources, xz wins, making it a good choice for the kernel.org mirrors.  But an rzip variant should still be able to do at least as well, and better as inputs get larger.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Con Kolivas</title>
		<link>http://rusty.ozlabs.org/?p=81&#038;cpage=1#comment-173</link>
		<dc:creator>Con Kolivas</dc:creator>
		<pubDate>Wed, 17 Feb 2010 09:20:01 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=81#comment-173</guid>
		<description>Hi Rusty.

lrzip only becomes useful once the reach of the simple long-distance dictionary rzip stage is significantly larger than the compression window of the backend (lzma). I doubt you&#039;ll find the rzip first stage helps the kernel source at its current size, given how large lzma&#039;s compression windows already are. You&#039;ll only start getting a benefit with a much larger codebase... which the Linux kernel will be before long :P. As the benchmarks I posted show, lrzip only has an advantage with massive files (and massive RAM) when there is long-distance redundancy. So if you start including multiple trees instead of a single kernel version, you&#039;ll start seeing a benefit from the rzip stage. By a (not so strange) coincidence I also used kernel trees to test it. http://ck.kolivas.org/apps/lrzip/README.benchmarks

As for bugs in rzip, I found it spat the dummy in multiple ways on sizes beyond the 32-bit signed range, and suspect that&#039;s why it had the artificial 900MB limit. Dealing with 32-bit, mmap and write() limitations and so on has been a bit of a pain when trying to scale to 64-bit sizes, I have to admit. That&#039;s been the bulk of the work that went into lrzip. In the process the compression actually became a little less efficient and slower than rzip&#039;s, due to the 64-bit offsets in the archives. xz&#039;s performance on massive files was not so great: it was much slower than 7z (because it is not multithreaded) and did not even compress as well. I often wonder if the groundswell for xz on Linux is well founded, but then there&#039;s always more to software selection than merit on purely technical grounds. By that I don&#039;t just mean politically...</description>
		<content:encoded><![CDATA[<p>Hi Rusty.</p>
<p>lrzip only becomes useful once the reach of the simple long-distance dictionary rzip stage is significantly larger than the compression window of the backend (lzma). I doubt you&#8217;ll find the rzip first stage helps the kernel source at its current size, given how large lzma&#8217;s compression windows already are. You&#8217;ll only start getting a benefit with a much larger codebase&#8230; which the Linux kernel will be before long :P. As the benchmarks I posted show, lrzip only has an advantage with massive files (and massive RAM) when there is long-distance redundancy. So if you start including multiple trees instead of a single kernel version, you&#8217;ll start seeing a benefit from the rzip stage. By a (not so strange) coincidence I also used kernel trees to test it. <a href="http://ck.kolivas.org/apps/lrzip/README.benchmarks" rel="nofollow">http://ck.kolivas.org/apps/lrzip/README.benchmarks</a></p>
<p>As for bugs in rzip, I found it spat the dummy in multiple ways on sizes beyond the 32-bit signed range, and suspect that&#8217;s why it had the artificial 900MB limit. Dealing with 32-bit, mmap and write() limitations and so on has been a bit of a pain when trying to scale to 64-bit sizes, I have to admit. That&#8217;s been the bulk of the work that went into lrzip. In the process the compression actually became a little less efficient and slower than rzip&#8217;s, due to the 64-bit offsets in the archives. xz&#8217;s performance on massive files was not so great: it was much slower than 7z (because it is not multithreaded) and did not even compress as well. I often wonder if the groundswell for xz on Linux is well founded, but then there&#8217;s always more to software selection than merit on purely technical grounds. By that I don&#8217;t just mean politically&#8230;</p>
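<p>The following toy Python sketch illustrates the point about window size. It is not rzip's algorithm. To keep it simple, the long-distance stage is a fixed-size block deduplicator with a hypothetical 4 KiB block size and no rolling hash. The backend is zlib, whose 32 KiB window stands in for lzma's much larger one. A 1 MiB random payload stored twice has redundancy only at a distance of 1 MiB, so zlib on its own cannot see it.</p>
<pre>
```python
import hashlib
import os
import zlib

BLOCK = 4096  # hypothetical block size for this toy long-range stage


def dedup_blocks(data: bytes):
    """Toy long-range stage: replace each repeated, block-aligned chunk with a
    reference to its first occurrence. Unlike rzip there is no rolling hash,
    so only repeats that start on a block boundary are found."""
    seen = {}       # SHA-256 digest of a block -> index into literals
    literals = []   # unique blocks, in first-seen order
    refs = []       # one literals index per input block
    for start in range(0, len(data), BLOCK):
        chunk = data[start:start + BLOCK]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen[digest] = len(literals)
            literals.append(chunk)
        refs.append(seen[digest])
    return literals, refs


def restore(literals, refs) -> bytes:
    """Inverse of dedup_blocks: expand the references back into the data."""
    return b"".join(literals[i] for i in refs)


# 1 MiB of random bytes stored twice: the only redundancy is 1 MiB back,
# far outside zlib's 32 KiB window.
payload = os.urandom(2**20)
data = payload + payload

plain = len(zlib.compress(data, 9))
literals, refs = dedup_blocks(data)
# A real format would also compress and store refs; the list is tiny here.
staged = len(zlib.compress(b"".join(literals), 9))
print(f"zlib alone: {plain} bytes; dedup + zlib: {staged} bytes, {len(refs)} refs")
```
</pre>
<p>With xz/lzma at a high preset, the dictionary is larger than 1 MiB, so the backend would catch this particular repeat by itself. The extra stage only pays off once the redundancy lies farther apart than the backend's window.</p>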
]]></content:encoded>
	</item>
	<item>
		<title>By: rusty</title>
		<link>http://rusty.ozlabs.org/?p=81&#038;cpage=1#comment-172</link>
		<dc:creator>rusty</dc:creator>
		<pubDate>Wed, 17 Feb 2010 00:44:20 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=81#comment-172</guid>
		<description>Interesting!  OK, so I tested that here:
&lt;tt&gt;
$ find linux-2.6 -name .git -prune -o -print &#124; sed &#039;s,\([./][^./]*$\),\1 \1,&#039; &#124; sort -k2 &#124; cut -d\  -f1 &gt; /tmp/sorted-list
$ tar cf /tmp/linux.2.6.sorted.tar -T /tmp/sorted-list --no-recursion
$ ls -l /tmp/linux.2.6.tar /tmp/linux.2.6.sorted.tar 
-rw-r--r-- 1 rusty rusty 395048960 2010-02-17 10:06 /tmp/linux.2.6.sorted.tar
-rw-r--r-- 1 rusty rusty 395048960 2010-02-16 09:03 /tmp/linux.2.6.tar
&lt;/tt&gt;

OK, now the compressed sizes in bytes for each compressor (-9 --to-stdout, or -M for lrzip):
&lt;tt&gt;
gzip /tmp/linux.2.6.tar: 86388655
gzip /tmp/linux.2.6.sorted.tar: 84904755
bzip2 /tmp/linux.2.6.tar: 67350827
bzip2 /tmp/linux.2.6.sorted.tar: 66784160
xz /tmp/linux.2.6.tar: 54614828
xz /tmp/linux.2.6.sorted.tar: 55100332
lrzip /tmp/linux.2.6.tar: 56500239
lrzip /tmp/linux.2.6.sorted.tar: 56104936
&lt;/tt&gt;

The xz result is flat-out weird; AFAICT any attempt to help xz with compression makes it worse.  I wonder if it was actually optimized using the linux kernel source?

(PS.  Added comment preview, since it was annoying me).</description>
		<content:encoded><![CDATA[<p>Interesting!  OK, so I tested that here:<br />
<tt><br />
$ find linux-2.6 -name .git -prune -o -print | sed 's,\([./][^./]*$\),\1 \1,' | sort -k2 | cut -d\  -f1 > /tmp/sorted-list<br />
$ tar cf /tmp/linux.2.6.sorted.tar -T /tmp/sorted-list --no-recursion<br />
$ ls -l /tmp/linux.2.6.tar /tmp/linux.2.6.sorted.tar<br />
-rw-r--r-- 1 rusty rusty 395048960 2010-02-17 10:06 /tmp/linux.2.6.sorted.tar<br />
-rw-r--r-- 1 rusty rusty 395048960 2010-02-16 09:03 /tmp/linux.2.6.tar<br />
</tt></p>
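<p>Here is what the sed expression does. It appends a copy of the final "." or "/" in each path together with whatever follows it, for example ".c" or "/Makefile". <tt>sort -k2</tt> then orders the lines on that copy, and <tt>cut</tt> drops it again. Below is a rough Python equivalent of the sort key, offered only as a sketch. It compares by code point, whereas sort(1) may use locale collation, and unlike the shell pipeline it copes with spaces in file names. The paths are illustrative examples.</p>
<pre>
```python
import re

# Same pattern as the sed expression: the last "." or "/" plus everything
# after it, provided that tail contains no further "." or "/".
SUFFIX = re.compile(r"[./][^./]*$")


def sort_key(path: str):
    """Group by extension-like suffix, then by full path (roughly what
    `sort -k2` on "path suffix" lines does, ignoring locale collation)."""
    m = SUFFIX.search(path)
    suffix = m.group(0) if m else ""
    return (suffix, path)


paths = [
    "linux-2.6/kernel/sched.c",
    "linux-2.6/Makefile",
    "linux-2.6/include/linux/sched.h",
    "linux-2.6/kernel/fork.c",
]
for p in sorted(paths, key=sort_key):
    print(p)
```
</pre>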
<p>OK, now the compressed sizes in bytes for each compressor (-9 --to-stdout, or -M for lrzip):<br />
<tt><br />
gzip /tmp/linux.2.6.tar: 86388655<br />
gzip /tmp/linux.2.6.sorted.tar: 84904755<br />
bzip2 /tmp/linux.2.6.tar: 67350827<br />
bzip2 /tmp/linux.2.6.sorted.tar: 66784160<br />
xz /tmp/linux.2.6.tar: 54614828<br />
xz /tmp/linux.2.6.sorted.tar: 55100332<br />
lrzip /tmp/linux.2.6.tar: 56500239<br />
lrzip /tmp/linux.2.6.sorted.tar: 56104936<br />
</tt></p>
<p>The xz result is flat-out weird; AFAICT any attempt to help xz with compression makes it worse.  I wonder if it was actually optimized using the linux kernel source?</p>
<p>(PS.  Added comment preview, since it was annoying me).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adam Kennedy</title>
		<link>http://rusty.ozlabs.org/?p=81&#038;cpage=1#comment-170</link>
		<dc:creator>Adam Kennedy</dc:creator>
		<pubDate>Tue, 16 Feb 2010 08:41:13 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=81#comment-170</guid>
		<description>Another suggestion for you to experiment with.

Tar the files sorted by file extension instead of file name (if you don&#039;t already).

When we did this for the Strawberry Perl MSI installer, we saw a 10% reduction in file size. The assumption is that files with the same extension probably have similar content, so grouping them feeds the compression algorithm better.</description>
		<content:encoded><![CDATA[<p>Another suggestion for you to experiment with.</p>
<p>Tar the files sorted by file extension instead of file name (if you don&#8217;t already).</p>
<p>When we did this for the Strawberry Perl MSI installer, we saw a 10% reduction in file size. The assumption is that files with the same extension probably have similar content, so grouping them feeds the compression algorithm better.</p>
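<p>Here is a small Python sketch of why grouping can help, under deliberately contrived assumptions. There are two hypothetical file types (".c" and ".h"), and every file is a near-copy of its type's shared 20 KB random template. The compressor is zlib, with its 32 KB window. Sorted by name, files of the same type sit 40 KB apart and zlib cannot match them. Sorted by extension, they are adjacent. Real files sharing an extension are far less alike, so real gains are much smaller, such as the 10% mentioned above.</p>
<pre>
```python
import os
import random
import zlib

rng = random.Random(0)  # fixed seed so the demo is repeatable


def make_file(template: bytes) -> bytes:
    """A hypothetical 'file': its type's template with 16 random bytes changed."""
    data = bytearray(template)
    for _ in range(16):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


# Two hypothetical file types, each sharing a 20 KB template.
templates = {ext: rng.randbytes(20_000) for ext in (".c", ".h")}
files = [(f"file{i}{ext}", make_file(templates[ext]))
         for i in range(8) for ext in (".c", ".h")]


def archive_size(entries) -> int:
    """zlib-compressed size of the concatenated contents, in archive order."""
    return len(zlib.compress(b"".join(data for _, data in entries), 9))


by_name = sorted(files)  # file0.c, file0.h, file1.c, ... (types interleaved)
by_ext = sorted(files, key=lambda entry: (os.path.splitext(entry[0])[1], entry[0]))
print("sorted by name:     ", archive_size(by_name), "bytes")
print("sorted by extension:", archive_size(by_ext), "bytes")
```
</pre>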
]]></content:encoded>
	</item>
	<item>
		<title>By: Adam</title>
		<link>http://rusty.ozlabs.org/?p=81&#038;cpage=1#comment-167</link>
		<dc:creator>Adam</dc:creator>
		<pubDate>Tue, 16 Feb 2010 04:01:48 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=81#comment-167</guid>
		<description>What about 7z, which also uses LZMA? If I remember correctly, it offers pretty good compression ratios. The 7-Zip compressor/decompressor is mostly covered by the GNU LGPL, and I&#039;m pretty sure it is completely unencumbered by patents.</description>
		<content:encoded><![CDATA[<p>What about 7z, which also uses LZMA? If I remember correctly, it offers pretty good compression ratios. The 7-Zip compressor/decompressor is mostly covered by the GNU LGPL, and I&#8217;m pretty sure it is completely unencumbered by patents.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
