<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Not Always Lovely Blooms&#8230;</title>
	<atom:link href="http://rusty.ozlabs.org/?feed=rss2&#038;p=20" rel="self" type="application/rss+xml" />
	<link>http://rusty.ozlabs.org/?p=20</link>
	<description>Stealing From Smart People</description>
	<lastBuildDate>Thu, 19 Aug 2010 18:12:38 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: rusty</title>
		<link>http://rusty.ozlabs.org/?p=20&#038;cpage=1#comment-7</link>
		<dc:creator>rusty</dc:creator>
		<pubDate>Thu, 29 Oct 2009 10:47:34 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=20#comment-7</guid>
		<description>Tridge says we&#039;re down a factor of 100x at 100,000 strings.  That means we need log(n) to grow by a factor of 100, which is for all intents infinite.  

Note that for unaligned strings and cache line size 128, qsort will almost always take a single cacheline per string, bloom filter 2 cachelines about 100/128 times.  That&#039;s probably the real killer here (why he saw it get worse as the number of strings gets bigger).

Also, the Bloom filter size grows linearly with the number of elements (for constant false-positive probability).  If you can&#039;t sort in place, the sort solution grows by one pointer per element too.  If you can, it&#039;s a lose for Bloom.

Hope that helps!</description>
		<content:encoded><![CDATA[<p>Tridge says we&#8217;re down a factor of 100x at 100,000 strings.  That means we need log(n) to grow by a factor of 100, which is for all intents infinite.  </p>
<p>Note that for unaligned strings and cache line size 128, qsort will almost always take a single cacheline per string, bloom filter 2 cachelines about 100/128 times.  That&#8217;s probably the real killer here (why he saw it get worse as the number of strings gets bigger).</p>
<p>Also, the Bloom filter size grows linearly with the number of elements (for constant false-positive probability).  If you can&#8217;t sort in place, the sort solution grows by one pointer per element too.  If you can, it&#8217;s a lose for Bloom.</p>
<p>Hope that helps!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://rusty.ozlabs.org/?p=20&#038;cpage=1#comment-6</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Thu, 29 Oct 2009 09:50:07 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=20#comment-6</guid>
		<description>Presumably if you had a *lot* more strings (enough that log(n) becomes bigger than the average string length), bloom filters might start to prove faster.</description>
		<content:encoded><![CDATA[<p>Presumably if you had a *lot* more strings (enough that log(n) becomes bigger than the average string length), bloom filters might start to prove faster.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Collins</title>
		<link>http://rusty.ozlabs.org/?p=20&#038;cpage=1#comment-5</link>
		<dc:creator>Robert Collins</dc:creator>
		<pubDate>Tue, 27 Oct 2009 11:20:05 +0000</pubDate>
		<guid isPermaLink="false">http://rusty.ozlabs.org/?p=20#comment-5</guid>
		<description>We&#039;ve found that bloom filters are really marginal much of the time - its a very clever idea, but you have to have rather precise circumstances for them to be really useful.</description>
		<content:encoded><![CDATA[<p>We&#8217;ve found that bloom filters are really marginal much of the time &#8211; its a very clever idea, but you have to have rather precise circumstances for them to be really useful.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
