So I did some IBLT research (as posted to bitcoin-dev ) and I lazily used SHA256 to create both the temporary 48-bit txids, and from them to create a 16-bit index offset.  Each node has to produce these for every bitcoin transaction ID it knows about (ie. its entire mempool), which is normally less than 10,000 transactions, but we'd better plan for 1M given the coming blopockalypse.

For txid48, we hash an 8 byte seed with the 32-byte txid; I ignored the 8 byte seed for the moment, and measured various implementations of SHA256 hashing 32 bytes on on my Intel Core i3-5010U CPU @ 2.10GHz laptop (though note we'd be hashing 8 extra bytes for IBLT): (implementation in CCAN)

  1. Bitcoin's SHA256: 527.7+/-0.9 nsec
  2. Optimizing the block ending on bitcoin's SHA256: 500.4+/-0.66 nsec
  3. Intel's asm rorx: 314.1+/-0.3 nsec
  4. Intel's asm SSE4 337.5+/-0.5 nsec
  5. Intel's asm RORx-x8ms 458.6+/-2.2 nsec
  6. Intel's asm AVX 336.1+/-0.3 nsec

So, if you have 1M transactions in your mempool, expect it to take about 0.62 seconds of hashing to calculate the IBLT.  This is too slow (though it's fairly trivially parallelizable).  However, we just need a universal hash, not a cryptographic one, so I benchmarked murmur3_x64_128:

  1. Murmur3-128: 23 nsec

That's more like 0.046 seconds of hashing, which seems like enough of a win to add a new hash to the mix.