Rusty Russell's Coding Blog | Stealing From Smart People

At linux.conf.au I gave a last-minute talk previewing my work on pettycoin (video, slides), an experiment to shard a bitcoin-like network.  The idea is to trade off some security and robustness in return for scale, but use it only for small amounts where fraud is less worthwhile.  On the bitcoin network today this is already seen with zero-confirmation transactions, and this is the niche pettycoin seeks to fill.

There are numerous problems to be overcome (one being the time taken by my day job, of course).  But segmenting the network and the blockchain is an interesting challenge: bitcoin’s blockchain is already designed so that you can have partial knowledge (mainly so you can prune used outputs).  But there’s a clear divide between full nodes, and second-class partial nodes.  I want a system where no one need know everything, and I’m getting closer to that goal.

Consider the simplest useful transaction in the bitcoin network, with one input (i.e. a previous transaction’s output) and one output.  Verifying it is a fairly simple process (sketched in code after the list):

  1. Is the transaction well-formed?
  2. Find the transaction whose output this is spending.
  3. Does the signature match the address of that output?
  4. Has that output already been spent?
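
Here is a minimal sketch of those four steps in C.  Every helper name here (tx_well_formed, find_tx_by_output and so on) is invented for illustration; none of them are pettycoin or bitcoind functions:

#include <stdbool.h>

struct tx;          /* a transaction */
struct tx_input;    /* one input, referring to a previous tx's output */

extern bool tx_well_formed(const struct tx *tx);
extern const struct tx *find_tx_by_output(const struct tx_input *in);
extern bool sig_matches_output_addr(const struct tx *tx,
                                    const struct tx *prev);
extern bool output_already_spent(const struct tx *prev,
                                 const struct tx_input *in);

static bool verify_simple_tx(const struct tx *tx, const struct tx_input *in)
{
    if (!tx_well_formed(tx))                    /* 1. well-formed? */
        return false;
    const struct tx *prev = find_tx_by_output(in);
    if (!prev)                                  /* 2. must know the prev tx */
        return false;
    if (!sig_matches_output_addr(tx, prev))     /* 3. signature matches? */
        return false;
    if (output_already_spent(prev, in))         /* 4. not already spent? */
        return false;
    return true;
}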

With bitcoin, you’re expected to know every transaction with unspent outputs, so if you can’t find the transaction at step 2, the verification fails. Even better, you can verify that previous transaction, too, all the way back to the creation of the coins involved.  Your only worry is that the blockchain you have is the same as everyone else’s, so they’ll accept your transaction later.

If you don’t expect to know everything, it’s more difficult.  You can use a merkle proof to show that a transaction was present in a block; it takes just log(N) hashes for an N-transaction block.  So you could prove that all those previous transactions are in the blockchain (though that might be thousands of transactions) by providing me with each transaction and proof.
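
For concreteness, here’s a sketch of checking such a proof, using OpenSSL’s SHA256() and bitcoin’s double-SHA256 pairing rule; the function names and proof layout are my own invention:

#include <stdbool.h>
#include <string.h>
#include <openssl/sha.h>

#define HASH_LEN SHA256_DIGEST_LENGTH   /* 32 bytes */

static void hash_pair(const unsigned char *l, const unsigned char *r,
                      unsigned char *out)
{
    unsigned char buf[2 * HASH_LEN], tmp[HASH_LEN];
    memcpy(buf, l, HASH_LEN);
    memcpy(buf + HASH_LEN, r, HASH_LEN);
    SHA256(buf, sizeof(buf), tmp);      /* double-SHA256, as bitcoin does */
    SHA256(tmp, HASH_LEN, out);
}

/* proof[i] is the sibling hash at depth i; side[i] is 0 if our hash is
 * the left child at that level, 1 if it is the right child. */
static bool merkle_proof_ok(const unsigned char txhash[HASH_LEN],
                            const unsigned char (*proof)[HASH_LEN],
                            const int *side, size_t depth,
                            const unsigned char root[HASH_LEN])
{
    unsigned char cur[HASH_LEN];
    memcpy(cur, txhash, HASH_LEN);
    for (size_t i = 0; i < depth; i++) {
        if (side[i] == 0)
            hash_pair(cur, proof[i], cur);
        else
            hash_pair(proof[i], cur, cur);
    }
    return memcmp(cur, root, HASH_LEN) == 0;
}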

But this can’t prove that there are not duplicate transactions in the blockchain itself.  Only knowing the entire contents would do that.  So we’re relying on the rest of the network, each with a partial view, to check that didn’t happen.

This leads to the two requirements for any aspect of the pettycoin system which a node can’t verify by itself:

  1. The information to verify must be seen by some honest nodes.
  2. Each node must have an efficient way of reporting if it sees a problem.

The former is a bit tricky.  Consensus is formed by the blockchain, but you don’t want to see all of it.  You might expect to see some fraction of it, but if you don’t, how would you alert the network in a way that can’t be faked?  Imagine a miner holds back 5 transactions in a block: they might wait for your complaint message about one, then release that transaction, making you look like the dishonest one.  By making you cry wolf, they can ensure you are ignored.

The solution used in pettycoin is that miners have to prove that they know the transactions in the 10 previous blocks.  They do this by hashing the transactions from those previous blocks into a merkle tree as normal, except they prefix each transaction with their payout address (this is called prev_merkle in the code).  The only way to generate this hash is to know the contents of each transaction, and you can’t make a valid block without it.  Unfortunately, the only way to demonstrate that this hash is wrong (and thus the block invalid) is to also know the contents of each transaction in the block.  Thus transactions are batched into groups of 4096; you need only send 4096 transactions to prove that one of the hashes in a block is wrong.  Miners will insist on knowing the transactions for those blocks, knowing that if they fake it they’ll likely be caught.
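
Roughly, the leaf hash for each transaction becomes something like this (a sketch using OpenSSL; the real serialization in pettycoin’s prev_merkle code will differ):

#include <stddef.h>
#include <openssl/sha.h>

/* Hash a transaction prefixed with the miner's payout address: you
 * cannot produce this hash without knowing the whole transaction. */
static void prev_merkle_leaf(const unsigned char *payout_addr,
                             size_t addr_len,
                             const unsigned char *tx, size_t tx_len,
                             unsigned char out[SHA256_DIGEST_LENGTH])
{
    SHA256_CTX ctx;

    SHA256_Init(&ctx);
    SHA256_Update(&ctx, payout_addr, addr_len);  /* the prefix */
    SHA256_Update(&ctx, tx, tx_len);             /* then the raw tx */
    SHA256_Final(out, &ctx);
}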

Reporting most other problems in a block is fairly easy:

  1. You can prove a duplicate spend in the block chain by showing both transactions and the merkle proofs that they are in each block.  The second block is invalid.
  2. You can prove a malformed transaction by showing the transactions and the merkle proof it is in the block.  That block is invalid.
  3. You can prove an overspend by showing the transactions used as inputs.  That block is invalid.

But if a transaction in a block relies on an output of a transaction which never existed, you can’t prove it.  Even if you know every transaction which ever happened, you can’t prove that to me (without sending me the whole blockchain).  The initial design lived with such warts in the blockchain, instead insisting that you would have to show all the predecessors when you paid me (via a payment protocol).  That predecessor tree quickly becomes unwieldy, however.

The new approach is that for each input of a transaction in the blockchain, the miner has to include the block and transaction number where it appeared.  Now anyone who knows that previous transaction can check it, and if there’s a problem it’s easy for any node to prove by showing the transaction which is in that previous block (with merkle proof that it is).
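
In other words, each input now carries something like the following (field names are illustrative, not pettycoin’s actual structures):

#include <stdint.h>

/* Where the transaction being spent appears in the blockchain. */
struct input_ref {
    uint32_t block;     /* block number containing the spent tx */
    uint32_t txnum;     /* transaction number within that block */
};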

This means that the blockchain can be trusted, if half the mining power can be trusted.  This is a weaker guarantee than bitcoin, but sufficiently strong for pettycoin.  If you send me a new transaction along with the transactions it uses as inputs and merkle proofs that they are in the blockchain, I only need ensure that the new transaction isn’t a double-spend.  That’s the same as the bitcoin network with zero-confirmation transactions (though pettycoin has a special double-spend report message to expedite it a little).

Next post, I’ll respond to the best criticism of pettycoin yet, the problem of gateways (by Jason Gi)…


Delightedly finished reading through the linux.conf.au program.  Some nasty clashes have me still arguing with myself, but here are my personal compulsory attendance talks.  Your preferences will no doubt differ, so I’ve tried to explain my reasons:


  • Tridgell: Open Hardware Differential GPS – I spoke to Tridge about this, and the abstract completely undersells it.
  • Corbet: Kernel – Jon’s kernel talks are great for non-kernel people, but for me it’s about seeing the forest through the trees.
  • McKenney: Parallel Verification – Paul’s spoken with me about this, but I want to hear the practical side to see how I can apply it.
  • Heo: Kernel per-cpu  – I persuaded Tejun to submit this; his per-cpu work was elegant (mine, on which this was built, was merely functional).
  • Airlie: Virtual GPU – Sorry Bdale (with whom it clashes): I have wanted a virtio GPU for so long, I need to see it.
  • Packard: Zero-copy Compositing  – Keith is always good, and graphics performance is fascinating.
  • Suehle: Raspberry Pi – Generally O’Reilly books are well researched, so I expect great content here.
  • Isaacs: CTDB Bugs – I was around when he was finding some of these, and there are some fascinating surprises here.


Git prompt for bash

I don’t know who wrote this originally, but this is from my .bashrc.  Tridge’s is simpler, but has colour!

Before this, I avoided git branches in favour of multiple copies of repositories because I use my prompt to provide location.  This provided the missing piece…

# Git me harder!
__git_ps1 ()
{
    local g="$(git rev-parse --git-dir 2>/dev/null)"
    if [ -n "$g" ]; then
        local r
        local b
        if [ -d "$g/../.dotest" ]       # git-am or old-style rebase in progress
        then
            b="$(git symbolic-ref HEAD 2>/dev/null)"
            r="|REBASING"
        elif [ -d "$g/.dotest-merge" ]  # interactive rebase in progress
        then
            r="|REBASING"
            b="$(cat "$g/.dotest-merge/head-name")"
        elif [ -f "$g/MERGE_HEAD" ]
        then
            r="|MERGING"
            b="$(git symbolic-ref HEAD 2>/dev/null)"
        else
            if [ -f "$g/BISECT_LOG" ]   # bisect in progress
            then
                r="|BISECTING"
            fi
            if ! b="$(git symbolic-ref HEAD 2>/dev/null)"
            then
                # detached HEAD: show abbreviated commit hash
                b="$(cut -c1-7 "$g/HEAD")..."
            fi
        fi
        if [ -n "$1" ]; then
            printf "$1" "${b##refs/heads/}$r"
        else
            printf " (%s)" "${b##refs/heads/}$r"
        fi
    fi
}

# Insert $(__git_ps1) after every \w in the existing prompt.
PS1="${PS1//\\w/\\w\$(__git_ps1)}"
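
With that in place, every \w in your prompt gains the current branch (or rebase/merge/bisect state).  An illustrative example, with invented host and path:

rusty@host:~/src/pettycoin (master)$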


As raised recently by Sarah Sharp, the Linux Kernel mailing list (lkml) has a reputation as an intimidating place.  The context (covered so well by LWN) was that Greg Kroah-Hartman, the stable maintainer, is seen as a soft touch who accepts patches Linus wouldn’t.

There’s been much uninformed discussion from those outside lkml, so let’s start with a common basis:

  1. Sarah Sharp is an established and respected kernel maintainer.  She’s made it.
  2. Linus (and other developers) are human, and sometimes write in anger.

Now my opinions, as someone who cares about this issue and has been working on the kernel for about 16 years.

The kernel mailing list is much friendlier than it used to be: some of its reputation is now undeserved.  Linus is unreserved in criticising code or actions, but rarely crosses into ad-hominem.  His absolutist statements reduce RTT (round trips) by telling you exactly what is required; geeks love to argue, but it’s pointless because it’s his git tree.

That said, imitating Linus on lkml causes problems: without his authority, loudly claiming absolutes is simply ranting.  This escalates until it’s remarkably hard to avoid crossing into personal attacks; most of us inevitably double down when we’re criticized, and a train-wreck ensues.

I plan to follow Sarah’s example and respond when someone’s abusive.  Making it clear what’s expected should make things more pleasant eventually.  It’s been about ten years since I decided to reduce my flames to a single post every year; I’m now going to aim for zero (aka. “What Would Sarah Sharp Do?”).


I’ve been collecting these as I research the bitcoin protocol, so I thought it was worth posting about.  None of these are groundbreaking, but these are what surprised me as I deepened my understanding.

10 Minute Blocks.  Currently 9 minutes.  But usually 7 minutes.

Everyone talks about a block every 10 minutes, but that’s the long-term mean.  Spikes in exchange rates are followed fairly closely by spikes in network hashrate, and ASIC miners are ramping up to meet demand.  As difficulty adjustment happens only every 2016 blocks (ideally 2 weeks), there’s a lag.  Over the life of bitcoin, and over the last year, the average is almost exactly 600 seconds; over the last 3 months it’s been 520 seconds, and over the last month 542 seconds, so hashrate acceleration is slowing.

But a subtler effect shows up when we look at the median, rather than the mean: it’s just under 7 minutes.  That’s because the time to hit the target hash is not a normal distribution at all: each hash attempt is an independent trial, so block times follow an exponential distribution, whose median is ln 2 ≈ 0.693 times the mean (about 416 seconds for a 600-second mean).  I’ve graphed here a recent set of 2016 blocks (fortnight 115) showing the distribution of block times in minute-wide buckets.

Now, these stats were using timestamps in the blocks, rather than the actual observed times, but I’m assuming on average that they’re correct.
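
If you want to convince yourself, here’s a toy simulation (mine, not anything from the bitcoin client) which draws one difficulty period’s worth of exponentially-distributed block times with a 600-second mean, then prints the mean and median:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

int main(void)
{
    enum { N = 2016 };              /* one difficulty period */
    double t[N], sum = 0.0;

    srand(42);
    for (int i = 0; i < N; i++) {
        /* inverse-CDF sampling of an exponential with mean 600s */
        double u = (rand() + 0.5) / ((double)RAND_MAX + 1.0);
        t[i] = -600.0 * log(u);
        sum += t[i];
    }
    qsort(t, N, sizeof(t[0]), cmp_double);
    printf("mean   %.0fs\n", sum / N);   /* close to 600 */
    printf("median %.0fs\n", t[N / 2]);  /* close to 600 * ln 2 = 416 */
    return 0;
}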

Actually, 10.005 Minute Blocks

The bitcoin client calculates how long an interval took by subtracting the timestamp at the beginning of the 2016-block interval from the timestamp at its end.  But there are only 2015 gaps between 2016 blocks, and the code divides by 2016.  The result is that blocks average 600 × 2016/2015 ≈ 600.3 seconds, but I’m sure no one else cares about that 0.3 second mistake, since block timestamps are never that precise anyway.

Politics In The Genesis Block.  Or Not.

It’s common to point to the text in the very first block “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks” as a political statement by Satoshi.  While I’m sure the headline amused the author, we need look no further than the initial Bitcoin Paper, section 3:

A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post [2-5]. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash.

In other words, it simply proves that there was no pre-mining going on.  It would be interesting to get an accurate timestamp of the initial release of bitcoin and examine London Times headlines around that date to see if it was cherry-picked, or happy coincidence.

Crazy Address Encoding

Bitcoin addresses are a 25-byte number.  It’s usually encoded using 58 characters (numbers and letters, omitting zero, capital I and O, and lower-case l to avoid confusion). Dividing by 58 is a bit of a pain, but doing crypto means we have big number libraries lying around which we can use.

But it’s not the straight encoding one might expect, which would result in 37 character addresses.  You might expect that leading zeroes can be omitted for compactness, but in fact, leading whole zero bytes are encoded separately. This gives variable-length addresses of between 27 and 34 characters and a second loop to encode and decode them. https://en.bitcoin.it/wiki/Base58Check_encoding
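
Here’s a sketch of that encoding, written from the wiki description above: repeated long division by 58 over the raw bytes, with each leading zero byte emitted separately as a ‘1’.  (The 25-byte address already contains the version byte and checksum, so there’s no checksum code here.)

#include <stddef.h>
#include <string.h>

static const char b58digit[] =
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/* Encode len bytes (len <= 64) into out, which must have room for the
 * digits plus a NUL; 36 bytes covers a 25-byte address. */
static void base58_encode(const unsigned char *data, size_t len, char *out)
{
    unsigned char buf[64];
    char tmp[128];
    size_t zeroes = 0, ndigits = 0, start;

    /* Leading zero bytes are not part of the big number: they each
     * become a literal '1' below. */
    while (zeroes < len && data[zeroes] == 0)
        zeroes++;

    memcpy(buf, data, len);
    start = zeroes;
    while (start < len) {
        /* One long-division step: divide the whole number by 58. */
        unsigned int rem = 0;
        for (size_t i = start; i < len; i++) {
            unsigned int v = rem * 256 + buf[i];
            buf[i] = v / 58;
            rem = v % 58;
        }
        tmp[ndigits++] = b58digit[rem];
        while (start < len && buf[start] == 0)
            start++;                /* skip bytes which are now zero */
    }

    for (size_t i = 0; i < zeroes; i++)
        *out++ = '1';               /* one '1' per leading zero byte */
    while (ndigits > 0)
        *out++ = tmp[--ndigits];    /* digits came out least-significant first */
    *out = '\0';
}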

Anonymity Off By Default

Anonymity is hard, but I was surprised to see blockchain.info’s page about my donation to Unfilter correctly geolocated to my home town!  Perhaps it’s a fluke, but I was taken aback by how clear it was.

CVEs in Bitcoin

Like any software, there have been flaws in the bitcoin reference client: obviously there has been a great deal of scrutiny and concern.  Unlike most projects, there is a superb wiki page which details each vulnerability, with consequences and deployment status across the network: https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures.

Corrections welcome!
Rusty.


http://www.oasis-open.org/committees/virtio

Over the last few years, interest in virtio has begun compounding.  FreeBSD have their bhyve implementation, there’s an MMIO bus and SCSI endpoint implementation, and I’ve been fielding more queries about various alternate implementations.  While it’s taken longer than I’d hoped, the effort hasn’t waned as I feared.

So I have carved out some time this year to turn this draft into a real, consensual standard with the trappings expected by those outside the normal Linux/KVM sphere (such as an IP policy). I know I said I’d never get involved in a standard process again after the FHS, but OASIS seems like the right umbrella to cleanly and efficiently run this effort.

There are limitations and workarounds in the current draft and implementations.  None are fatal, but they make a case for a flag day change for 1.0 (with backwards compatibility possible for implementations which want that).  More compelling, to me, is the chance for other vendors to get involved now and have their voices heard: after the standard is finalized, they’ll just have to follow along.

I look forward to polishing what we have, and making sure we can implement even more awesome things in future.


Last week I used 2 BTC to support Jupiter Broadcasting’s Unfilter show (and their other shows, but only Unfilter takes BTC so far).  Just now I noticed that someone made a 0.5BTC donation to my blog (I’ve had a BTC donation address in the sidebar of my blog for a few years now).  Thanks!

As I promised to pass donations onwards, I googled for bitcoin donations, and chose the following places to give 0.05 BTC each:

  1. Juice Rap News for making high-baud political commentary (Unfilter in rap form)
  2. Freedom Box for actually doing something about Internet freedom.
  3. Torservers.net (as recommended by torproject.org) for the same.
  4. f-droid.org for keeping a healthy Open alternative.
  5. Bitcoin Foundation to support and strengthen the infrastructure that made this possible.
  6. The Free Software Foundation even though I don’t always agree with them.
  7. Wikileaks for recognizing something society needs, even if they stumble at delivery.
  8. The Internet Archive for something that only gets more useful over time.

There are two left to go, so I’ll keep an eye out for more opportunities to donate in the next few weeks…



With the imminent release of gcc 4.8, GCC has finally switched to C++ as the implementation language.  As usual, LWN has excellent coverage.  Those with long memories will remember Linux trying to use g++ back in 1992 and retreating in horror at the larger, slower code.  The main benefit was stricter typechecking, particularly for enums (a great idea: I had -Wstrict-enum patches for gcc about 12 years ago, which was a superset of the -Wenum-compare we have now, but never got it merged).

With this in mind, and Ian Taylor’s bold assertion that “The C subset of C++ is as efficient as C”, I wanted to test what had changed with some actual measurements.  So I grabbed gcc 4.7.2 (the last release which could do this), and built it with C and C++ compilers:

  1. ../gcc-4.7.2/configure --prefix=/usr/local/gcc-c --disable-bootstrap --enable-languages=c,c++ --disable-multiarch --disable-multilib
  2. ../gcc-4.7.2/configure --prefix=/usr/local/gcc-cxx --disable-bootstrap --enable-languages=c,c++ --disable-multiarch --disable-multilib --enable-build-with-cxx

The C++-compiled binaries are slightly larger, though that’s mostly debug info:

  1. -rwxr-xr-x 3 rusty rusty 1886551 Mar 18 17:13 /usr/local/gcc-c/bin/gcc
    text       data        bss        dec        hex    filename
    552530       3752       6888     563170      897e2    /usr/local/gcc-c/bin/gcc
  2. -rwxr-xr-x 3 rusty rusty 1956593 Mar 18 17:13 /usr/local/gcc-cxx/bin/gcc
    text       data        bss        dec        hex    filename
    552731       3760       7176     563667      899d3    /usr/local/gcc-cxx/bin/gcc

Then I used them both to compile a clean Linux kernel 10 times:

  1. for i in `seq 10`; do time make -s CC=/usr/local/gcc-c/bin/gcc 2>/dev/null; make -s clean; done
  2. for i in `seq 10`; do time make -s CC=/usr/local/gcc-cxx/bin/gcc 2>/dev/null; make -s clean; done

Using stats --trim-outliers, which throws away the best and the worst, here are the times for the remaining 8 runs:

  1. real    14m24.359000-35.107000(25.1521+/-0.62)s
    user    12m50.468000-52.576000(50.912+/-0.23)s
    sys    1m24.921000-27.465000(25.795+/-0.31)s
  2. real    14m27.148000-29.635000(27.8895+/-0.78)s
    user    12m50.428000-52.852000(51.956+/-0.7)s
    sys    1m26.597000-29.274000(27.863+/-0.66)s

So the C++-compiled binaries are measurably slower, though not noticeably: it’s about 865 seconds vs 868 seconds, or about 0.3%.  Even if a kernel compile spends half its time linking, statting, etc., that’s under 1% slowdown.

And it’s perfectly explicable by the larger executable size.  If we strip all the gcc binaries, and do another 10 runs of each (… flash forward to the next day.. oops, powerfail, make that 2 days later):

  1. real    14m24.659000-33.435000(26.1196+/-0.65)s
    user    12m50.032000-57.701000(50.9755+/-0.36)s
    sys    1m26.057000-28.406000(26.863+/-0.36)s
  2. real    14m26.811000-29.284000(27.1308+/-0.17)s
    user    12m51.428000-52.696000(52.156+/-0.39)s
    sys    1m26.157000-27.973000(26.869+/-0.41)s

Now the difference is 0.1%, pretty much in the noise.

Summary: whether you like C++ or not, the performance argument is moot.


This year’s organizers took specific pains to attract deep content, and the schedule reflects that: there are very few slots where I’m not torn between two topics.  This will be great fun!

After a little introspection, I did not submit a talk this year.  My work in 2012 was with Linaro helping with KVM on ARM: that topic is better addressed by Christoffer Dall, so I convinced him to submit (unfortunately, he withdrew as January became an untenable time for him to travel).  My other coding work was incremental, not revolutionary: neither module signatures, CCAN nor ntdb shook the ground this year.  There just wasn’t anything I was excited about: a reliable litmus test.

See you at LCA!


There were some great comments on my previous post, both in comments here and on the Google Plus post which points to it.  I’d like to address the points here, now that I’ve had a few moments to do follow-up work.

One anonymous commenter, as well as Stephen Hemminger via email, pointed to the existing lockless queue code in liburcu.  I had actually waded through this before (I say waded, because it’s wrapped in a few layers which I find annoying; there’s a reason I write little CCAN modules).  It’s clever and genuinely lockless; my hat off to its author.  But it only works in conjunction with RCU: it’s an unlimited-length queue which uses a dummy element to avoid ever being empty, and it can safely traverse the ‘->next’ entry even as an element is being dequeued, because the RCU rules say you can’t alter that field or free the object until later.

Stephen also pointed me to Kip Macy’s buf_ring from FreeBSD; it uses two producer counters, prod_head and prod_tail.  The consumer looks at prod_tail as usual; producers compare-and-swap increment prod_head, place their element, then wait for prod_tail to catch up with prod_head before incrementing prod_tail (sketched below).  Reimplementing this in my code showed it to be slower than the lower-bit-to-lock case for my benchmarks, though not by much (the main difference is in the single-producer-using-multiple-producer-safe-routines case, which covers the first three benchmarks).  buf_ring also uses a similar two-counter scheme on the consumer side, but that’s only useful for debugging, so I ignored it and used the same consumer code as before.
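
For reference, here’s the producer side of that scheme as I understand it, rewritten with C11 atomics rather than FreeBSD’s actual code (no full-queue handling, and the consumer is unchanged):

#include <stdatomic.h>

#define RING_SIZE 256   /* must be a power of two */

struct ring {
    _Atomic unsigned int prod_head;  /* next slot to claim */
    _Atomic unsigned int prod_tail;  /* slots visible to the consumer */
    void *slot[RING_SIZE];
};

static void ring_produce(struct ring *r, void *elem)
{
    unsigned int head = atomic_load(&r->prod_head);

    /* Claim a slot: compare-and-swap increment of prod_head. */
    while (!atomic_compare_exchange_weak(&r->prod_head, &head, head + 1))
        ;                               /* head is reloaded on failure */

    r->slot[head % RING_SIZE] = elem;   /* place our element */

    /* Wait for any earlier producers to finish, then publish. */
    while (atomic_load(&r->prod_tail) != head)
        ;
    atomic_store(&r->prod_tail, head + 1);
}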

Arjan van de Ven makes several excellent points.  Firstly, transaction-style features may allow efficient lock elision in upcoming Intel CPUs (and, of course, PowerPC has announced transaction support for Power8), so we’ll have to revisit this in a few years when that hardware reaches me.

His more immediate point is that uncontended locks are really cheap on recent CPUs: cheaper than cache-hot compare-and-swap operations.  All the benchmarks I did involve everyone banging on the queue all the time, so I’m only measuring the contended cases.  So I hacked my benchmarks to allow for “0 consumers” by having the producer discard all the queue contents every time it filled.  Similarly, filling the queue with junk when it’s empty gives a “0 producers” benchmark.

Here we can see that the dumb, single lock comes into its own, being twice as fast as my optimal-when-contended version.  If we just consider the common case of a single writer and a single reader, the lockless implementation takes 24ns in the contended case and 14ns in the uncontended case, whereas the naive locked implementation takes 107ns in the contended case and 7ns in the uncontended case.  In other words, you’d have to be uncontended over 90% of the time to win (solving 7u + 107(1−u) = 14u + 24(1−u) for the uncontended fraction u gives u ≈ 92%).  That can’t happen in a naive implementation which wakes the consumer as soon as the first item has been inserted into the queue (and if you implement a batch version of queue_insert, the atomic exchange gets amortized, so it gets harder to beat).

For the moment, I’m sticking with the previous winner; there’s still much to do to turn it into a usable API.

