Rusty Russell's Coding Blog | Stealing From Smart People



License Boilerplates

CCAN is supposed to be about the code, so I’ve avoided the standard GPL boilerplate comment at the top of each source file.  I reluctantly include a symlink to the full license text in each directory now, since lawyers approached me to clarify the single “License:” line in _info.  A useful discussion on the samba-technical mailing list has reinforced my view that it’s marginal clutter, but most CCAN modules now have a one-line courtesy comment such as “/* Licensed under LGPLv2.1+ – see LICENSE file for details */” at the top of each .c and .h file.

Please make a conscious choice here: if license enforcement is a high priority for your project you probably want copyright assignments, license boilerplates and click-through agreements for everyone who downloads your source code.  But if you’re spending significant time or effort on legal issues for your little coding project, you’re probably doing it wrong…

(ccanlint now scans for common license boilerplates, as well as those comments; this means we can also detect use of incompatible licenses inside modules, or dependent modules.  The former test noticed that I’d labelled the md4 module as LGPL, yet it’s actually GPL.  The latter spotted that ccan/likely (LGPL) depends on ccan/htable (GPL): legal (the whole thing is actually GPL), but misleading, as Michael Adam noted).  Automating this stuff is a clear win for a project like CCAN.  I also re-licensed a bunch of useful-but-trivial modules from LGPL to public domain, as I want the BSD modules to use them).

No tags Hide



Bitcoin for something useful.

I wanted a scalable version of this poster: so I offered 1BTC on the bitcoin forum and someone produced a version I can print and put on my wall in my home office.

Here is the SVG. Enjoy!.

No tags Hide

I like bitcoins.  A simple open source client, a well-run developer community, clever algorithms, decentralized assurance model, and of course near-zero transaction fees.  For all the economic arguments (some of which sound like early anti-Wikipedia arguments, though I hesitate to argue by analogy), when I first used it to tip a website, I fell in love.  It took me about 3 minutes to transfer $50 from my bank account to a stranger’s via my bank’s website, with 2 days latency.  It took me about 5 seconds to send 0.1BTC to a stranger, with about 10 minute latency.  It was a revelation.

But, while bitcoins rock, volatility doesn’t.  It’s currently a bit hard to get some bitcoins, and the price has rocketed by speculation.  It’s not just a cute FOSS project, it’s becoming a real market, and surely those piling in now are susceptible to hacking, scams and the inevitable hiccups that go with any project ramping up.  It’s all going to come crashing down to earth.

Good.  My hope is that after the GBC the speculators will move on.  Volatility will settle.  The boring work of accreting trust in this new tool will continue.  I fervently hope that it we will appreciate its true utility once the dust has settled and the Man Loses A Million Dollars and House in Virtual Currency headlines have faded.

[Yes, you can tip me in bitcoins!  I think it’s a good habit, fun to do, and better than ads as I get a warm fuzzy feeling of appreciation.  I’ll be using any tips I get to tip other sites which take BTC].

No tags Hide

This was passed on to me by Ben Elliston, ex-gcc hacker and good guy.  Amusing in context, but the corollary is that working on free software means you’ll encounter such people.  You may have to work with them.  You may have to argue with them (and they may be right).

Quite some time ago I was horrified by the private behaviour of a hacker I deeply respected: malicious, hypocritical stuff.  And it caused an internal crisis for me: I thought we were all striving together to make the world a better place.  Here are the results I finally derived:

  1. Being a great hacker does not imbue moral or ethical characteristics.
  2. Being a great coder doesn’t mean you’re not a crackpot.
  3. Working on a great project doesn’t mean you share my motivations about it.

This wasn’t obvious to me, and it seems it’s not obvious to others.  A-list actors endorsing Scientology doesn’t make it a good idea.  Great FOSS political work was done by a certain obnoxious LWN-haunting nutball.  Julian Assange may or may not be guilty of crimes in Sweden. Many of my kernel coworkers believed that GPLv3 was somehow a radical change from GPLv2.  Some sweet code has been written by gun nuts, lechers, holocaust deniers and (in at least once case) someone who believes that fasting will cure cancer.

In any walk of life you have to work with all kinds; having to do so in my dream job as FOSS hacker was a hard lesson for me.  It’s great to work with people whose skills you respect, but don’t expect to like them all.

No tags Hide

I was delighted that Jon Corbet pinged me to say he was finally implementing a supporter option for LWN.  It’s been about 12 months since I started asking about it, and 6 since I started asking publicly.  When it finally arrived, in classical FOSS brand-suicide style, it was named the “Maniacal supporter” option.  I don’t think Jon believed anyone would actually pay more “for nothing”, but curiosity finally won out.

But he’s wrong: people want the consistent commentry and in-depth analysis that only dedicated experts like Jon can provide.  And we know that if they don’t get enough money, they’ll have to stop writing and take day jobs; this is not some abstract charity.  I want Jon to be comfortable and LWN financially secure and able to concentrate on what they do best, which seems to be a rare skill in our community.  This is a start in that direction; I welcome your suggestions on what to do next…

No tags Hide

Jokes aside, I don’t prepare my conference talks the night before.  I took a week off of work to prepare my talk this year (two weeks before the conference, and I still spent a couple of work days in the week after completing it).  That kind of spontaneity takes preparation!

Here’s a rough calculator of preparation time for an LCA-style talk.  Make sure you finish at least a week before, to allow slippage!

Preparation Time for Standard Talk (~45 minutes)

If you have given it before:

  • If you were happy with it and not changing it: 15 minutes to re-check, change conference name and date.
  • If you were happy with it and changing it slightly: +1 hour for a complete run-through.
  • If you were a little unhappy with it, but content will not change: +5 hours for reviewing previous video and googling for feedback and taking notes, then running through changed sections twice and complete run-through once.

Prior work:

  • If you’re not the world expert on what you’re talking about, allow at least a week of research time.
  • If the topic is vague allow at least a month of mulling time, where the topic sits in your brain.  For longer periods I recommend jotting down your ideas.  (I did this for an entire year before my OLS keynote, and I knitted a theme from the contents of that text file full of thoughts and examples).
  • One hour to a day to plan your talk structure: what are your main points, what’s the extra magic?

Writing the talk:

  • 10 minutes per basic slide.  Usually I’d expect 25 slides, so say 4 hours.
  • 30 minutes per diagram (five minutes of this will be trying to figure out if you really need a diagram: you probably do!).  I’d expect five to ten diagrams, so say 5 hours.
  • Five hours per demo.  Not just setting it up in the first place, but making it robust and repeatable and practicing switching to and from your presentation adds time.
  • Two hours per run-through (since you tend to stop and mod the first few times).  You’ll want at least one more of these than you have time for, but I’d expect 8 hours for this.

Additional overheads:

  • Using new software: +4 hours if you’re on your own, +1 hour if you have an expert available.
  • Any project work where you have to document the steps for your talk: double your normal project time for the overhead, to ensure it’s comprehensible and maximally useful to the audience (vs. works).
  • Any new presentation technique you haven’t used before, add 4 hours for two additional run-throughs.

Preparation Time for A Tutorial (~1.5 hours)

Similar calculations, but you’ll have many more demos so it’ll be more than twice as long.  The real killer is that you have to practice timings with real people who are similar to your target audience.  This means in addition to everything else, you’ll want to give it for some local group at least twice, because there’s no way you can know what the timing will be like until you’ve done that.  Say +2 hours to organize the practice sessions, and +6 hours to run them (this includes transport, setup, testing and time overruns for each one).

Testing Time

Testing time happens standing in the venue, at the podium with your setup ready to go.  I allow 5 minutes for video setup.  If you’ve not presented on the laptop before, +15 minutes.  If you’re not always running 1024×768, +10 minutes.  If you want audio, +5 minutes. If you have a video to show, +5 minutes. If you have an interactive demo, +5 minutes to find a practice volunteer, +20 minutes to run through the demo with them.

In general, allow half your testing time the day before (ie. you’ll need to access the venue), the rest in the space before your talk.

An Example

So, this gives a preparation time for my LCA 2011 talk as:

  • 1 day planning.
  • 6 hours for 35 basic slides.
  • 2 hours for 4 diagrams.
  • 15 hours for 3 demos.
  • 8 hours for run-throughs.
  • 4 hours for messing with svgslides, even though I didn’t really use it in the end.
  • 3 days for coding up the example project, and documenting that code.
  • 4 hours for additional run-throughs because I hadn’t presented using a side-bar and emacs before.

Giving a total time of of 71 hours (assuming 8 hour days).  That’s probably about right.  And the required 30+ minutes of testing time explains why I didn’t end up having people telnet into my laptop for the demos; if I’d tested that the day before, we might have been able to organize something.

No tags Hide

Perhaps there was too much fun, and not enough advanced C coding, as one attendee implied.  My original intent is to walk through a real implementation in the order I coded it, warts and all, but over 50% got cut for time.  After all, it took me 15 minutes in my BoF session just to run through the implementation of ccan/foreach.  (Hi to the three people who attended!).

So I ended up doing a fair bit of waving at other code (yes, mainly in CCAN: if I have a useful trick, I tend to put it there).  Here’s the bullet-point version of my talk with links:

  • CCAN is a CPAN-wannabe project for snippets of C code.
  • Your headers should be a readable and complete reference on your API.
  • Code documentation should be human readable and machine processable (eg. kerneldoc), but extracting it is a waste of time.  See above.
  • Your headers should contain example code, and this should be compile tested and even executed (ccanlint does this).
  • Perl’s TAP (Test Anything Protocol) has a C implementation which is easy to use.
  • You can write a better ARRAY_SIZE(arr) macro than “sizeof(arr)/sizeof((arr)[0])”, using gcc extensions to warn if the argument is actually a pointer, not an array.
  • I got bitten by strcmp()’s usually-wrong return value after coding in C for ten years.  I suggest defining a streq() macro.
  • It is possible, though quite difficult, to implement a fixed-values iterator macro, aka. foreach.  It’s even efficient if you have C99.
  • Making functions return false rather than exit, even if the caller can’t really handle the failure, makes for easier testing.
  • Making your functions use errno is a bonus, though its semantic limitations are definitely a two-edged sword.
  • A common mistake is to call close, fclose, unlink or free in error paths, not realizing that they can alter errno even if they succeed.
  • Never think to write malloc-fail-proof code without testing it thoroughly, otherwise you haven’t written malloc-fail-proof code.
  • You can test such “never-happen” failure paths automatically by forking; make sure you give a nice way to get a debugger to the fail point though, and terminate failing tests as early as possible.
  • There are libraries to make option parsing easier than getopt; popt and ccan/opt are two.
  • You can use macros to provide typesafe callbacks rather than forcing callbacks to take void * and cast internally; the compiler will warn you if you change the type of the callback or callback parameter so they no longer match.
  • Do not rely on the user to provide zero’d terminators to tables: use a non-zero value so you’re much more likely to catch a missing terminator.
  • Use talloc for allocation.
  • Don’t return a void * as a handle, even if you have to make up a type.  Your callers’ code will be more typesafe that way.
  • Don’t use global variables in routines unless it’s clearly a global requirement: keep everything in the handle pointer.
  • Valgrind is awesome.  Valgrind with failtesting is invaluable for finding use-after-free and similar exit-path bugs.
  • Fixing a test doesn’t mean your program doesn’t suck.  I “fixed” a one-client-dies-while-another-is-talking-to-it by grabbing another client; that’s stupid, though my test now passes.
  • Don’t do anything in a signal hander; write to a nonblocking pipe and handle it in your event loop.
  • The best way to see why your program is getting larger over time is to use talloc_report() and see your allocation tree (you can use gdb if you need, a-la Carl Worth.
  • You might want to do something time-consuming like that in a child; remember to use _exit() in the child to avoid side-effects.
  • There are at least two tools which help you dump and restore C structures: genstruct and cdump (coming soon, it’s in the talk’s git tree for the moment).  Both are very limited, though cdump is still being developed.
  • You can use a dump/exec/restore pattern to live-upgrade processes; forking a child to test dump and restore is recommended here!
  • If your restore code is well-defined for restoring fields that weren’t dumped, you can make significant code modifications using this pattern.
  • You can use C as a scripting language with a little boilerplate.  Use “#if 0″ as the first line, followed by the code to recompile and exec, then “#else” followed by  the actual code.  Make it executable, and the shell will do the right thing.
  • You can use gdb to do just about anything to a running program; script it if you can’t afford to have it stopped for long.
  • The best hash algorithm to use is the Jenkins lookup3 hash (there’s a ccan/hash convenient wrapper too).
  • The best map/variable array algorithm to use is Judy arrays (much nicer with the ccan/jmap wrapper).

That was all I had room for; there was none for questions, and even the last two points were squished onto the final “Questions?” slide.

No tags Hide



The LCA Dinner Rule

Dinner was great as only a room full of well-fed FOSS geeks can be, but I felt that that my time on-stage was too long and too chaotic; I apologise.  The LCA team have done such a thorough job of recovering from events that it’s a surprise when things don’t magically come together.

So I propose an “LCA Dinner Rule” for future conferences: that dinner activities span no more than 1 hour.  Preferably combining wit with its brevity.

(Oh, and I have a strong feeling that next year’s dinner is going to be a superb event…)

No tags Hide

Putting “Advanced” in the title did not have the desired effect of scaring people off.  Nonetheless, it went well: maybe because I was on the wired network so noone could access my server to find the bugs :)

There wasn’t anywhere obvious to place a link about my talk, so here’s the git repository.

No tags Hide



On C Linked Lists

There are two basic styles of double-linked lists in C; I’ll call them ring style and linear style.

The Linux kernel has ring-style, defined in include/linux/list.h.  You declare a ‘struct list_head’ and everyone who wants to be in the list puts a ‘struct list_head list’ in their structure (struct list_node in CCAN’s version).  This forms a ring of pointers in both the forward and reverse directions.

SAMBA uses linear-style, defined in lib/util/dlinklist.h.  You simply declare a ‘struct foo *’ as your list, and everyone who wants to be in the list puts a ‘struct foo *prev, *next’ in their structure.  This forms a NULL-terminated list in the forward direction (the reverse direction became a ring recently to facilitate tail access).  (Linux’s hlist.h datastructure is similar, only done in list.h style).

Points in favor of ring-style lists:

  1. Insertion and deletion are branchless, as the elements and head are homogeneous:
     head->next->prev = new;
     new->next = head->next;
     new->prev = head;
     head->next = new;


    if (!head) {
        new->prev = head = new;
        new->next = NULL;
    } else {
        new->prev = head->prev;
        head->prev = new;
        new->next = head;
        head = new;
  2. list_add, list_del are inline functions, not macros, since the types are known.

Points in favor of linear-style lists:

  1. They’re typesafe, since the head pointer and internal pointers are all the correct type.
  2. Forward iteration is simpler, since the list ends at NULL rather than back at the head. In theory this could free a register, but the bigger difference is that it’s often useful to have NULL in your iterator once the loop is done.
  3. As a corollary, iteration, initialization and emptiness testing don’t need some tricky macros:
      struct foo *head = NULL;
      if (head == NULL) ...
      for (i = head; i; i = i->next) ...


      if (list_empty(&head)) ...
      list_for_each(i, head, list) ...
  4. Uses slightly less space for the head pointer (one pointer, vs two).

So how important is efficiency of insert and delete in a real project?  To provide some data on this, I first annotate the linux kernel so each list.h operation would increment a counter which I could dump out every so often.  Then I booted the kernel on my laptop and ran as normal for three days.

Operation Frequency
Empty Test 45%
Delete 25%
Add 23%
Iterate Start 3.5%
Iterate Next 2.5%
Replace 0.76%
Other Manipulation 0.0072%

Firstly, I was surprised that we add and remove from lists much more than we look through them. On reflection, this makes sense: if lookups are really common we don’t use a linked list. And note how often we check for whether the list is empty: this looks like a “list as a slow path” pattern. I wonder if SAMBA (or other projects) list usage looks the same… anyone?

Secondly, we delete more than we add, but we could be “deleting” initialized-but-unadded elements. I didn’t chase this beyond re-reading and enhancing my patch to measure “other manipulations” to see if they could explain it (no).

Thirdly, when we iterate it’s short: the list is empty or we break out on the first element 28% of the time (I didn’t differentiate). I wonder if a single-linked list would be an interesting micro-space-optimization for many of these lists.

Summary: I like the safety of SAMBA’s lists, but there’s clearly real merit in eliminating those branches for add and delete.  It’s a genuine performance-usability trade-off, so I I think we’ll still have both in CCAN…

No tags Hide

« Previous Page« Previous Entries

Next Entries »Next Page »

Find it!

Theme Design by

Tag Cloud