I’ve been quietly lobbying Linux Weekly News to offer a $500 – $1000 “Supporter”-level subscription option; LWN has saved my sanity for the last two years as my conference schedule has been severely restricted.
But one problem with LWN is that they spend all their time writing and editing quality articles, instead of the boring stuff: ie. taking money from people. The base rate finally increased this year after 8 years at $5/month (to $7), but I worry that Jon and co will look back at the years of service and lack of retirement savings and wish they’d had secure well-paying jobs like the rest of us.
After numerous monthly reminder emails and face-to-face lobbying at LinuxCon Japan, I’m giving up on “quietly” and asking your for your help. Please: next time you comment on LWN, could you add some sig-line lobbying for the Supporter-level subscription? Even if you have no intention of ever buying such a thing, at least you know one person will…
I recently came across the Judy library which seems like an impressive set of data structures to implement mappings (think variable length arrays or hash indexed by simple key). I also looked through the codec2 sources recently, and I saw some of the same API issues, so I thought I’d crystalize some of them here:
- Packaging code libraries is a pain. There’s an obscene amount of clutter around the code itself. Some can be blamed on autoconf, but common practices like putting the source under src/ make it worse. In Judy there are 29 actual core code files, but 33 infrastructure files (automake et al) and AUTHORS, ChangeLog, INSTALL, COPYING and 13 README files.
- Typedefs which are always the same are silly. Don’t typedef long or void * or const void *, you’re confusing documentation with real typesafety. Undefined structs are your friend. If your functions take a context, make it a “struct mylib” (or union) and don’t expose the definition in the public header. Your code will be clearer and more typesafe and so will the library users’.
- Context creation and destruction are common patterns, so stick with “mylib_new()” and “mylib_free()” and everyone should understand what to do. There’s plenty of bikeshedding over the names, but these are the shortest ones with clear overtones to the user.
- If your library needs initialization use “mylib_init()” or “mylib_setup()”.
- Try to make your API impossible to misuse (ie. no mylib_init() at all if you can manage it), but if that fails assert(), abort() or crash. You are not doing developers a favor by making misuse easy to ignore. If you worry about crashing on users, and can handle an API error gracefully, put that handling under #ifdef NDEBUG, like assert does. But this is not where you should be spending your development effort!
- If your only possible error is out-of-memory, consider not handling it at all. It’s pretty certain none of your library users will handle it (remember Tridge’s Law: untested code is broken code). If it’s convenient to return NULL, do so. Otherwise offer a way of registering an error function and let the user deal with setjmp etc. Do not complicate your API to handle this case.
- Always return the error. If you return a pointer, NULL is standard. Magic pointers for different errors can work too, but at that point some other method might be preferable.
- Think hard before implementing multiple levels of interface. It’s better to have everyone understand your interface and have 10% not use it (or hack it!) because it’s incomplete, than have 50% reinvent it because yours is too complicated.
- There’s a standard naming for “I know what I’m doing” low-level alternate functions: the single-underscore prefix (eg. _exit()). That’s your one extra level of interface, and if you need more see the previous point. Perhaps another library is in order, on top of this one? This is a standard violation: see Glen’s comment below.
- Don’t worry about namespace issues too much. Avoid the standard namespace and settle on a short prefix (eg. opt_, judy_, talloc_), but there’s little you can do if your name hits someone else’s so don’t make your code a pain to use just to try to uniquify the names (ccan_rusty_russell_opt_ prefixes, for example).
- Your header(s) should be readable, and in a readable order. I consider extracting comments a complete waste of time, but kerneldoc and doxgen et. al. provide a consistent model for documenting your API in the headers, and should be used as such.
- Don’t sweat portability too much. Try not to paint yourself into a corner, but if you’re not going to test on other platforms leave the serious effort until someone does.
- I do consider use of C++ keywords in headers overly unfriendly to our C++ cousins, but in the code it’s fine; I avoid putting extern “C” in my headers, as C++ people can do that (and whatever else they need) before #including the header. Again, if you’re not going to test it, don’t work up a sweat about it.
- There’s a standard naming scheme for C, and it’s all lower case with underscores as separators. Don’t BumpyCaps.
On some level, each of these influenced CCAN as we thought “how do we encourage people to reuse this code?” Many are specific applications of the CCAN Golden Rule (aka. “our code should not be ugly”) which drives that attempt to make code reusable.
Packaging cruft in CCAN is sidestepped entirely; since people are expected to include ccan/ in their own code, we leave that to them, with some guidelines on how to use the code:
- You supply a “config.h” somewhere we can #include it, with various #define HAVE_FOO in it.
- The _info file is actually a C program: the header contains (trivial-to-parse kerneldoc style) metainformation such as author, description, and example program(s). These are aimed at human consumption: compiling and executing the code gives dependencies in a parsable form (still, only the ccan/ dependencies are the only ones really amenable to machine use).
- The (main?) header will be the same name as the module, with “.h” appended, eg “ccan/foo/foo.h”. The kerneldoc-style comments are the module API documentation.
- The test/ subdir contains tests of standard forms (eg. run-* test are to be executed).
- DEBUG can be defined for “give me extra debugging on this module, performance cost be damned”. This is separate from normal sanity checks which would be disabled by defining NDEBUG.
In other words, you see the code first and foremost when you look in a CCAN module.
We also have a “namespacize” tool (very primitive, needs love) which is designed to rewrite the code in case of namespace clashes (currently it prepends ccan_, and scans dependent modules and fixes them up too). This allows us to ignore namespace issues :)
The “ccanlint” tool is CCAN’s always-too-primitive checker/scorer tool, which ideally would judge a module on all of the complaints above. It currently runs the test cases, checks their coverage and the like, but of course it’ll never be sophisticated enough to judge interfaces. However, it does check for the existence of kerneldoc comments, and even extract and compile Example: code fragments (using some fun heuristics) to make sure they haven’t bitrotted. And I keep adding new checks which make it more useful.
This probably explains why I stay up late writing CCAN modules: I can concentrate on writing the code (and tests!) without the other cruft. And I don’t even need to stick to an API or ABI for my modules (api* tests indicate an API guarantee, and only three modules have those!). And because it’s source-level, you don’t have to stretch your interface to cover all the cases; users with weird needs can simply use the code as a basis for their own.
IBM are involved with the Genographic Project, which is an attempt to “chart new knowledge about the migratory history of the human species”, by cheekswabbing hundreds of thousands of people around the world. You get swabbed, you get an id to access your history, and they get the consolidated data from the results.
My initial reaction was that I wasn’t personally interested, but I am interested in the project overall, so when I was told they were doing an event in Adelaide on Friday at the Central Markets, I decided I’d go along and get swabbed (or have Arabella do it, since she’ll be with me).
The deal is that the first 100 volunteers can be tested for free, so if you’re interested now is your chance!
There are so many option-parsing libraries around, but I really wanted something simple and yet powerful enough for the popt use cases. Particularly, I wanted it avoid the getopt() loop and just use typesafe callbacks (yeah, I have a hammer…) so it could be arbitrarily extended by the user. The result is the latest CCAN module, “opt“, which is very much popt-inspired (popt-lite?).
Beware the API which isn’t used! It originally had a popt-style struct table, with first entry being the long option (or NULL), and the second being the short option (or ‘\0′). While writing tests, I used “–foobar” as the long option, instead of (the correct) “foobar”. If I can’t use my own API, what hope is there?
So I rewrote it: the option name is now a /-separated string, like “–foobar/-f”. This is much more intuitive, more grep-friendly, and gives us obvious aliases. Its format is also checked when you register the option or option table.
Another subtlety worth mentioning: the end-of-table-marker isn’t a zero entry. It’s too easy to miss that and have your code still “work” because the next patch of memory happens to be zeroed. On your platform.
I converted (the admittedly-trivial) ccanlint from getopt() to ccan/opt; it gained long options, as well as losing 24 lines (16 of which was losing the hand-rolled usage() function.
It’s not worth converting existing popt code IMHO (though I’d be interested in the feedback!), but if you’re still using getopt/getopt_long, you should probably take a look.
After my previous conclusion that doubling linear hash tables were good enough, it turns out that they don’t work for TDB: enlarging or moving the hash table in the presence of tdb_chainlock is not possible. So, I ended up implementing expanding hash tables, and ran into a familiar set of problems.
The design is simple; we start with a simple hash table. When it gets too full, we expand a bucket to point to a sub-hashtable, which we similarly expand to sub-tables when it fills. Seems like a nice logarithmic expanding data structure that means if we use 8 bit hash tables (ie. 256 entries) we’ll fit well over a billion entries in a 4 level deep set of tables.
Yet as we discovered previously, the hash density is horrible. To see why, imagine our entries hit the hash buckets in order: the first entry belongs in hash bucket 0, the next in hash bucket 1… it’s all good for the first 256 entries, and our single top-level hash table ends up with 100% occupancy. However, the 257th entry collides in bucket 0, causing it to expand into a 256-entry subtable which only has two entries in it. By the time we have 512 entries, we’ve got a top-level hash table full of pointers to 256 sub-hash tables; that’s a total of 68352 hash buckets holding a mere 512 entries. That’s 0.75% density, or an overhead just over 1k per entry!
In practice our arrangement is not perfectly even, so we don’t get as low as 0.75%, but we certainly don’t get up to 100% either. Our density wavers between a minimum of 1.3% and a maximum of 4.5% (I simulated this; if someone has pointers to a good “Hash Statistics For Beginners” course please tell me!). Our average density sits roughly halfway between those two, so I’ll be quoting min and max values from here in.
So, how do we improve this? The first thing I tried is to organize our hash into groups: we only expand when we fill an entire group.. Say we use 8-entry groups; if your bucket is full, you try the next one in the group. If all 8 are full, then you expand all the buckets into 8 new sub-tables. At expense of a slightly longer search time (we now have to search up to 8 buckets at the bottom level, rather than just 1), this means we tend to be fuller before we expand. This raises the maximum density to a respectable 25.3%, though the minimum drops to 0.6%. Still, our average density has jumped from 3% to 13%.
So, what if we don’t expand all 8 buckets at once, but expand the most popular one in the group? Our minimum density rises to 1.5%, and our maximum to 33%; closer to a 17% average. Unfortunately calculating the most popular bucket means re-calculating the hash of all entries in the group. With the expand-all approach, we needed to recalculate hashes for 8 entries to decide into which subtable each entry should be inserted. With the expand-one approach, we calculate hashes for 8 entries, then 7 next time, then 6… 36 calculations (ie. O(N^2)).
We do have the hash value of the current entry: what if we simply expand the bucket that belongs in? It may not be the fullest bucket, but we know it’s not going to be empty. This gives a close-to-optimal result, with densities ranging from 1.2% to 31%.
Now note that with all these approaches, expanding a bucket to point to a subtable means messing with a full group: some entries may go down into the subtable, others may move within group. So we need to re-calculate the hashes anyway unless we store some extra information in the hash bucket somewhere. For TDB2, we don’t really need 64 bits for entries: 64 exabytes should be enough for anyone, so the top 8 bits are free, and all records are 8-byte aligned so the bottom 3 bits are free.
There are two approaches which seem useful: store the “home bucket” in unused bits in the offset (3 bits) which means we never have to rehash when we expand a bucket (unless moving a hash entry down a level), or store a single “happy where I am” bit (around half of entries in full groups are in exactly the right location, so knowing this saves us half our rehash calculations). It depends on how many bits we want to spend.
But 15% average density still sucks. If we increase the group size it slows down searches a little and requires more rehashing on expansion or more bits to indicate the home bucket. For example, 32-entry groups using “create subtable where we want to insert the new entry” gets us 1.2% to 60% density. Yet that means storing the “home bucket” would take 5 bits.
That minimum density is still a worry. The obvious change is to reduce our subtable size: say 64, not 256 entries for sub-levels. Of course, our tree becomes deeper, making us less efficient, so we don’t really want to do that.
Is there anything else we can do to increase that minimum density? The answer is yes, but it’s not free: we can share subtables. For example, when a group fills, we expand all group entries to point at the same (new) subtable. When a group in that subtable fills, we split the subtable. In effect, we expand a table sideways before we expand down a level.
In the simplest case, we split the shared subtable back into one-subtable-per-bucket immediately when any group fills. Going back to our 256-entry hash tables with groups of 32, this gives a density range of 2.6% to 61%. This has raised the minimum density by a factor of 2. But that expansion from a single shared subtable to 32 subtables is costing us, so I tried an alternative where instead of unsharing immediately, we split into two. So we start with all 32 buckets in the group pointing at the same subtable, then split so 16 buckets are pointing at each of two subtables, until eventually each one has a unique subtable, which then expands down. The result is a density range from 13% to 61%. This last result is ten times the minimum on our unshared 32-entry groups.
Of course, unsharing has costs as well: again, we need to rehash to figure out which of the subtables each entry belong in. And again, doing it all at once is cheaper than doing it five times. And once more, we can use 5 additional bits to say which entry of the group above this entry belongs to. Once we’re in an unshared subtable however, these bits are useless (and on average we will be in an unshared subtable half the time). That’s a fair cost for something which has marginal effect on our average density; it just helps our worst-case.
This entire decision comes down to expansion cost. This is not the same as insertion cost; not all insertions cause expansion, and if the system is in steady state (we don’t ever shrink the hash) expansion is rare. We need a model for our usage to know how often we expand; we also need to know how often we delete, insert and search (both successful and failing searches).
To wrap up, here is how we can use any extra bits in our hash entries:
|1||Subtable Bit||Bucket points to subtable, not an actual entry.|
|Any||Extra Hash Bits||Avoids lookups on hash collisions: each bit halves number of lookups. May avoid hash recalculation when pushing entries down into subtable.|
|1||Hash Truncated Bit||Indicates fewer/no Extra Hash Bits are valid. Otherwise we always need to rehash when we push entries down a sublevel, to fill in the Extra Hash bits. This way it can be done as required.|
|log2(groupsize)||Home Bucket||Avoid rehashing on expanding a bucket to a subtable,
for entries not going down to that subtable.
Helps searching on collisions.
|1||In Home Bucket||Alternative to above: avoids rehashing about 50% of buckets, and helps searching on collisions.|
|log2(groupsize)||Shared Bucket||Indicate which subtable we belong to in a shared subtable. With one of the two above, avoids rehashing when splitting shared subtable. Helps searching to a lesser extent (only on shared subtable).|
|1||Deleted Bit||Avoids reshuffle on delete, but may slow lookups a little.|
|1||Collision Bit||Indicates we should keep searching. Speeds lookups.|
Feedback welcome. If you want to play with my simulator, you can find it here.
 It also means that two identical hash values don’t create an infinite set of subtables. In practice, we need to use a linked list if we run out of hash bits anyway. My test code has an abort() in this case.
 With a hash size of 64, our minimum density rises to 5.8%, maximum 42% with 8-entry groups (with 32-entry groups the density range is 6% to 66%).
The Trivial DataBase (ccan variant here) uses fcntl locks for consistency: records are chained off a fixed-size hash table (or the free list), and a 1-byte fcntl lock at the offset of the chain head protects all records in that chain.
There’s also a tdb_lockall() function which grabs a lock across all the hash chains at once to let you do arbitrary atomic munging. This works because fcntl() locks have an offset and length: you can lock arbitrary byte ranges.
Unfortunately, tdb_lockall() is subject to starvation, at least under Linux. This is because the kernel merely checks whether a lock is available and gets it if it can, rather than queuing behind someone else who wants a superset of the lock. So single byte lockers come in and out while the whole-file locker waits for them all to go away.
I’m not sure this is wrong, and as it’s simpler than the alternative, I’m not prepared to change it just yet. Fortunately, there’s a way of avoiding this starvation in userspace, which occurred independently to both Volker Lendecke and me. I called this variant tdb_lockall_gradual(), in which we try to lock each chain one at a time so we compete with the single-chain lockers on fair terms. My first naive thought was to try to lock all the chains one at a time in order, nonblocking, then go back and retry (blocking) any we failed to get. This is slow, and can deadlock against another process doing the same thing. Volker’s suggestion was much nicer: we do a non-blocking lock, and if that fails we divide and conquer. If we get down to a single record, we do a blocking lock.
I wrote a test program which fires off N children, each of which grabs a random chain lock for 50-150 milliseconds before sleeping for 1 second, then repeating. The parent waits for a second, then tries to do a tdb_lockall() or tdb_lockall_gradual() depending on the commandline. Running it five times and showing the average wait time for the lockall gives this:
Now, regarding performance. While there can be 10000 hash chains, this isn’t as bad as it sounds. The fast case is the uncontended one, and that’s as fast as before, and the contended case is already slow. I annotated the source to print out how many blocking/nonblocking locks it’s actually doing. Inevitably, if there’s contention, it will end up dividing down to a blocking lock, so log(numchains) locks before doing a blocking lock.
|Processes||Blocking locks||Nonblocking locks||Seconds|
Sure, that’s a lot of locking when we’re competing with 5000 processes, but it’s less the naive one per chain, and it’s clear that it’s not the cause of the slowdown (we’re doing fewer locks per second than the 5 processes case).
And anyway, locking the entire database cannot be a speed-critical operation. Indeed, after the evidence here, I followed Volker’s suggestion to simply replace tdb_lockall() with the new implementation.
I pulled Bob Jenkins’ superlative lookup3.c into ccan some time ago, and with the work on TDB2 (aka 64-bit TDB) I wondered if there was a 64-bit variant available. I didn’t look hard enough: I should have checked his top-level page and seen lookup8.c before wasting his time with an email :(
I did note in my mail that since lookup3.c can return a pair of 32 bit numbers, I could combine those to get a 64 bit hash “but I wondered if it was possible to do better…”
His one-line reply was that he was working on a 128-bit hash now. Respectful and informative.
The gold standard test of online manners is how you respond to a dumb random email. A truly gracious gentleman/lady will respond as if you are far smarter far than you appear. Thanks Bob. And 128-bit; I’m a little stunned…
My philosophy for Free/Open Source Software comes down to this: that others can take what I do and do something awesome with it. Since I don’t know what you’ll need, I hand you every capability I have to get you started. Others granted me that very power to get where I am, so I seek to increment that.
It’s not always simple to do: sometimes you build it, and nobody comes. My experience is that clear code, convenient install, documentation and useful functionality all help. An encouraging welcome helps even more, as does getting the software out in front of people. It’s all about avoiding or lowering barriers.
We accept some barriers to modification which are reasonable: if you modify the software, I don’t have to support it. If I make you sign a support deal which says I won’t support the software (even unmodified versions) if you modify or redistribute the software, that’s not reasonable. If you get fired for publishing pre-release modified copyleft source code, that’s reasonable. If you get fired for publishing post-release, that’s not.
The hardware and electricity costs are barriers, but they’re undiscriminating and reasonable, so we accept them. Even the GPL explicitly allows you to charge for costs to make a copy. The software cost of the system is also explicitly allowed as a barrier. The software costs of the build environment are also accepted barriers (though historic for most of us): again even the GPL doesn’t require your software to be buildable with freely available tools.
As this shows, your choice of licensing is among your arsenal in keeping barriers low for co-hackers (who are not necessarily contributors!). I believe copyright gives too much power to the copyright holder, but as copyright is so hard to unmake, I favor the GPL. It tries to use legal power to meet the aims in the first paragraph: to force you to hand onwards all the pieces I handed to you. It’s also well understood by people and that common understanding gels a community.
Yet, as I alluded at the top, there are a realm of barriers which licenses don’t even try to address: the code could be an unmodifiable tangle, the documentation awful, the installation awkward, or the trademarks invasive. A license can’t make coders welcome newcomers, be friendly to upstream, responsive to bugs, write useful code or speak english.
The spectrum of barriers goes roughly from “that’s cool” through “I’m not so comfortable with that” to “that’s not Free/Open Source”. It’s entirely up to your definition of reasonableness; only in the simplest cases will that be the same point at which the license is violated, even if that license is explicitly drafted to defend that freeness!
So, don’t start with an analysis of license clauses. Start with “is that OK?”. Is there a fundamental or only a cosmetic problem? If it’s not OK, ask “does it matter?”. Is it effecting many people, is it setting a bad example, is it harming the project’s future, is it causing consternation among existing developers? If it is, then it’s time to look at the license to see if you can do anything. Remember that the license is merely a piece of text. It can’t stop anything, it can only give you leverage to do so. It certainly can’t make using the law easy or convenient, or even worthwhile pursuing.
To close, I will leave the last word to Kim Weatherall, who once told me: “if you’re spending your time on legal issues, you’re doing it wrong”.
I enjoyed Levitt & Dubner’s “Freakonomics”, and picked up the followup “Superfreakonomis” recently at an airport. The last chapter, however, was astonishing. The entire chapter was devoted to a glowing advertisement for Intellectual Ventures, pointing out that they own 20,000 patents “more than all but a few dozen companies in the world”, but of course “there is little hard evidence” that they are patent trolls.
But this bunch of wacky genius billionaires have solved global warming (much of which they dispute anyway) and can control malaria and prevent hurricanes from forming. Unlike the rest of the book which covers analysis of well-known facts and disputes them with insightful economic research, this chapter is so breathless and gushy that it makes me question the rest of the author’s work.
I first came across Intellectual Ventures when The Economist reversed their 100-year opposition to patents, and the only reason I could find was a similarly cheerleading piece about this company. (I had naively expected new research revealing some net positive of patents, or some such revelation).
Side note: when a respected information source covers something where you have on-the-ground experience, the result is often to make you wonder how much fecal matter you’ve swallowed in areas outside your own expertise.
So, what is IV actually doing? Buying up loads of patents and licensing them to companies who calculate it’s not worth the fight is patent trolling 101. Yet the scale they’re operating on puts them on new ground, and opens new opportunities. It seems obvious to get corporate investors on board by promising them immunity from patent claims. With enough patents you stop trying to license them one-by-one and just tax each industry at some non-negotiable rate. No doubt they have more tricks I haven’t even thought of, but these potential devices really do make them a new breed of Super Trolls.
Their efforts to actually attain their own patents could simply be more of the same, but it’s also a relatively cheap but clever PR exercise (as shown by their media treatment). This will help them when (legislative?) efforts are made to shut down patent trolls. I’m fairly confident that they’ll simply license rather than implement anything themselves; actually producing things requires much more work, and simply exposes you to others’ patents.
Without diving deeply into this, they seem to understand two things clearly:
- They learnt from Microsoft that government-enforced monopolies are worth billions. Microsoft had copyright on software, this is patents.
- Development is getting much cheaper, while patents are getting more valuable. Cheaper development is shown clearly by free software, open hardware and hackerspaces. Patent value increases as more of the world becomes a more profitable and enforceable patent target.
Now, I don’t really care if one company leeches off the others. But if they want to tax software, they have to attack free software otherwise people will switch to avoid their patent licensing costs. And if you don’t believe some useful pieces of free software could be effectively banned due to patent violations, you don’t think on the same scale as these guys.
I have been petitioning Jon Corbet to introduce a “Supporter” level subscription to LWN; I think given his failure to keep up with inflation and my need to experience conferences vicariously I feel deeply indebted to them.
That lead to me looking through LWN Quote of The Week; there’s no neat index of these things. And I’m not going to create one, but I did troll for my old quotes while procrastinating, and here’s my vanity list:
- January 21, 2004
- March 14, 2007
- March 28, 2007
- March 5, 2008
- May 7, 2008
- December 10, 2008
- January 28, 2009
- February 11, 2009
- April 22, 2009
- May 26, 2010
From the comments, it looks like I got the very first one. Wow. I’m sure there are missing ones (I was sure I got two weeks in a row once).
I’m behind quota for 2010 though; I was surprised my recent fight with Linus didn’t get in. I’ll try harder…