Rusty Russell's Coding Blog | Stealing From Smart People

Feb/11

3

Summary of “Advanced C Coding For Fun!”

Perhaps there was too much fun, and not enough advanced C coding, as one attendee implied.  My original intent was to walk through a real implementation in the order I coded it, warts and all, but over 50% got cut for time.  After all, it took me 15 minutes in my BoF session just to run through the implementation of ccan/foreach.  (Hi to the three people who attended!)

So I ended up doing a fair bit of waving at other code (yes, mainly in CCAN: if I have a useful trick, I tend to put it there).  Here’s the bullet-point version of my talk with links:

  • CCAN is a CPAN-wannabe project for snippets of C code.
  • Your headers should be a readable and complete reference on your API.
  • Code documentation should be human readable and machine processable (eg. kerneldoc), but extracting it is a waste of time.  See above.
  • Your headers should contain example code, and this should be compile tested and even executed (ccanlint does this).
  • Perl’s TAP (Test Anything Protocol) has a C implementation which is easy to use.
  • You can write a better ARRAY_SIZE(arr) macro than “sizeof(arr)/sizeof((arr)[0])”, using gcc extensions to warn if the argument is actually a pointer, not an array.
  • I got bitten by strcmp()’s usually-wrong return value after coding in C for ten years.  I suggest defining a streq() macro.
  • It is possible, though quite difficult, to implement a fixed-values iterator macro, aka. foreach.  It’s even efficient if you have C99.
  • Making functions return false rather than exit, even if the caller can’t really handle the failure, makes for easier testing.
  • Making your functions use errno is a bonus, though its semantic limitations are definitely a two-edged sword.
  • A common mistake is to call close, fclose, unlink or free in error paths, not realizing that they can alter errno even if they succeed.
  • Never think to write malloc-fail-proof code without testing it thoroughly, otherwise you haven’t written malloc-fail-proof code.
  • You can test such “never-happen” failure paths automatically by forking; make sure you give a nice way to get a debugger to the fail point though, and terminate failing tests as early as possible.
  • There are libraries to make option parsing easier than getopt; popt and ccan/opt are two.
  • You can use macros to provide typesafe callbacks rather than forcing callbacks to take void * and cast internally; the compiler will warn you if you change the type of the callback or callback parameter so they no longer match.
  • Do not rely on the user to provide zero’d terminators to tables: use a non-zero value so you’re much more likely to catch a missing terminator.
  • Use talloc for allocation.
  • Don’t return a void * as a handle, even if you have to make up a type.  Your callers’ code will be more typesafe that way.
  • Don’t use global variables in routines unless it’s clearly a global requirement: keep everything in the handle pointer.
  • Valgrind is awesome.  Valgrind with failtesting is invaluable for finding use-after-free and similar exit-path bugs.
  • Fixing a test doesn’t mean your program doesn’t suck.  I “fixed” a one-client-dies-while-another-is-talking-to-it by grabbing another client; that’s stupid, though my test now passes.
  • Don’t do anything in a signal handler; write to a nonblocking pipe and handle it in your event loop.
  • The best way to see why your program is getting larger over time is to use talloc_report() and see your allocation tree (you can use gdb if you need to, à la Carl Worth).
  • You might want to do something time-consuming like that in a child; remember to use _exit() in the child to avoid side-effects.
  • There are at least two tools which help you dump and restore C structures: genstruct and cdump (coming soon, it’s in the talk’s git tree for the moment).  Both are very limited, though cdump is still being developed.
  • You can use a dump/exec/restore pattern to live-upgrade processes; forking a child to test dump and restore is recommended here!
  • If your restore code is well-defined for restoring fields that weren’t dumped, you can make significant code modifications using this pattern.
  • You can use C as a scripting language with a little boilerplate.  Use “#if 0” as the first line, followed by the code to recompile and exec, then “#else” followed by the actual code.  Make it executable, and the shell will do the right thing.
  • You can use gdb to do just about anything to a running program; script it if you can’t afford to have it stopped for long.
  • The best hash algorithm to use is the Jenkins lookup3 hash (there’s a ccan/hash convenient wrapper too).
  • The best map/variable array algorithm to use is Judy arrays (much nicer with the ccan/jmap wrapper).
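
Two of the smaller tricks in the list, the checked ARRAY_SIZE() and streq(), fit in a few lines. What follows is a minimal sketch assuming gcc or clang (it relies on __typeof__ and __builtin_types_compatible_p); it is not a copy of the CCAN modules, and MUST_BE_ARRAY, colours and colour_index are names made up for the example. In this version a pointer argument becomes a compile error rather than a warning.

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__)
/* Evaluates to 0 for a real array.  For a pointer, the two types are
 * compatible, the array size below becomes -1, and compilation fails. */
#define MUST_BE_ARRAY(arr)                                              \
    (sizeof(char[1 - 2 * __builtin_types_compatible_p(                  \
                             __typeof__(arr), __typeof__(&(arr)[0]))]) - 1)
#else
/* No check available without the gcc extensions. */
#define MUST_BE_ARRAY(arr) 0
#endif

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + MUST_BE_ARRAY(arr))

/* Reads as a boolean, unlike strcmp(), which returns 0 on a match. */
#define streq(a, b) (strcmp((a), (b)) == 0)

static const char *const colours[] = { "red", "green", "blue" };

/* Returns the index of name in colours[], or -1 if it is not present. */
int colour_index(const char *name)
{
    for (size_t i = 0; i < ARRAY_SIZE(colours); i++)
        if (streq(colours[i], name))
            return (int)i;
    return -1;
}
```

Passing a plain `const char **` to ARRAY_SIZE() here should refuse to compile, which is the point of the exercise.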

That was all I had room for; there was none for questions, and even the last two points were squished onto the final “Questions?” slide.
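
The signal-handling point above (write to a nonblocking pipe, handle it in the event loop) is often called the self-pipe trick. Here is a minimal sketch assuming a POSIX system; signal_pipe_init() and signal_pipe_next() are hypothetical names, and cleanup on failure is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };

/* The handler only writes one byte; write() is async-signal-safe. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signo;

    if (write(sig_pipe[1], &byte, 1) < 0) {
        /* Pipe full: earlier signals are already queued, so drop this one. */
    }
    errno = saved_errno;
}

/* Creates the nonblocking pipe and installs the handler for signo.
 * Returns false on failure (file descriptors are not cleaned up here). */
bool signal_pipe_init(int signo)
{
    struct sigaction sa;

    if (pipe(sig_pipe) != 0)
        return false;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(sig_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(sig_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return false;
    }
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL) == 0;
}

/* For the event loop: returns the next queued signal number, or 0 if
 * nothing is readable within timeout_ms (or poll() was interrupted). */
int signal_pipe_next(int timeout_ms)
{
    struct pollfd pfd = { .fd = sig_pipe[0], .events = POLLIN };
    unsigned char byte;

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;
    return read(sig_pipe[0], &byte, 1) == 1 ? byte : 0;
}
```

A real event loop would add sig_pipe[0] to the set of descriptors it already polls rather than polling it separately.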


21 Comments for Summary of “Advanced C Coding For Fun!”

Kent Overstreet | February 3, 2011 at 5:27 pm

Good foreach macros are my #1 reason for C99, I make heavy use of them for bcache.

A serialization library for C structs is damn intriguing. I’ve been thinking about implementing a truly extensible superblock for bcache (and other things!); to do it right you need to be able to represent a tree structure. I think it’d make a lot of sense to have a text format; the implementation could be a lot simpler.

So I’m wondering how hard it’d be to get a parser for something like that into the kernel – may as well do it once and do it right, and have a standard text format (ala YAML or XML) for the kernel.

Paul Bolle | February 3, 2011 at 7:32 pm

> You can use C as a scripting language with a little
> boilerplate. Use “#if 0” as the first line, followed by the
> code to recompile and exec, then “#else” followed by the
> actual code. Make it executable, and the shell will do the
> right thing.

That section is intriguing, but left me rather puzzled. Would you have (a pointer to) a working example of this?

Author comment by rusty | February 3, 2011 at 7:36 pm

Hmm, I skimmed the bcache patch and couldn’t find a similar pattern: where should I look?

For real ‘foreach_ptr(p, "foo", "bar", "baz")’ you really need to give gcc -std=c99 (or similar) to get local variables in for() scope. The workarounds for that get fairly ugly; I used them for my foreach module.

I’d be delighted to be wrong, but I suspect we’re talking about a slightly different thing?
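
To make the C99 point concrete, here is a minimal sketch of a fixed-values iterator that leans on a loop-scoped variable and compound literals. It is not the ccan/foreach implementation; FOREACH_INT and sum_small_primes are hypothetical names. One caveat of this simple form is that the value list is re-evaluated on every iteration, so the values should not have side effects.

```c
#include <stddef.h>

/* Iterates var over the listed int values.  The index lives in for()
 * scope (C99), so the caller does not have to declare any state. */
#define FOREACH_INT(var, ...)                                           \
    for (size_t idx_ = 0;                                               \
         idx_ < sizeof((int[]){ __VA_ARGS__ }) / sizeof(int) &&         \
             ((var) = ((int[]){ __VA_ARGS__ })[idx_], 1);               \
         idx_++)

/* Example use: adds up a fixed list of values. */
int sum_small_primes(void)
{
    int i, total = 0;

    FOREACH_INT(i, 2, 3, 5, 7)
        total += i;
    return total;
}
```

Without C99 the index would have to be declared by the caller or hidden some other way, which is where the ugly workarounds come in.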

Author comment by rusty | February 3, 2011 at 7:37 pm

elmarco | February 4, 2011 at 1:22 am

Was the talk recorded? Any chance it will appear on http://linuxconfau.blip.tv/ sometime?

Brendan Miller | February 4, 2011 at 5:33 am

I don’t understand how #if does the same thing as #! /bin/sh

Is this defined in a standard somewhere, or is it just something that linux randomly supports?

Paul Bolle | February 4, 2011 at 10:07 am

@rusty: Thanks

@brandon: “#if 0” doesn’t do the same thing as “#! /bin/sh”. The “#!” stuff is rather complicated (see: man bash(1), under COMMAND EXECUTION for an introduction).

Rusty’s hack turned out to be that this C source file is also a valid shell script from “#if 0” to “#else” (which is valid, but never compiled, C “code”). The last shell command before “#else” is exec, so the rest of the file is never executed as a shell script.

The rest of the file, after “#else”, should be valid C. It looks like the shell commands before “#else” compile it and execute the output of that compilation.

Author comment by rusty | February 4, 2011 at 1:36 pm

The shell falls back to running it as a script when it fails to execve. It’s POSIX and it’s everywhere AFAICT.

The slightly neater ways are to use tcc (which has a -run argument for exactly this use case!) or a wrapper program which does the same thing.

Author comment by rusty | February 4, 2011 at 1:39 pm

Expecting it to be there already, but assume it’s coming soon…

Sam Watkins | February 4, 2011 at 1:55 pm

I like the idea of CCAN, but why use the GPL for code snippets?

If you would release the snippets as public domain, I would be pleased to use them (in other public domain code, and possibly in differently licensed projects). As it is, I might learn something by reading the snippets, but I won’t use the code directly; I would have to rewrite it. The GPL is limiting, it isn’t even compatible with all other copy-left licenses.

The GPL licensed CCAN code could not be used in excellent BSD or MIT licensed works, nor in public domain works, nor in commercial or custom proprietary software. The GPL has helped to get the open-source ball rolling, but I feel it’s now swung the other way, and it’s more of a problem than a solution. In a way it’s more insidious than proprietary software, because the software claims to be free and for developer freedom, but in fact strongly limits developer freedom compared to many other popular licenses.

The GPL takes away personal freedom of developers (to mix code from any source freely), and supposedly gives freedom to the code itself. But I don’t care if copies of my code are ‘enslaved’ in proprietary programs. Code does not have will, freedom or life of its own. I want to ensure the freedom of developers to use copies of my code in any open-source program, I don’t care whether all copies remain open-source.

Author comment by rusty | February 4, 2011 at 2:02 pm

License choice is up to the individual module author. I have personal license preferences, but that’s orthogonal and probably not a constructive conversation for this blog :)

Kent Overstreet | February 4, 2011 at 7:53 pm

Re: foreach. I hadn’t looked at your foreach when I commented, ours are solving different problems…

What I was trying to do was write a foreach specific to a data structure that hid the details of iteration. bio_for_each_segment() is a simpler version of my macros. Without C99 the caller has to pass you all the variables you need for your state, which gets ugly if your data structure is complicated enough (or if you’re doing something like hlist_for_each_entry_safe()).

Pádraig Brady | February 5, 2011 at 12:47 am

This is a nice idea!
It seems to overlap with gnulib a bit:
http://www.gnu.org/software/gnulib/MODULES.html

elmarco | February 7, 2011 at 12:09 pm

Sam Watkins | February 8, 2011 at 4:51 pm

Very interesting, I wasn’t there so I’ll watch the video later. Your dot points are concise and excellent, thanks very much.

CCAN is a very good idea. I think many people have had the same idea so it might be a fragmented effort. It would be hard to get lots of C developers to use a single repository. But multiple CCAN-like sites can share their code anyway.

Re: header files and code doc, I like to generate headers automatically (DRY). 1/2 as many files to maintain, more productivity. API doc could be extracted from the .c files and included in the headers. I like the idea to include example / test code in .c files; the linker can remove it if necessary.

I’ve implemented a variety of foreach() style iterators too.

For error handling, I make my function call an error handling routine on error. This handler normally exits with an error message, but I can tell it to do something else such as longjmp, log the error and continue, ignore it, or whatever else. I’ve written wrappers for all the standard library / system calls that I use so that they call this error function. This is like perl’s ‘autodie’ module, C++ / Java exceptions, etc., perhaps a little more flexible. There is an overhead with doing this, but errors should be rare events, so the overhead does not matter. (EOF is not an error!)

Regarding option parsing, I like to use a ‘context free option syntax’: the connection between options and option arguments (nil, single or lists) should be determined by the general syntax, without having to know what the particular options are or what they do.

Talloc looks good, I’ll have to look into it!

I was disappointed that valgrind and efence don’t seem to help with debugging static allocation overruns (global or static vars). I resorted to gdb and it took quite a while to find the bug. Do you know of a tool that helps with this?

I wrote a struct dump / restore tool too. It’s more basic than genstruct, and does not follow pointers. It produces debian-package-list-style text files (much like lists of http/email headers joined with a blank line). I am using it to load a Tagalog dictionary, it seems to be quite fast which is nice.

For C scripting, you can also use a single line to do this, e.g. I used this in my ‘xdark’ program (sam.ai.ki/xdark.c):

// 2>/dev/null; cc -o xdark -Wall xdark.c -L/usr/X11R6/lib -lm -lX11 -lXxf86vm && exec ./xdark "$@"; exit

The // comment results in an error for the shell, which is hidden. A separate ‘compiler / runner’ script could also perhaps be used, which would reduce the amount of junk you have to put at the start of the C program.

The Jenkins hash / Judy arrays look interesting. The performance is impressive, but the amount of code puts me off a bit. I would use them if the performance of a simpler hash / array was inadequate for the task, but I haven’t experienced such a bottleneck in my own programs.

Author comment by rusty | February 9, 2011 at 10:42 am

> It would be hard to get lots of C developers to use a single repository.

Yes, there are already a few out there, though none explicitly modelled after CPAN. I feel that whoever makes it easiest to contribute will attract a natural monopoly, so many of my efforts will be in making that easier from now on.

> Re: header files and code doc, I like to generate headers automatically (DRY).

I avoid auto-gen headers for external interfaces, simply because I believe your header is your code’s shopfront: the separation makes me think about how to present the interface.

> For error handling, I make my function call an error handling routine on error.

Agreed, pluggable error handlers are a win for most code IMHO.

> Regarding option parsing, I like to use a ‘context free option syntax’

I think I need an example to understand how you’d do this?

> I was disappointed that valgrind and efence don’t seem to help with debugging static allocation

Indeed. I’d love to see an enhanced valgrind to use debug info to figure out when a ptr crosses object boundaries along the stack.

> // 2>/dev/null; cc -o xdark -Wall xdark.c -L/usr/X11R6/lib -lm -lX11 -lXxf86vm && exec ./xdark "$@"; exit

Oh, kudos for the great trick! Thanks! tcc -run does this for you automatically, and as you mentioned it’s easy to write a wrapper to do it.

> The Jenkins hash / Judy arrays look interesting. The performance is impressive, but the amount of code puts me off a bit.

Yeah, it’s definitely best as a library. I’m a bit torn in my ccan modules between wanting to use it and not wanting to depend on it.

Anyway, if you want to mail some useful code snippets through to the ccan mailing list, I’d be happy to take them!
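
The pluggable error handler idea discussed in the last two comments can be sketched roughly as below. This is only one possible shape, with hypothetical names throughout (error_handler_fn, set_error_handler, checked_fopen, count_errors): a default handler that prints and exits, a way to swap it, and one wrapped library call that preserves errno if the handler returns.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A handler receives what failed and the errno value at the time. */
typedef void (*error_handler_fn)(const char *what, int err);

/* Default policy: print a message and exit. */
static void die_on_error(const char *what, int err)
{
    fprintf(stderr, "%s: %s\n", what, strerror(err));
    exit(EXIT_FAILURE);
}

static error_handler_fn error_handler = die_on_error;

/* Installs a new handler (NULL restores the default); returns the old one. */
error_handler_fn set_error_handler(error_handler_fn fn)
{
    error_handler_fn old = error_handler;

    error_handler = fn ? fn : die_on_error;
    return old;
}

/* fopen() wrapper: failures go through the handler.  If the handler
 * returns, the caller still sees NULL, with errno put back as it was. */
FILE *checked_fopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);

    if (!f) {
        int err = errno;

        error_handler(path, err);
        errno = err;
    }
    return f;
}

/* An alternative policy, handy in tests: count errors and carry on. */
static int error_count;

static void count_errors(const char *what, int err)
{
    (void)what;
    (void)err;
    error_count++;
}
```

A test can install count_errors, provoke a failure, and check both the count and the NULL return, which is much harder when the library simply exits.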

Kevin Bowling | February 9, 2011 at 12:13 pm

CCAN would be far more interesting if it were mostly BSD/MIT licensed. Even LGPL is a PITA for anything beyond .h files at this core library level. But GPL? Forget it.

Author comment by rusty | February 10, 2011 at 11:04 am

Cool. Please submit some BSD licensed code then!

Joey Adams | February 18, 2011 at 5:09 pm

> Talk video recording: http://linuxconfau.blip.tv/file/4724792/

Thanks for posting this! I wanted to watch it, but the first several minutes of it were not streamed to the web (unless it only worked for everyone else).

I’m also in the BSD/MIT camp. Maybe a license war will encourage people to contribute :-)

> Your headers should be a readable and complete reference on your API.

I like the way PostgreSQL documents code internally: at the definition, rather than declaration. This works well with cscope, as “Find this global definition:” will take you right to the definition and documentation. This keeps headers terse and synoptic, and the documentation close to the implementation so it’s easier to verify.

> Regarding option parsing, I like to use a ‘context free option syntax’

I recently thought about implementing an option parsing library using a description syntax inspired by documenting conventions, and in particular, the git man pages:


git stash list [<options>]
git stash show [<stash>]
git stash drop [-q|--quiet] [<stash>]
git stash ( pop | apply ) [--index] [-q|--quiet] [<stash>]
git stash branch <branchname> [<stash>]
git stash [save [--patch] [-k|--[no-]keep-index] [-q|--quiet] [<message>]]
git stash clear
git stash create

On a slightly-related note, I can think of a case where the option syntax would not be regular (that is, it would require a stack to parse):

find -type f \( -name 'foo' -o -name 'bar' \)

Sam Watkins | March 7, 2011 at 11:23 am

>> Regarding option parsing, I like to use a ‘context free option syntax’

> I think I need an example to understand how you’d do this?

Well, for options without arguments, they are as normal, e.g. --foo or -f. If an option takes an argument, it must be of the form --opt= arg (or --opt=arg). If an option takes several (0, 1 or many) args, it must be of the form --opt: a0 a1 (or --opt:a0 a1).

Short options are also allowed, e.g. -o: foo bar

The list is terminated by the next option, or by the “--” end-options marker.

If this syntax is followed, options can be parsed without needing to provide any extra info about what they mean, or how many arguments they might take.

I think this idea of a sensible options syntax is boring but useful!
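
A rough sketch of how such a syntax could be parsed without a table of known options follows. It is one reading of the rules in this comment, not an existing library: parsed_opt, is_option and parse_option are hypothetical names, and one known limitation is that an argument beginning with "-" (a negative number, say) would be taken for the next option inside a list.

```c
#include <stddef.h>
#include <string.h>

enum opt_kind { OPT_FLAG, OPT_SINGLE, OPT_LIST };

struct parsed_opt {
    enum opt_kind kind;
    const char *name;       /* points into argv, past the leading dashes */
    size_t name_len;
    const char *inline_arg; /* text after '=' or ':' in the same word, or NULL */
    char **args;            /* following argv words owned by this option */
    int nargs;
};

/* Anything starting with '-' and longer than one character, including "--". */
static int is_option(const char *word)
{
    return word[0] == '-' && word[1] != '\0';
}

/* Parses the option at argv[i] (i < argc).  Returns the number of argv
 * words consumed, or 0 if argv[i] is not an option or is the "--" marker. */
int parse_option(int argc, char **argv, int i, struct parsed_opt *out)
{
    const char *w = argv[i];
    size_t len;

    if (!is_option(w) || strcmp(w, "--") == 0)
        return 0;
    w += (w[1] == '-') ? 2 : 1;          /* skip "--" or "-" */
    len = strcspn(w, "=:");
    out->name = w;
    out->name_len = len;
    out->inline_arg = NULL;
    out->args = argv + i + 1;
    out->nargs = 0;
    if (w[len] == '\0') {
        out->kind = OPT_FLAG;
        return 1;
    }
    out->kind = (w[len] == '=') ? OPT_SINGLE : OPT_LIST;
    if (w[len + 1] != '\0')
        out->inline_arg = w + len + 1;
    if (out->kind == OPT_SINGLE) {
        /* "--opt= arg": the value is the next word, whatever it looks like. */
        if (!out->inline_arg && i + 1 < argc)
            out->nargs = 1;
    } else {
        /* "--opt: a0 a1": collect words up to the next option or "--". */
        while (i + 1 + out->nargs < argc &&
               !is_option(argv[i + 1 + out->nargs]))
            out->nargs++;
    }
    return 1 + out->nargs;
}
```

The caller never tells the parser which options exist; the '=' and ':' suffixes alone decide how many words each option takes.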

Sam Watkins | March 7, 2011 at 11:26 am

Also, every program that can take a list of arguments should work consistently when given 0 arguments! unlike almost all the standard unix tools :/

e.g.: grep -- 'foo' `find . -name '*.c'`

This works all very well until no *.c files are found, then it greps stdin!
