Code Reviewing: popt

I decided that every day I would review some code in ctdb. I never spend enough time reading code, and yet it's the only way to really get to know a project. And I decided to start easy: with the popt library in ctdb.

popt is the command-line parsing library included with the RPM source, and used by SAMBA. I vaguely recall Tridge telling me years ago how good it was. I started with the man page, and it is an excellent and useful library: it does enough to make the simple cases less code that getopt_long, yet allows flexibility for complex cases.

The idea is simple: similar to getopt_long, you create an array describing the possible options. Unlike getopt, however, simple integers and flags are handled automatically. So you set up the context with the commandline, then hand that context to poptGetNextOpt() to do the parsing. That keeps parsing args until it hits an error or you've marked an argument to return a value for special handling. The manual page is excellent and made me feel really comfortable with using the code.

Now, the code itself. Even before you notice the four-space tabs and bumpyCaps, you see the lint annotations and docbook-style comments cluttering the source. It does make the code annoying to read, breaking CCAN's Golden Rule. Typedefs of structs, typedefs to pointers, and several #ifdef __cplusplus complete my minor gripes.

The code is old and cross-platform: the header test for features we'd simply assume in a modern Linux. It has a simple set of bitmap macros (taken from pbm, judging from the name), a helper routine to find an executable in $PATH (using alloca!) . These are the kinds of things which would be in separate modules, were this in CCAN.

The definition of _free() to be a (redundantly-NULL-taking) free() which always returns NULL is used to effect throughout the code:

    defs = _free(defs);

Here is a trick I hadn't seen before to zero an onstack var, and really don't recommend:

poptDone done = memset(alloca(sizeof(*done)), 0, sizeof(*done));

The help-printing code is in a separate file, popthelp.c. It's actually about half 1/3 of the code! That's mainly because it takes pains to pretty-print, and it's done by manually tracking the column number through the routines (aka 'cursor'). These days we'd asprintf into a temp buffer and strlen() to figure out if we should start a new line and indent before printing this. Or even better, create a struct with the FILE * and the column number, and create a fprintf variant which updated the column number and figured out wrapping for us. Routines like maxArgWidth() which iterates the table(s) to figure how much to indent will still suck however: probably simplest to build all the strings in the help table in memory and then dump at the end.

I thought I found a buffer overrun, but a test program disproved it: the singleOptionHelp() uses its maxLeftCol (plus some extra stuff) to size a buffer. This works because maxLeftCol is previously calculated as the worst-case size of the argument help. Now, the code is unclear (AFAICT maxLeftCol must always be sufficient, so the extra bytes are wasted), but not buggy.

In summary, this is excellent code. Talloc would help it, as would modern printf usage (%.*s for example), but generally the code is in really good shape. I know that the popt code in RPM is a little updated, but I can't imagine that much has changed in such an obviously-stable codebase. The temptation to rewrite it is thus very low: who knows what the testsuite would miss?