It’s hard to describe the joy of coding if you haven’t experienced it, but in this talk I will try to capture some of it. Free/Open Source lets us remove the cruft which forecloses on the joy of coding: seize this chance!
I’ll talk about some of my favourite projects over the last 15 years of Free Software coding: what I did, how much fun it was and some surprising results which came from it. I’ll also discuss some hard lessons learned about joyless coding, so you can avoid it. There will also be a sneak peek from my upcoming linux.conf.au talk.
There’ll be some awesome code to delight us. And if you’re not a programmer we’ll take you on our journey and show you the moments of brilliance which keep us coding.
Someone mentioned that you had to look at the source code if you wanted to hack your badge this year; I would have considered that cheating if I hadn’t known. (It’s been a few years since I last hacked my badge). But it helps if you look in the right place: http://lca2012.blogspot.com/2011/09/feeling-silly.html
Thanks to Tony Breeds for pointing me at that after I’d given up with the github upstream source…
So, ccanlint has accreted into a vital tool for me when writing standalone bits of code; it does various sanity checks (licensing, documentation, dependencies) and then runs the tests, offers to run the first failure under gdb, etc. With the TDB2 work, I just folded in the whole TDB1 code and hence its testsuite, which made it blow out from 46 to 71 tests. At this point, ccanlint takes over ten minutes!
This is for two reasons: firstly because ccanlint runs everything serially, and secondly because ccanlint runs each test four times: once to see if it passes, once to get coverage, once under valgrind, and once with all the platform features it tests turned off (eg. HAVE_MMAP). I balked at running the reduced-feature variant under valgrind, though ideally I’d do that too.
Before going parallel, I thought I should cut down the compile/run cycles. A bit of measurement gives some interesting results (on the initial TDB2 with 46 tests):
- Compiling the tests takes 24 seconds.
- Running the tests takes 12 seconds.
- Compiling the tests with coverage support takes 32 seconds.
- Running the tests with coverage support takes 32 seconds.
- Running the tests under valgrind takes 204 seconds (17x slowdown)
- Running the tests with coverage under valgrind takes 326 seconds.
It’s no surprise that valgrind is the slowest step, but I was surprised that compiling is slower than running the tests. This is because CCAN “run” tests actually #include the entire module source so they can do invasive testing.
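To make the relative costs concrete, here is a minimal sketch that turns the measurements above into slowdown factors. The dictionary keys and the function name are my own labels for illustration; only the numbers come from the list.

```python
# Timings in seconds, copied from the measurements listed above (46 tests).
# The key names are labels invented for this sketch.
TIMINGS = {
    "compile": 24,
    "run": 12,
    "compile_coverage": 32,
    "run_coverage": 32,
    "run_valgrind": 204,
    "run_coverage_valgrind": 326,
}


def slowdown(variant, baseline="run"):
    """Return how many times slower `variant` is than `baseline`."""
    return TIMINGS[variant] / TIMINGS[baseline]


if __name__ == "__main__":
    for name in ("run_coverage", "run_valgrind", "run_coverage_valgrind"):
        print(f"{name}: {slowdown(name):.1f}x the plain run")
    print(f"compile vs run: {slowdown('compile'):.1f}x")
```

The valgrind figure reproduces the 17x quoted in the list, and plain compilation comes out at twice the cost of actually running the tests.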
So the simple approach of compiling up once, with -fprofile-arcs -ftest-coverage, and running that under valgrind to get everything in one go is much slower (from 325 up to 407 seconds!). The only win is to skip running the tests without valgrind, shaving 11 seconds off (about 2%).
One easy thing to do would be to compile with optimization to speed the tests up. Valgrind documentation (and my testing) confirms that using “-O” doesn’t affect the results on any CCAN module, so that should make it run faster, for very little effort. But when I actually measured, total test time increased from 407 seconds to 495, because compiling with optimization is so slow. Here are the numbers:
- Compiling the tests with optimization (-O/-O2/-O3) takes 54/77/130 seconds.
- Running the tests with optimization takes 11/11/11 seconds.
- Running the tests under valgrind with optimization takes 201/208/208 seconds
So no joy there. Time to go and fix up my tests to run faster, and make ccanlint run (and compile!) them in parallel…
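As a rough illustration of what "in parallel" could look like, here is a minimal Python sketch that runs independent test commands concurrently and collects their exit statuses. This is not how ccanlint is implemented (ccanlint is C); the function names and the stand-in commands are assumptions made purely for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one test command and return its exit status (0 means pass)."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode


def run_parallel(commands, max_workers=4):
    """Run all commands concurrently; return exit statuses in input order."""
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in "tests": tiny Python processes that exit 0 (pass) or 1 (fail).
    demo = [
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
        for code in (0, 1, 0)
    ]
    print(run_parallel(demo))
```

The same shape works for compile steps, since each test is built from its own source file and the jobs do not depend on one another.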
So, Alex scoured through wedding photographers, we chose one, met them, got the contract… and it stipulates that they own the copyright, and will license the images to us “for personal use”. So you pay over $3,000 and don’t own the images at the end (without a contract, you would). That means no Wikipedia of course, but also no Facebook; they’re definitely a commercial organization. No blogs with ads. In the unlikely event that Alex or I change careers and want to use a shot for promotional materials, and the photographer has died, gone out of business or moved overseas, we’re out of luck even if we’re prepared to pay for it.
The usual answer (as always with copyright) is to ignore it and lie when asked. But despite my resolution a few years ago to care less about copyright, this sticks in my craw. So I asked: it’s another $1,000 for me to own the copyright. I then started emailing other photographers, and that seems about standard. But why? Ignoring the obvious price-differentiation for professional vs amateur clients, photographers are in a similar bind to me: they want to use the images for promotion, say, in a collage in a wedding magazine. And presumably, the magazine insists they own the copyright. Since the photographers I emailed had varying levels of understanding of copyright, I can totally understand that simplification.
CCAN is supposed to be about the code, so I’ve avoided the standard GPL boilerplate comment at the top of each source file. I reluctantly include a symlink to the full license text in each directory now, since lawyers approached me to clarify the single “License:” line in _info. A useful discussion on the samba-technical mailing list has reinforced my view that it’s marginal clutter, but most CCAN modules now have a one-line courtesy comment such as “/* Licensed under LGPLv2.1+ – see LICENSE file for details */” at the top of each .c and .h file.
Please make a conscious choice here: if license enforcement is a high priority for your project you probably want copyright assignments, license boilerplates and click-through agreements for everyone who downloads your source code. But if you’re spending significant time or effort on legal issues for your little coding project, you’re probably doing it wrong…
(ccanlint now scans for common license boilerplates, as well as those comments; this means we can also detect use of incompatible licenses inside modules, or dependent modules. The former test noticed that I’d labelled the md4 module as LGPL, yet it’s actually GPL. The latter spotted that ccan/likely (LGPL) depends on ccan/htable (GPL): legal (the whole thing is actually GPL), but misleading, as Michael Adam noted). Automating this stuff is a clear win for a project like CCAN. I also re-licensed a bunch of useful-but-trivial modules from LGPL to public domain, as I want the BSD modules to use them.
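The dependency check can be sketched in a few lines. This is an illustration only, not ccanlint's actual code: the regular expression, the `RANK` table and both function names are assumptions, and the strictness ordering is a deliberate simplification of real license compatibility.

```python
import re

# Matches courtesy comments such as:
#   /* Licensed under LGPLv2.1+ - see LICENSE file for details */
LICENSE_COMMENT = re.compile(r"Licensed under\s+(\S+)")

# Hypothetical strictness ranking for this sketch only: a module should
# depend only on modules of the same or lower rank.
RANK = {"public-domain": 0, "BSD": 1, "LGPL": 2, "GPL": 3}


def license_family(source_text):
    """Return 'LGPL', 'GPL' or 'BSD' from a courtesy comment, else None."""
    match = LICENSE_COMMENT.search(source_text)
    if match is None:
        return None
    token = match.group(1)
    # Check LGPL before GPL so the prefix test cannot confuse the two.
    for family in ("LGPL", "GPL", "BSD"):
        if token.startswith(family):
            return family
    return None


def misleading_dependencies(licenses, depends):
    """Return (module, dependency) pairs where the dependency is stricter."""
    problems = []
    for module, deps in depends.items():
        for dep in deps:
            if RANK[licenses[dep]] > RANK[licenses[module]]:
                problems.append((module, dep))
    return problems


if __name__ == "__main__":
    licenses = {"likely": "LGPL", "htable": "GPL"}
    depends = {"likely": ["htable"]}
    print(misleading_dependencies(licenses, depends))
```

Fed the likely/htable case described above, the check flags the LGPL module that pulls in a GPL dependency.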
I wanted a scalable version of this poster: so I offered 1BTC on the bitcoin forum and someone produced a version I can print and put on my wall in my home office.
I like bitcoins. A simple open source client, a well-run developer community, clever algorithms, decentralized assurance model, and of course near-zero transaction fees. For all the economic arguments (some of which sound like early anti-Wikipedia arguments, though I hesitate to argue by analogy), when I first used it to tip a website, I fell in love. It took me about 3 minutes to transfer $50 from my bank account to a stranger’s via my bank’s website, with 2 days latency. It took me about 5 seconds to send 0.1BTC to a stranger, with about 10 minute latency. It was a revelation.
But, while bitcoins rock, volatility doesn’t. It’s currently a bit hard to get some bitcoins, and the price has rocketed by speculation. It’s not just a cute FOSS project, it’s becoming a real market, and surely those piling in now are susceptible to hacking, scams and the inevitable hiccups that go with any project ramping up. It’s all going to come crashing down to earth.
Good. My hope is that after the GBC the speculators will move on. Volatility will settle. The boring work of accreting trust in this new tool will continue. I fervently hope that we will appreciate its true utility once the dust has settled and the Man Loses A Million Dollars and House in Virtual Currency headlines have faded.
[Yes, you can tip me in bitcoins! I think it's a good habit, fun to do, and better than ads as I get a warm fuzzy feeling of appreciation. I'll be using any tips I get to tip other sites which take BTC].
This was passed on to me by Ben Elliston, ex-gcc hacker and good guy. Amusing in context, but the corollary is that working on free software means you’ll encounter such people. You may have to work with them. You may have to argue with them (and they may be right).
Quite some time ago I was horrified by the private behaviour of a hacker I deeply respected: malicious, hypocritical stuff. And it caused an internal crisis for me: I thought we were all striving together to make the world a better place. Here are the results I finally derived:
- Being a great hacker does not imbue moral or ethical characteristics.
- Being a great coder doesn’t mean you’re not a crackpot.
- Working on a great project doesn’t mean you share my motivations about it.
This wasn’t obvious to me, and it seems it’s not obvious to others. A-list actors endorsing Scientology doesn’t make it a good idea. Great FOSS political work was done by a certain obnoxious LWN-haunting nutball. Julian Assange may or may not be guilty of crimes in Sweden. Many of my kernel coworkers believed that GPLv3 was somehow a radical change from GPLv2. Some sweet code has been written by gun nuts, lechers, holocaust deniers and (in at least one case) someone who believes that fasting will cure cancer.
In any walk of life you have to work with all kinds; having to do so in my dream job as FOSS hacker was a hard lesson for me. It’s great to work with people whose skills you respect, but don’t expect to like them all.
I was delighted that Jon Corbet pinged me to say he was finally implementing a supporter option for LWN. It’s been about 12 months since I started asking about it, and 6 since I started asking publicly. When it finally arrived, in classical FOSS brand-suicide style, it was named the “Maniacal supporter” option. I don’t think Jon believed anyone would actually pay more “for nothing”, but curiosity finally won out.
But he’s wrong: people want the consistent commentary and in-depth analysis that only dedicated experts like Jon can provide. And we know that if they don’t get enough money, they’ll have to stop writing and take day jobs; this is not some abstract charity. I want Jon to be comfortable and LWN financially secure and able to concentrate on what they do best, which seems to be a rare skill in our community. This is a start in that direction; I welcome your suggestions on what to do next…
Jokes aside, I don’t prepare my conference talks the night before. I took a week off of work to prepare my linux.conf.au talk this year (two weeks before the conference, and I still spent a couple of work days in the week after completing it). That kind of spontaneity takes preparation!
Here’s a rough calculator of preparation time for an LCA-style talk. Make sure you finish at least a week before, to allow slippage!
Preparation Time for Standard Talk (~45 minutes)
If you have given it before:
- If you were happy with it and not changing it: 15 minutes to re-check, change conference name and date.
- If you were happy with it and changing it slightly: +1 hour for a complete run-through.
- If you were a little unhappy with it, but content will not change: +5 hours for reviewing previous video and googling for feedback and taking notes, then running through changed sections twice and complete run-through once.
- If you’re not the world expert on what you’re talking about, allow at least a week of research time.
- If the topic is vague allow at least a month of mulling time, where the topic sits in your brain. For longer periods I recommend jotting down your ideas. (I did this for an entire year before my OLS keynote, and I knitted a theme from the contents of that text file full of thoughts and examples).
- One hour to a day to plan your talk structure: what are your main points, what’s the extra magic?
Writing the talk:
- 10 minutes per basic slide. Usually I’d expect 25 slides, so say 4 hours.
- 30 minutes per diagram (five minutes of this will be trying to figure out if you really need a diagram: you probably do!). I’d expect five to ten diagrams, so say 5 hours.
- Five hours per demo. Not just setting it up in the first place, but making it robust and repeatable and practicing switching to and from your presentation adds time.
- Two hours per run-through (since you tend to stop and mod the first few times). You’ll want at least one more of these than you have time for, but I’d expect 8 hours for this.
- Using new software: +4 hours if you’re on your own, +1 hour if you have an expert available.
- Any project work where you have to document the steps for your talk: double your normal project time for the overhead, to ensure it’s comprehensible and maximally useful to the audience (vs. works).
- Any new presentation technique you haven’t used before, add 4 hours for two additional run-throughs.
Preparation Time for A Tutorial (~1.5 hours)
Similar calculations, but you’ll have many more demos so it’ll be more than twice as long. The real killer is that you have to practice timings with real people who are similar to your target audience. This means in addition to everything else, you’ll want to give it for some local group at least twice, because there’s no way you can know what the timing will be like until you’ve done that. Say +2 hours to organize the practice sessions, and +6 hours to run them (this includes transport, setup, testing and time overruns for each one).
Testing time happens standing in the venue, at the podium with your setup ready to go. I allow 5 minutes for video setup. If you’ve not presented on the laptop before, +15 minutes. If you’re not always running 1024×768, +10 minutes. If you want audio, +5 minutes. If you have a video to show, +5 minutes. If you have an interactive demo, +5 minutes to find a practice volunteer, +20 minutes to run through the demo with them.
In general, allow half your testing time the day before (ie. you’ll need to access the venue), the rest in the space before your talk.
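The testing-time rules above add up mechanically, so here is a minimal sketch of them as a function. The parameter names are my own; the minute values are the ones given in the text.

```python
def testing_minutes(new_laptop=False, unusual_resolution=False,
                    audio=False, video_clip=False, interactive_demo=False):
    """Estimate on-site testing time in minutes from the rules above."""
    minutes = 5  # video setup is always needed
    if new_laptop:          # never presented on this laptop before
        minutes += 15
    if unusual_resolution:  # not always running 1024x768
        minutes += 10
    if audio:
        minutes += 5
    if video_clip:
        minutes += 5
    if interactive_demo:    # find a volunteer, then run through the demo
        minutes += 5 + 20
    return minutes


if __name__ == "__main__":
    total = testing_minutes(interactive_demo=True)
    # Half the day before, the rest in the gap before the talk.
    print(f"{total} minutes total, {total / 2:g} of them the day before")
```

A talk with nothing but slides needs only the 5 minutes of video setup, while a single interactive demo already pushes the total to 30.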
So, this gives a preparation time for my LCA 2011 talk as:
- 1 day planning.
- 6 hours for 35 basic slides.
- 2 hours for 4 diagrams.
- 15 hours for 3 demos.
- 8 hours for run-throughs.
- 4 hours for messing with svgslides, even though I didn’t really use it in the end.
- 3 days for coding up the example project, and documenting that code.
- 4 hours for additional run-throughs because I hadn’t presented using a side-bar and emacs before.
Giving a total time of 71 hours (assuming 8 hour days). That’s probably about right. And the required 30+ minutes of testing time explains why I didn’t end up having people telnet into my laptop for the demos; if I’d tested that the day before, we might have been able to organize something.
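For anyone who wants to plug in their own numbers, the tally above can be sketched as a small function. The function and parameter names are assumptions for illustration; the rates (10 minutes per slide, 30 minutes per diagram, five hours per demo, 8 hour days) are the ones from the calculator.

```python
HOURS_PER_DAY = 8  # working-day length assumed in the total above


def talk_prep_hours(planning_days, slides, diagrams, demos,
                    run_through_hours, new_software_hours=0,
                    project_days=0, new_technique_hours=0):
    """Sum preparation time in hours using the per-item rates above."""
    return (
        planning_days * HOURS_PER_DAY
        + slides * 10 / 60        # 10 minutes per basic slide
        + diagrams * 30 / 60      # 30 minutes per diagram
        + demos * 5               # five hours per demo
        + run_through_hours
        + new_software_hours
        + project_days * HOURS_PER_DAY
        + new_technique_hours
    )


if __name__ == "__main__":
    # The LCA 2011 figures listed above.
    total = talk_prep_hours(
        planning_days=1, slides=35, diagrams=4, demos=3,
        run_through_hours=8, new_software_hours=4,
        project_days=3, new_technique_hours=4,
    )
    # 35 slides come to 5.83 hours; the list rounds that to 6,
    # so this sketch rounds the grand total instead.
    print(round(total))
```

Rounded to the nearest hour, the LCA 2011 inputs give the same 71 hours as the list.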