Rusty Russell's Coding Blog | Stealing From Smart People

March 29, 2012

Sources of Randomness for Userspace

I’ve been thinking about a new CCAN module for getting a random seed.  Clearly, /dev/urandom is your friend here: on Ubuntu and other distributions it’s saved and restored across reboots, but traditionally server systems have lacked sources of entropy, so it’s worth thinking about other sources of randomness.  Assume for a moment that we mix them well, so any non-randomness is irrelevant.
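By “mix them well” I mean something like the sketch below: fold every byte we gather into a single seed. FNV-1a here is purely illustrative and not cryptographic (a real module would fold everything through a proper hash such as SHA-256), and the mix_bytes() name is just made up for this post.

```
#include <stdint.h>
#include <stddef.h>

/* Illustrative only: fold arbitrary bytes into a 64-bit seed with FNV-1a. */
static uint64_t mix_bytes(uint64_t seed, const void *p, size_t len)
{
	const unsigned char *c = p;
	size_t i;

	for (i = 0; i < len; i++) {
		seed ^= c[i];
		seed *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return seed;
}
```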

There are three obvious classes of randomness: things about the particular machine we’re on, things about the particular boot of the machine we’re on, and things which will vary every time we ask.

The Machine We’re On

Of course, much of this is guessable if someone has physical access to the box or knows something about the vendor or the owner, but it might be worth seeding this into /dev/urandom at install time.

On Linux, we can look in /proc/cpuinfo for some sources of machine info: for the 13 x86 machines my friends on IRC had in easy reach, we get three distinct values for cpu cores, three for siblings, two for cpu family, eight for model, six for cache size, and twelve for cpu MHz.  These values are obviously somewhat correlated, but it’s a fair guess that we can get 8 bits here.
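Here’s roughly how a module might fold that in (mix_file() is another made-up name, built on the mix_bytes() sketch above): just fold the whole file into the seed. The same helper works for any of the other /proc or /sys files mentioned below.

```
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Fold the entire contents of a text file (e.g. /proc/cpuinfo) into the
 * seed; a missing file simply contributes nothing. */
static uint64_t mix_file(uint64_t seed, const char *path)
{
	FILE *f = fopen(path, "r");
	char buf[4096];
	size_t n;

	if (!f)
		return seed;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		seed = mix_bytes(seed, buf, n);
	fclose(f);
	return seed;
}
```

For example: seed = mix_file(seed, "/proc/cpuinfo");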

Ethernet addresses are unique, so I think it’s fair to say there’s at least another 8 bits of entropy there, though often devices have consecutive numbers if they’re from the same vendor, so this doesn’t just multiply by number of NICs.
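A similar sketch for the MAC addresses, walking /sys/class/net (again just illustrative, reusing the mix_file() helper above):

```
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdint.h>

/* Fold in the hardware address of every interface in /sys/class/net. */
static uint64_t mix_mac_addresses(uint64_t seed)
{
	DIR *d = opendir("/sys/class/net");
	struct dirent *ent;
	char path[PATH_MAX];

	if (!d)
		return seed;
	while ((ent = readdir(d)) != NULL) {
		if (ent->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "/sys/class/net/%s/address",
			 ent->d_name);
		seed = mix_file(seed, path);
	}
	closedir(d);
	return seed;
}
```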

The amount of RAM in the machine is worth another two bits, and the other kinds of devices (e.g. trawling /sys/devices) can be expected to give another few bits, even in machines with fairly standard hardware configurations like laptops.  Alternatively, we could get this information indirectly by looking at /proc/modules.

Installed software gives a maximum of three bits, since we can assume a recent version of a mainstream distribution.  Package listings are also fairly standard, but most people install some extra things, so we might assume a few more bits here.  Ubuntu systems ask for your name to base the system name on, so there might be a few bits there (though my laptop is predictably “rusty-x201”).

So, let’s have a guess at 8 + 7 + 2 + 3 + 3 + 2 + 2, i.e. 27 bits from the machine configuration itself.

Information About This Boot

I created an upstart script to reboot (and had to hack grub.conf so it wouldn’t set the timeout to -1 for the next boot), and let it loop for a day: just under 2000 reboots in all.  I eyeballed graphs of each stat I gathered against the others, and there didn’t seem to be any surprising correlations.  /proc/uptime gives fairly uniform uptime values spread within a range of 1 second, so at least 6 bits there (every few dozen boots we get an fsck, which gives a different range of values, but the same amount of noise).  /proc/loadavg is pretty constant, unfortunately.  bogomips on CPU1 was fairly constant, but for the boot CPU it looks like a roughly normal distribution spread within 1 bogomip, in increments of 0.01: say another 7 bits there.
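A sketch of pulling out the interesting part of the uptime (the fraction of a second), again folded in with the illustrative mix_bytes(); the bogomips come along for free if /proc/cpuinfo is folded in whole as above.

```
#include <stdio.h>
#include <stdint.h>

/* Fold in the fractional (hundredths of a second) part of /proc/uptime. */
static uint64_t mix_uptime(uint64_t seed)
{
	FILE *f = fopen("/proc/uptime", "r");
	double up;

	if (!f)
		return seed;
	if (fscanf(f, "%lf", &up) == 1) {
		uint64_t frac = (uint64_t)(up * 100) % 100;
		seed = mix_bytes(seed, &frac, sizeof(frac));
	}
	fclose(f);
	return seed;
}
```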

So for each boot we can extract 13 bits from uptime and /proc/cpuinfo.

Things Which Change Every Time We Run

The pid of our process will change every time we’re run, even when started at boot.  My pid was fairly evenly distributed over the values between 1220 and 1260, so there are five bits there.  Unfortunately on both 64-bit and 32-bit Ubuntu, pids are restricted to 32768 by default.

We can get several more bits from simply timing the other randomness operations.  Modern machines have so much going on that you can probably count on four or five bits of unpredictability over the time you gather these stats.
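A sketch covering both of those: fold in the pid, then time the rest of the gathering with clock_gettime() and fold in the elapsed nanoseconds (mix_bytes() is the made-up fold from earlier).

```
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Fold in the pid, plus the (unpredictable) time the other gathering took. */
static uint64_t mix_runtime_noise(uint64_t seed)
{
	struct timespec before, after;
	pid_t pid = getpid();
	uint64_t elapsed_ns;

	seed = mix_bytes(seed, &pid, sizeof(pid));

	clock_gettime(CLOCK_MONOTONIC, &before);
	/* ... gather all the other sources here ... */
	clock_gettime(CLOCK_MONOTONIC, &after);

	elapsed_ns = (after.tv_sec - before.tv_sec) * 1000000000ULL
		     + (after.tv_nsec - before.tv_nsec);
	return mix_bytes(seed, &elapsed_ns, sizeof(elapsed_ns));
}
```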

So another 9 bits every time our process runs, even if it’s run from a boot script or cron.

Conclusion

We can get about 50 bits of randomness without really trying too hard, which is fine for a random server on the internet facing a remote attacker without any inside knowledge, but only about five of these bits (from the process’ own timing) would be unknown to an attacker who has access to the box itself.  So /dev/urandom is still very useful.

On a related note, Paul McKenney pointed me to a paper (abstract, presentation, paper) indicating that even disabling interrupts and running a few instructions gives an unpredictable value in the TSC, and inserting a usleep can make quite a good random number generator.  So if you have access to a high-speed, high-precision timing method, this may itself be sufficient.
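Here’s a rough sketch of that idea on x86 (just the idea, not the paper’s actual generator): sleep briefly, read the TSC with __rdtsc(), and keep only the least significant bit each round.

```
#include <stdint.h>
#include <unistd.h>
#include <x86intrin.h>

/* Collect one low TSC bit per short sleep; x86 only. */
static uint64_t tsc_noise_bits(unsigned rounds)
{
	uint64_t bits = 0;
	unsigned i;

	for (i = 0; i < rounds; i++) {
		usleep(100);
		bits = (bits << 1) | (__rdtsc() & 1);
	}
	return bits;
}
```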


18 Comments for Sources of Randomness for Userspace

Guest | March 29, 2012 at 6:03 pm

Serious consumers of random bits should also consider pulling data from the internet, e.g. mixing the hashes of reddit headlines or tweets into the random pool.

Or join a peer-to-peer network where randomness is gathered by every peer, mixed in, and shared on demand.

Even ICMP ping response times can give your random pool a nice twist.

The chance of manipulation is extremely small if you use multiple sources and adequate mixing/hashing. For example, if you hash reddit headlines, you should first prepend some random bytes from your pool or another source. This will make the result of the hash virtually impossible to guess. And hashing algorithms are _far_ from broken.

Daniel | March 29, 2012 at 6:04 pm

I think the PID comes from /dev/random, you can’t feed it back to /dev/urandom and call it new.

Author comment by rusty | March 29, 2012 at 8:46 pm

> I think the PID comes from /dev/random, you can’t feed it back to /dev/urandom and call it new.

On my box, pids are sequential.

Matt Domsch | March 30, 2012 at 7:50 am

This is exactly why we (Dell engineers) upstreamed patches to rngd to let it grab random numbers from the TPM RNG and feed them into the kernel entropy pool. At least the TPM is good for something then…

Anonymous | March 30, 2012 at 9:12 am

I hope you don’t plan to check any of these entropy sources in userspace, rather than just fixing the kernel’s /dev/urandom to use them if it doesn’t already.

Any C program looking for a random seed should just use /dev/urandom, and any program needing cryptographic randomness should read all of its random numbers from /dev/random without using a userspace RNG at all.

Jeroen | March 30, 2012 at 11:04 am

Maybe some more bits to be had from $ iwlist wlan0 scanning

Even if the visible APs stay the same, things like last beacon time, signal strength and quality will vary.

Perhaps server boards should come with ‘LavaRnd on a Chip’, an already enclosed CCD chip that the kernel can sample at will.

derf | March 30, 2012 at 12:15 pm

This is the problem that HAVEGE (http://www.irisa.fr/caps/projects/hipsor/) tries to solve. We actually ran into the “server out of entropy” problem all the time before installing haveged, which caused long stalls during ssh logins and other problems.

And, to quote Greg Maxwell: “I have to say “servers out of entropy” is the most first worldish problem I think I’ve ever had.”

William Ahern | March 30, 2012 at 1:21 pm

In the dark days before /dev/urandom became prevalent we had to rely on EGD: http://egd.sourceforge.net/

Pablo Hess | March 30, 2012 at 3:41 pm

I’ve been using the haveged[1] daemon for a few months already. It feeds impressive amounts of allegedly high-quality entropy bits to /dev/random (not urandom), which in turn produces good random numbers at megabits-per-second speeds. So, I’ve been using /dev/random for everything that used to require /dev/urandom.

I’m not very conscious (or well informed) regarding random number quality, though. :)

[1] Haveged: http://www.issihosts.com/haveged/

Folkert van Heusden | March 30, 2012 at 6:22 pm

I wrote a daemon-set for distributing entropy data between multiple hosts. Might be useful, especially for virtual hosts where hard disk timing is not possible.
http://www.vanheusden.com/entropybroker/

Richard | March 31, 2012 at 12:26 am

What about when the system is in a VM?
At that point, when we most need it, almost none of these things is truly random (especially to an attacker with “inside knowledge”)

Nigel | March 31, 2012 at 3:54 am

There’s a continuous source of entropy if a microphone is present. Ambient noise. Should be a kb/s of randomness available by hashing the audio stream.

And if there’s no mic? Well, if you can crank up the gain enough, you’ll get electronic-interference noise which may be just as good, or thermal noise which is much better!

Why don’t all computers have a thermal noise source built in these days?

Author comment by rusty | March 31, 2012 at 3:47 pm

> fixing the kernel’s /dev/urandom

You don’t even need to do that. You can just write them into /dev/urandom (even as non-root). That’s what my comment about “it might be worth seeding this into /dev/urandom at install time” meant.
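For the record, a minimal sketch of that: anyone can write(2) bytes into /dev/urandom and they get mixed into the pool, though the kernel only credits the entropy estimate if root uses the RNDADDENTROPY ioctl instead.

```
#include <fcntl.h>
#include <unistd.h>

/* Best-effort: write seed material into /dev/urandom (no root needed). */
static void feed_urandom(const void *buf, size_t len)
{
	int fd = open("/dev/urandom", O_WRONLY);

	if (fd < 0)
		return;
	(void)write(fd, buf, len);	/* short writes are fine to ignore */
	close(fd);
}
```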

Glen Turner | April 3, 2012 at 10:54 am

The use of the random numbers matters.

For example, for the random numbers used to initialise the TCP sequence number, you don’t want to use sources an exterior agency can also observe: uptime, MAC address, packet arrival times, things from websites.

You’ve also got to worry about covert channels: the ability to use observed behaviours (such as timing) to work back to information (such as the number of CPUs and RAM). For example, from the MAC address you can derive make and model, and thus likely contents of /proc/cpuinfo.

Linux is a multiuser operating system, and information like process IDs is visible to all users. Again, whether this matters depends on what you are using the random number for.

/dev/random is more at risk than /dev/urandom from a gross overestimate of entropy (because the attacker knows a lot more than you credit them for, or because things are less random than they “should” be). Even so, papers proposing cryptographic attacks on the urandom PRNG have been disappointed by the quality of urandom when judged against other cryptographic PRNGs.

It would be a worthwhile experiment to save the inputs to the entropy pool and then examine each for their inherent randomness. I did this about a decade ago and it was disappointing, but hardware has changed so much (PCI, interrupt coalescing, SSD, etc) that nothing can be drawn from that for today’s hardware.

Scavenging across the operating system for any likely source of randomness is proof that we are desperate for hardware which provides random numbers. So it’s good to see work like Matt’s.

Nigel: the noise from an overdriven microphone on a computer isn’t random. Have a listen: it’s correlated with traffic on nearby buses (which isn’t random, and which can be influenced by outside events). You also need to take care, as overdriving an ADC will lead to some of the lower bits “sticking”, so you won’t retrieve as many bits as suggested in the part’s datasheet. For more secure uses your computer can’t have a microphone anyway, which is a shame, as the noise of wind rushing past a server in a computer room is pretty close to the original definition of “white noise”.

I realise that a lot of the above is paranoia brought on by actually having built serious cryptographic equipment. But my experience has been that “random numbers” is the second-simplest way into cracking a cryptographically-secured system (the easiest way being attacking — perhaps physically — the key distribution).

PaulWay | April 3, 2012 at 3:10 pm

I’m sure I read somewhere on LWN that someone had looked at the nanosecond offset from the millisecond that the kernel woke up on, and it was pretty consistently random within a range. Even if you only get one bit of randomness there, you’re generating roughly a thousand bits per second from that.
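Something like this would be easy to test (a sketch only, with a made-up name: sleep a millisecond, then keep the low bit of the nanosecond offset at which we actually woke up):

```
#include <stdint.h>
#include <time.h>

/* Sleep ~1 ms and keep the low bit of the nanosecond wakeup offset. */
static uint64_t wakeup_jitter_bits(unsigned rounds)
{
	struct timespec delay = { 0, 1000000 };	/* 1 ms */
	struct timespec now;
	uint64_t bits = 0;
	unsigned i;

	for (i = 0; i < rounds; i++) {
		nanosleep(&delay, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);
		bits = (bits << 1) | (now.tv_nsec & 1);
	}
	return bits;
}
```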

Nigel | April 6, 2012 at 5:57 am

I think my comment about using the microphone or audio channel was misunderstood (or alternatively that there’s something I don’t understand). I’m not claiming that the low-order bits are random. I’m claiming that the entire audio stream contains much entropy. If one took say five seconds of audio and calculated an appropriate hash of it (something like SHA256), I believe that you would have at least 256 good random bits.

Obviously one would need a bit of validation, especially if one attempted to extend this to an audio channel without a microphone connected. It *might* be generating all-zero, or a very simple fixed binary pattern. A microphone listening to ambient noise is much better.
If security considerations prohibit microphones, it’s not hard to dream up a small analog circuit containing a noise diode, a transistor, and a few other passive components plugged into a USB socket (for power) and the mic socket. WHY don’t they build that in on motherboards?

Tony | April 6, 2012 at 9:23 am

dmidecode may give you quite a few more bits (of the type that is useless if the attacker has access to your machine … but probably quite valuable otherwise). Serial numbers and f/w version numbers of several components.

Or … just wait and buy an Ivy Bridge cpu and use RDRAND

Carlos | December 12, 2012 at 11:48 pm

I have to say that relying on the PID for a server/bootup process is far from secure.

Your distro might give you a random PID every boot, which is kinda weird, but mine (Slackware) gives me the exact same PID every boot.

Of course that will depend upon when the command was issued (which init script it was started from) or what distro you’re using, but the entropy in that procedure is minimal; I would give it one bit, if not less.

The same applies to the cpu info: while you can get a little entropy out of that the first time you try it, all subsequent attempts will return the exact same value.
