Sources of Randomness for Userspace

I’ve been thinking about a new CCAN module for getting a random seed.  Clearly, /dev/urandom is your friend here: Ubuntu and other distributions save and restore its pool across reboots.  But server systems have traditionally lacked good sources of entropy, so it’s worth thinking about other sources of randomness.  Assume for a moment that we mix them well, so any non-randomness is irrelevant.
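
For reference, the trivial baseline looks something like this (a sketch with no CCAN dependencies, just stdio; the function name is mine):

```c
#include <stdio.h>
#include <stdint.h>

/* The baseline: pull a 64-bit seed straight from /dev/urandom.
 * Returns 0 on success, -1 if the device is unavailable. */
int read_urandom_seed(uint64_t *seed)
{
	FILE *f = fopen("/dev/urandom", "r");
	size_t got;

	if (!f)
		return -1;
	got = fread(seed, sizeof(*seed), 1, f);
	fclose(f);
	return got == 1 ? 0 : -1;
}
```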

There are three obvious classes of randomness: things about the particular machine we’re on, things about the particular boot of the machine we’re on, and things which will vary every time we ask.

The Machine We’re On

Of course, much of this is guessable if someone has physical access to the box or knows something about the vendor or the owner, but it might be worth seeding this into /dev/urandom at install time.
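
Conveniently, writing to /dev/urandom doesn’t require root: the write stirs the pool without crediting any entropy.  A minimal sketch of install-time seeding (function name mine):

```c
#include <stdio.h>

/* Stir the kernel pool by writing bytes to /dev/urandom; any user
 * may do this (it mixes the bytes in without crediting entropy).
 * Returns 0 on success, -1 on failure. */
int seed_urandom(const void *buf, size_t len)
{
	FILE *f = fopen("/dev/urandom", "w");
	size_t done;

	if (!f)
		return -1;
	done = fwrite(buf, 1, len, f);
	fclose(f);
	return done == len ? 0 : -1;
}
```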

On Linux, we can look in /proc/cpuinfo for some sources of machine info: for the 13 x86 machines my friends on IRC had in easy reach, we get three distinct values for cpu cores, three for siblings, two for cpu family, eight for model, six for cache size, and twelve for cpu MHz.  These values are obviously somewhat correlated, but it’s a fair guess that we can get 8 bits here.
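
A sketch of folding that file in, using FNV-1a as a stand-in mixer (the real module would presumably use something stronger):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* FNV-1a: a simple non-cryptographic mixer, standing in for
 * whatever real mixing function the module would use. */
static uint64_t mix_bytes(uint64_t h, const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Fold every line of /proc/cpuinfo into the hash; returns the
 * seed unmodified if the file can't be opened (non-Linux). */
uint64_t mix_cpuinfo(uint64_t h)
{
	char line[256];
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return h;
	while (fgets(line, sizeof(line), f))
		h = mix_bytes(h, line, strlen(line));
	fclose(f);
	return h;
}
```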

Ethernet addresses are unique, so I think it’s fair to say there are at least another 8 bits of entropy there, though devices from the same vendor often have consecutive numbers, so this doesn’t simply multiply by the number of NICs.
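
Harvesting the MAC addresses on Linux is a matter of walking /sys/class/net; here’s a sketch, with a djb2-style fold as a stand-in mixer:

```c
#include <stdio.h>
#include <dirent.h>
#include <stdint.h>

/* XOR-fold each interface's MAC address string (the kernel's ASCII
 * form, e.g. "00:11:22:33:44:55") into *h, counting the addresses
 * found. Returns that count, 0 if /sys is unavailable. */
int mix_macs(uint64_t *h)
{
	DIR *d = opendir("/sys/class/net");
	struct dirent *e;
	int n = 0;

	if (!d)
		return 0;
	while ((e = readdir(d)) != NULL) {
		char path[512], mac[64];
		FILE *f;
		int i;

		if (e->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path),
			 "/sys/class/net/%s/address", e->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(mac, sizeof(mac), f)) {
			for (i = 0; mac[i] && mac[i] != '\n'; i++)
				*h = (*h << 5) + *h + (unsigned char)mac[i];
			n++;
		}
		fclose(f);
	}
	closedir(d);
	return n;
}
```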

The amount of RAM in the machine is worth another two bits, and the other devices present (e.g. trawling /sys/devices) can be expected to give another few bits, even in machines with fairly standard hardware such as laptops.  Alternatively, we could get this information indirectly by looking at /proc/modules.

The installed software gives at most three bits, since we can assume a recent version of a mainstream distribution.  Package listings are also fairly standard, but most people install some extra things, so we might assume a few more bits there.  Ubuntu systems ask for your name and base the system name on it, so there might be a few bits in that too (though my laptop is predictably “rusty-x201”).
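
The hostname itself is a one-liner to grab (gethostname is POSIX, but note it needn’t NUL-terminate on truncation; the wrapper name is mine):

```c
#include <unistd.h>

/* Fetch the hostname for mixing into the pool; worth a few
 * (often guessable) bits. Returns 0 on success, -1 on failure. */
int get_host_bytes(char *buf, size_t len)
{
	if (len == 0 || gethostname(buf, len) != 0)
		return -1;
	buf[len - 1] = '\0';	/* gethostname needn't NUL-terminate */
	return 0;
}
```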

So, let’s have a guess at 8 + 7 + 2 + 3 + 3 + 2 + 2, ie. 27 bits from the machine configuration itself.

Information About This Boot

I created an upstart script to reboot the machine (and had to hack grub.conf so it wouldn’t set the timeout to -1 for the next boot), and let it loop for a day: just under 2000 boots in all. I eyeballed graphs of each stat I gathered against the others, and there didn’t seem to be any surprising correlations.  /proc/uptime gives fairly uniform values within a range of one second, so there are at least 6 bits there (every few dozen boots we get an fsck, which shifts the range of values, but gives the same amount of noise).  /proc/loadavg is pretty constant, unfortunately.  The bogomips value for CPU1 was fairly constant, but for the boot CPU it looks like a normal distribution within 1 bogomip, in increments of 0.01: say another 7 bits there.
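
Harvesting the uptime noise is simple: parse /proc/uptime and keep only the sub-second fraction, since the integer part is predictable.  A sketch:

```c
#include <stdio.h>

/* Return the sub-second part of /proc/uptime in centiseconds
 * (0-99), or -1 if the file is unreadable. The integer part is
 * predictable; the fraction is where the boot-to-boot noise is. */
int uptime_centiseconds(void)
{
	double up;
	FILE *f = fopen("/proc/uptime", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%lf", &up) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return (int)((up - (long)up) * 100.0);
}
```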

So for each boot we can extract 13 bits from uptime and /proc/cpuinfo.

Things Which Change Every Time We Run

The pid of our process will change every time we’re run, even when started at boot.  My pid was fairly evenly distributed over the values between 1220 and 1260, so there are about five bits there.  Unfortunately, on both 64-bit and 32-bit Ubuntu, pids are restricted to 32768 by default.
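
Extracting those bits is trivial; with the default pid_max of 32768 the pid can never be worth more than 15 bits anyway:

```c
#include <unistd.h>
#include <stdint.h>

/* The pid as an entropy source: at most 15 bits with the default
 * pid_max of 32768, and far fewer when started from a boot script. */
uint32_t pid_bits(void)
{
	return (uint32_t)getpid() & 0x7fff;
}
```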

We can get several more bits from simply timing the other randomness operations.  Modern machines have so much going on that you can probably count on four or five bits of unpredictability over the time you gather these stats.

So another 9 bits every time our process runs, even if it’s run from a boot script or cron.


We can get about 50 bits of randomness without really trying too hard, which is fine for a random server on the internet facing a remote attacker without any inside knowledge, but only about five of these bits (from the process’ own timing) would be unknown to an attacker who has access to the box itself.  So /dev/urandom is still very useful.

On a related note, Paul McKenney pointed me to a paper (abstract, presentation, paper) indicating that even disabling interrupts and running a few instructions gives an unpredictable value in the TSC, and inserting a usleep can make quite a good random number generator.  So if you have access to a high-speed, high-precision timing method, this may itself be sufficient.

18 replies on “Sources of Randomness for Userspace”

  1. Serious consumers of random bits should also consider pulling data from the internet, such as mixing the hashes of reddit headlines or tweets into the random pool.

    Or join a peer-to-peer network where randomness is gathered by every peer, mixed in, and shared on demand.

    Even ICMP ping response times can give your random pool a nice twist.

    The chance of manipulation is extremely small if you use multiple sources and adequate mixing/hashing. For example, if you hash reddit headlines, you should first prepend some random bytes from your pool or another source. This will make the result of the hash virtually impossible to guess. And hashing algorithms are _far_ from broken.

  2. I think the PID comes from /dev/random; you can’t feed it back to /dev/urandom and call it new.

  3. This is exactly why we (Dell engineers) upstreamed patches to rngd to let it grab random numbers from the TPM RNG and feed them into the kernel entropy pool. At least the TPM is good for something then…

  4. I hope you don’t plan to check any of these entropy sources in userspace, rather than just fixing the kernel’s /dev/urandom to use them if it doesn’t already.

    Any C program looking for a random seed should just use /dev/urandom, and any program needing cryptographic randomness should read all of its random numbers from /dev/random without using a userspace RNG at all.

    1. > fixing the kernel’s /dev/urandom

      You don’t even need to do that. You can just write them into /dev/urandom (even as non-root). That was what my comment about “it might be worth seeding this into /dev/urandom at install time.” meant.

  5. Maybe some more bits to be had from $ iwlist wlan0 scanning

    Even if the visible APs stay the same, things like last beacon time, signal strength and quality will vary.

    Perhaps server boards should come with ‘LavaRnd on a Chip’, an already enclosed CCD chip that the kernel can sample at will.

  6. This is the problem that HAVEGE tries to solve. We actually ran into the “server out of entropy” problem all the time before installing haveged, which caused long stalls during ssh logins and other problems.

    And, to quote Greg Maxwell: “I have to say “servers out of entropy” is the most first worldish problem I think I’ve ever had.”

  7. I’ve been using the haveged[1] daemon for a few months already. It feeds impressive amounts of allegedly high-quality entropy bits to /dev/random (not urandom), which in turn produces good random numbers at megabits-per-second speeds. So, I’ve been using /dev/random for everything that used to require /dev/urandom.

    I’m not very conscious (or well informed) regarding random number quality, though. :)

    [1] Haveged:

  8. What about when the system is in a VM?
    At that point, when we most need it, almost none of these things is truly random (especially to an attacker with “inside knowledge”).

  9. There’s a continuous source of entropy if a microphone is present. Ambient noise. Should be a kb/s of randomness available by hashing the audio stream.

    And if there’s no mic? Well, if you can crank up the gain enough, you’ll get electronic-interference noise which may be just as good, or thermal noise which is much better!

    Why don’t all computers have a thermal noise source built in these days?

  10. The use of the random numbers matters.

    For example, for the random numbers used to initialise TCP sequence numbers, you don’t want sources an external agency can also observe: uptime, MAC address, packet arrival times, things from websites.

    You’ve also got to worry about covert channels: the ability to use observed behaviours (such as timing) to work back to information (such as the number of CPUs and RAM). For example, from the MAC address you can derive make and model, and thus likely contents of /proc/cpuinfo.

    Linux is a multiuser operating system, and information like process IDs are visible to all users. Again, if this matters depends on what you are using the random number for.

    /dev/random is more at risk from a gross overestimate of entropy (because the attacker knows a lot more than you credit them for, or because things are less random than they “should” be) than /dev/urandom. Even so, papers proposing cryptographic attacks on the urandom PRNG have been disappointed about the quality of urandom when judged against other cryptographic PRNGs.

    It would be a worthwhile experiment to save the inputs to the entropy pool and then examine each for their inherent randomness. I did this about a decade ago and it was disappointing, but hardware has changed so much (PCI, interrupt coalescing, SSD, etc) that nothing can be drawn from that for today’s hardware.

    Scavenging across the operating system for any likely source of randomness is proof that we are desperate for hardware which provides random numbers. So it’s good to see work like Matt’s.

    Nigel: the noise from an overdriven microphone on a computer isn’t random. Have a listen: it’s correlated with traffic on nearby buses (which isn’t random, and which can be influenced by outside events). You also need to take care, as overdriving an ADC will lead to some of the lower bits “sticking”, so you won’t retrieve as many bits as the part’s datasheet suggests. For more secure uses your computer can’t have a microphone anyway, which is a shame, as the noise of wind rushing past a server in a computer room is pretty close to the original definition of “white noise”.

    I realise that a lot of the above is paranoia brought on by actually having built serious cryptographic equipment. But my experience has been that “random numbers” are the second-simplest way into cracking a cryptographically-secured system (the easiest way being attacking — perhaps physically — the key distribution).

  11. I’m sure I read somewhere on LWN that someone had looked at the nanosecond offset from the millisecond that the kernel woke up on, and it was pretty consistently random within a range. Even if you only get one bit of randomness there, you’re generating roughly a thousand bits per second from that.

  12. I think my comment about using the microphone or audio channel was misunderstood (or alternatively that there’s something I don’t understand). I’m not claiming that the low-order bits are random. I’m claiming that the entire audio stream contains much entropy. If one took say five seconds of audio and calculated an appropriate hash of it (something like SHA256), I believe that you would have at least 256 good random bits.

    Obviously one would need a bit of validation, especially if one attempted to extend this to an audio channel without a microphone connected. It *might* be generating all-zero, or a very simple fixed binary pattern. A microphone listening to ambient noise is much better.
    If security considerations prohibit microphones, it’s not hard to dream up a small analog circuit containing a noise diode, a transistor, and a few other passive components plugged into a USB socket (for power) and the mic socket. WHY don’t they build that in on motherboards?

  13. dmidecode may give you quite a few more bits (of the type that is useless if the attacker has access to your machine … but probably quite valuable otherwise). Serial numbers and f/w version numbers of several components.

    Or … just wait and buy an Ivy Bridge cpu and use RDRAND

  14. I have to say that relying on the PID for a server/boot-time process is far from secure.

    Your distro might give you a random number every boot, which is kinda weird, but mine (Slackware) gives me the exact same PID every boot.

    Of course, that will depend on when the command was issued (which init script it was started from) and what distro you’re using, but the entropy in that procedure is minimal; I’d give it one bit, if not less.

    The same applies to the CPU info: while you can get a little entropy out of it the first time you try, all subsequent attempts will return the exact same value.
