ULTIMATE GUIDE TO RANDOMNESS
A collection of brief topics we think you should know about
RANDOM NUMBER GENERATORS
Where do random numbers come from?
Good question! Most people don’t know how random numbers are generated; they are largely taken for granted. In practice, most random numbers are generated by the operating system. All applications running on a given system tend to get random numbers from the same place. It doesn’t matter whether an application needs just a few bytes or many megabytes of perfect or merely ‘near-perfect’ randomness; it all comes from the same place. Just like in your own home, everything that uses water generally gets it from the same water supply. While simple, this approach forces critical security applications to compete with more mundane tasks for randomness, which can be a serious issue if randomness becomes a scarce resource. Learn more about random number generation in Linux.
Types of random number generators
Most RNGs fall into one of two broad categories. The first category includes RNGs that take a relatively small amount of random data, often only a few hundred bits, and use an algorithm to extrapolate that randomness into a much larger volume of ‘random’ numbers. These software-based RNGs are known as Pseudo-Random Number Generators (PRNGs) or Deterministic Random Bit Generators (DRBGs). The other category is known as True Random Number Generators (TRNGs) or Non-deterministic Random Bit Generators (NRBGs); these RNGs can be software or hardware based and generate random numbers based entirely on freshly-made random data, with no data extrapolation algorithms involved. Within this class of TRNGs there are two subclasses: devices that monitor activity and events around them in order to extract randomness (e.g. from keyboard usage or network traffic) and devices that create their own randomness using a dedicated source of entropy. Learn more about the Whitewood Entropy Engine.
Pseudo random number generators (PRNGs)
PRNGs (also referred to as deterministic random bit generators or DRBGs) are an essential part of most cryptosystems. PRNGs are software programs that use well-known algorithms to extrapolate relatively small amounts of entropy (random ‘seeds’) into the larger amounts of ‘random’ data needed by applications. No matter how well a PRNG is designed, it cannot create entropy or randomness. If a PRNG is only seeded with low-entropy seeds then the output of the PRNG, no matter how large the data set might be, will also have a low level of entropy. Dealing a pack of playing cards provides a useful analogy. The PRNG is equivalent to the process of dealing the cards, whereas providing the random seed to the PRNG is equivalent to shuffling the deck; the greater the number of cards being dealt, the greater the need to shuffle the pack. PRNGs may suffer from error-prone implementation or even intentional backdoors, but even if they are built correctly, their core security depends entirely on the randomness of the seeding process. Learn more about creating good PRNG seeds across the datacenter.
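The determinism described above is easy to demonstrate. The following is a toy sketch, not a standards-compliant DRBG (real systems should use a vetted construction such as the HMAC-DRBG from NIST SP 800-90A): it expands a small seed into an arbitrarily long stream by hashing the seed with a counter, so identical seeds always produce identical ‘random’ output.

```python
import hashlib

def prng_stream(seed: bytes, nbytes: int) -> bytes:
    """Toy hash-counter PRNG: expands a small seed into a long stream.
    Purely deterministic -- it adds no entropy of its own."""
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:nbytes]

# Two runs with the same (weak) seed yield byte-identical output,
# which is exactly why seed quality is everything.
a = prng_stream(b"weak-seed", 64)
b = prng_stream(b"weak-seed", 64)
```

If an attacker can guess the seed, they can reproduce the entire stream; the hash function only spreads the seed’s entropy out, it never adds any.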
Random number generators in Linux
Linux contains two pseudo random number generators (PRNGs); they are similar but differ in one important way. The more secure of the two is ‘/dev/random’. Confidence in /dev/random comes from the fact that it will only provide random numbers if Linux believes it has sufficient entropy to generate them securely. Even though /dev/random is based on a PRNG algorithm it can still deliver high-quality random numbers because it is regularly re-seeded with fresh entropy from the kernel’s entropy pool. As a safety feature /dev/random will block and provide no output if Linux believes it has insufficient entropy to generate a random number. Unfortunately this blocking behavior can severely impact application performance. To overcome this issue, Linux provides an alternative PRNG called ‘/dev/urandom’, where the ‘u’ is usually read as ‘unlimited’ or ‘unblocked’. While /dev/random will block if Linux has insufficient entropy, /dev/urandom will supply ‘random’ numbers regardless of the amount of entropy available. The fact that /dev/urandom is always available makes it a popular, but potentially unwise, choice for developers. Learn more about random number generation in Linux.
Randomness and entropy
Entropy is the statistical measure of disorder within a set of data. Data with the highest levels of entropy has the lowest levels of structure or correlation. For data to be truly random it has to have a high level of entropy, usually measured as a fraction of a bit per bit of data in the sample; an 8-bit number might contain, say, 7.2 bits of effective entropy. But entropy is only part of the quest for randomness. The most demanding applications such as cryptography require random numbers that are not just statistically random but also unpredictable. For example, the digits of the number pi (3.1415926535897932384… etc.) may appear random and contain statistical entropy but are easily predicted. Proving that data is random goes beyond measuring entropy. Proving randomness requires knowledge of how the data was originally generated. Randomness is something that needs to be architected into a system from the outset, not something that can be measured retrospectively. Learn more by reading our technical whitepaper on strengthening the crypto infrastructure.
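The statistical side of this is straightforward to compute. As a minimal sketch, the Shannon entropy of a byte string (in bits per byte, with 8.0 as the maximum for perfectly uniform data) can be estimated from symbol frequencies; note that, per the point above, a high score tells you nothing about unpredictability.

```python
import math
from collections import Counter

def shannon_entropy_per_byte(data: bytes) -> float:
    """Estimate Shannon entropy in bits per byte (max 8.0).
    Measures statistical disorder only -- NOT unpredictability."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A constant string scores 0.0; all 256 byte values once each scores 8.0.
flat = shannon_entropy_per_byte(b"\x00" * 100)
full = shannon_entropy_per_byte(bytes(range(256)))
```

The digits of pi would score well on a test like this, which is precisely why such measurements cannot, on their own, certify a random number generator.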
We live in a world that is full of entropy, but harnessing that entropy to create random numbers in computer systems is not so simple. Application developers have invented numerous ways to scavenge natural entropy from the physical world, even resorting to using video cameras, microphones and radio receivers to pick up cosmic radiation. Unfortunately, in modern data centers most of these natural sources are simply not available. As an alternative it is possible to monitor activity within the host computer itself, such as timing signals and jitter created by software processes; the risk is that these might be manipulated by attackers and shared by multiple VMs. In some cases purpose-built sources of entropy may be included in the hardware or provided as a peripheral such as a USB token or PCI card. The challenge is that entropy sourcing is wildly inconsistent across different platforms and environments and can be extremely difficult to monitor – not good news when trying to establish consistent security. Learn more about the Whitewood Entropy Engine.
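To make the jitter idea concrete, here is a hedged sketch of harvesting the low-order bit of execution-timing noise. This is illustrative only: the raw samples are biased and correlated, depend heavily on the clock resolution of the host, and would need conditioning and health testing before any real use.

```python
import time

def jitter_samples(n: int) -> list:
    """Collect the least-significant bit of timing jitter from n
    short workloads. Raw output is biased/correlated -- it must be
    conditioned before being treated as entropy."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        sum(range(100))  # trivial work so consecutive timestamps differ
        samples.append((time.perf_counter_ns() - t0) & 1)
    return samples

bits = jitter_samples(1000)
```

On a virtualized host the same jitter may be visible to, or influenced by, co-resident VMs, which is the manipulation risk the paragraph above describes.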
Hardware entropy sources
Dedicated entropy sources overcome the challenge of capturing entropy from the local environment or generic activity on the host computer. Examples include USB tokens, hardware security modules (HSMs), CPU-level features (e.g. Intel’s rdrand) and smart cards. These systems can be a powerful addition when generating true random numbers. The challenge is that specialist hardware doesn’t necessarily suit every environment. For example, USB tokens might be prohibited in corporate data centers, and adding any sort of additional hardware might be impossible in public clouds or even hosted environments. Dedicated entropy sources are often essential when architecting high-security systems, but they can impose severe deployment constraints. Ideally it would be possible to access dedicated hardware entropy sources in a way that makes them available to all applications, irrespective of the local hardware or software environment. Learn more about enterprise strategies for random number generation.
There are many ways to capture and generate entropy. Some systems capture events such as mouse clicks and keystrokes or analyze images or sounds. Some measure timing jitter and sample electrical noise. But these signals and events are not perfectly random; they contain patterns, correlations and bias. They all require data processing to extract whatever true randomness may exist, which is itself an imperfect process. Quantum-based entropy sources exploit random behavior at the sub-atomic level. This behavior is fundamentally random – unpredictable by any attacker, even one with unlimited resources. This resistance to attack is in stark contrast to other sources of entropy that face the risk of manipulation and subversion. For these reasons, many argue that quantum-derived entropy is the nearest you can get to perfect randomness and therefore the best source for a true random number generator. Learn more about quantum random number generation.
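The classic illustration of the ‘data processing’ step mentioned above is the von Neumann extractor, which removes bias from independent coin flips at the cost of discarding most of the input. This is a minimal sketch; modern conditioners (e.g. the hash-based constructions described in NIST SP 800-90B) are far more sophisticated and also address correlation, which von Neumann’s method does not.

```python
def von_neumann_extract(bits):
    """Von Neumann debiasing: read bits in pairs, emit 0 for (0,1)
    and 1 for (1,0), and discard (0,0) and (1,1).
    Removes bias from independent flips, but not correlation."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

# Pairs: (0,1) -> 0, (1,0) -> 1, (0,0) and (1,1) are dropped.
debiased = von_neumann_extract([0, 1, 1, 0, 0, 0, 1, 1])
```

Even a heavily biased coin (say, 90% heads) yields unbiased output this way, because (0,1) and (1,0) are equally likely for independent flips; the trade-off is throughput, since most pairs are thrown away.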
Why random numbers really matter
Random numbers are used throughout computer systems for many purposes; creating process IDs, shuffling data and adding texture to graphics are just a few examples. In most cases it doesn’t really matter if the random numbers aren’t truly random. But in other situations, such as statistical modelling, gaming and security applications, random numbers need to be truly random. The most obvious case where randomness is critical is cryptography. Random numbers are used to make keys, and cryptographic keys need to be perfectly random. Any patterns within the key give the attacker clues and make it easier to crack. Making perfectly random numbers is hard – much harder than you would expect – and checking that the process is working correctly can be the difference between crypto that is safe and crypto that isn’t. Learn more about the impact on crypto security in this blog.
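For developers, the practical takeaway is to draw key material only from a cryptographically secure source. A brief sketch using Python’s standard library: the secrets module wraps the OS CSPRNG, whereas the general-purpose random module is a deterministic PRNG and must never be used for keys.

```python
import secrets

# 256 bits of key material from the OS CSPRNG.
# Never use random.random()/random.randbytes() for keys -- that PRNG
# is designed for simulation, not secrecy.
key = secrets.token_bytes(32)
hex_key = key.hex()  # convenient printable form, 64 hex characters
```

Whether those 32 bytes are actually unpredictable still depends on how well the operating system’s pool was seeded, which is the theme of the sections that follow.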
When is randomness most critical?
When encrypting data the only thing that separates the attacker from your data is the quality of your keys. Any applications that incorporate SSL/TLS for protecting internet or VPN traffic, and those that protect data at rest through file or disk encryption, should have access to true random numbers. Other applications that involve the generation of long-term keys should always be a priority since the impact of a compromise and the cost of replacing keys can be considerable. PKI-based applications that involve the issuance of credentials and creation of digital signatures fall into this category. Finally, there are specialist applications such as payments, gaming and cryptocurrencies, many of which will be regulated and subject to compliance requirements. Learn more about deployment scenarios for true random number generators.
What if things go wrong?
Most random numbers are made by software. The problem is that software doesn’t act randomly. Software random number generator algorithms rely on random ‘seeds’ to create numbers that are sufficiently random. This raises obvious questions such as how do you prove the seeds are truly random? Do they come from a reliable source? How often does the software need to be ‘reseeded’ in order to stay random? And perhaps most importantly, how can you tell if something goes wrong and your random numbers stop being random? Unfortunately the answer to all these questions is that it’s difficult or even impossible, which creates a problem when you face a security audit. Learn more about how random numbers are made.
Standards for random number generators
Many tests for measuring the statistical randomness of data have evolved over the years, some more stringent than others, but none are perfect and all fail to provide a complete assessment of the effective security of a random number generator. Proving randomness is so tricky that few formal standards exist; even well-established crypto product certifications such as FIPS 140 regard sources of entropy as out of scope. The National Institute of Standards and Technology (NIST) in the US has proposed a suite of three standards to cover both deterministic and non-deterministic (true) random number generators: SP 800-90 A, B and C. The first of these, SP 800-90A, covers PRNGs and has already been finalized; the others are expected to follow by the end of 2016 and will likely become part of certification testing soon after. Learn more by reading our blog on key generation.
Validation and compliance
Some operating system tools exist to track the generation and consumption of entropy but these tools are unreliable and are difficult to use on an operational basis. It is possible to measure the statistical randomness of keys but that only tells part of the story. The critical issue is assessing their unpredictability, something that is almost impossible to do without full knowledge and control of the entropy sources that are being used and quality of the entropy they gather. If you’re worried about generating weak random numbers then prevention is more likely to be successful than detection. Random number generators have to be architected correctly from the ground up and monitored in real time using health-checking functions. Learn more about measuring entropy and randomness.
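Real-time health checking of the kind described above typically means simple, continuously running tests on the raw noise source. As an illustrative sketch in the spirit of the continuous tests in NIST SP 800-90B (the cutoff here is arbitrary, not a computed value from the standard), a repetition count test flags a source that has become ‘stuck’ on one value:

```python
def repetition_count_test(samples, cutoff=5):
    """Continuous health test sketch: fail if any value repeats
    'cutoff' or more times in a row, suggesting a stuck source.
    A real SP 800-90B cutoff is derived from the source's claimed
    entropy rate; 5 here is purely illustrative."""
    run, prev = 0, None
    for s in samples:
        run = run + 1 if s == prev else 1
        prev = s
        if run >= cutoff:
            return False  # source looks stuck -- raise an alarm
    return True

healthy = repetition_count_test([1, 2, 3, 1, 2, 3, 2, 1])
stuck = repetition_count_test([7] * 10)
```

Tests like this cannot prove a source is good, but they can catch the catastrophic failure modes (a dead sensor, a flatlined signal) quickly, which is exactly the ‘prevention over detection’ posture the paragraph above recommends.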
Operations and infrastructure
Entropy generation has traditionally been considered a ‘local’ issue, something that is handled locally on each host machine. As systems become more distributed across virtualized environments and on consumer devices, the concern over consistency and security increases. In cloud or hosted environments there is often no control over the physical hardware or local environment, and remote delivery of entropy from a centralized source is likely to become essential – the concept of ‘bring your own entropy.’ Across private data centers and shared or hosted environments entropy services might one day be considered an essential ‘utility’ service. Entropy would be universally available in the same way that time and date services are delivered to servers and network appliances today. From a security point of view, entropy generation is too important to be left up to individual machines. Learn more about network delivery of true random numbers.
Virtualized applications and clouds
Generating random numbers relies on having access to a strong source of randomness. Software alone cannot generate randomness since it is fundamentally deterministic. Randomness comes from the physical world, where noise signals, user activity or natural events can be sampled and analyzed to create random data. Unfortunately the process of virtualization breaks the connection between applications and the real world. While there are many advantages to this abstraction in terms of scalability and flexibility when deploying applications, it creates a virtual firewall for randomness. Unfortunately there is very little randomness in the virtual world. Worse still, VMs that are replicated can contain copies of the same entropy and internal state, which means the random numbers they generate may no longer be independent until they are reseeded – an issue when entropy is limited. Learn more about Whitewood’s approach for supplying randomness to the virtual world.
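The cloned-VM problem is easy to model: two PRNGs that start from identical internal state (as cloned or snapshot-restored VMs can) emit identical ‘random’ streams until one is reseeded with fresh entropy. A minimal sketch using Python’s deterministic random module as a stand-in for the guest’s PRNG:

```python
import random

# Two PRNG instances with identical internal state, standing in for
# two VMs cloned from the same snapshot.
vm_a = random.Random(1234)
vm_b = random.Random(1234)

# Until reseeded, the "independent" machines agree on every output.
identical = [vm_a.random() for _ in range(5)] == [vm_b.random() for _ in range(5)]

# Reseeding one clone with fresh entropy makes the streams diverge.
vm_b.seed(5678)
diverged = [vm_a.random() for _ in range(5)] != [vm_b.random() for _ in range(5)]
```

If the shared state had been used to generate cryptographic nonces or keys, both clones would have produced the same ‘secrets’, which is why prompt reseeding after cloning matters.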
Ownership and controls
Even though it is a crucial activity in all cryptographic systems, random number generation rarely has any clear point of ownership. Most random number generators are buried deep in the operating system, and the quality of entropy sources is almost always dependent on the hardware platform and local environment. Most RNGs are blind to the needs at the application level. Similarly, applications are often confined to containers or abstracted from the hardware by layers of virtualization, which makes it impossible to validate the quality of the random numbers they receive. Attesting to the quality of random numbers requires knowledge of the entire IT stack – from physical environment, hardware and hypervisor to OS and application. Few security or operations professionals have that level of visibility or control across a modern data center. This is cause for concern in many IT departments. Learn more about the challenge of controlling random number generation.