What is randomness?

transparent-dice.png

The code project I'm currently working on requires generating numbers for simulating card games. This is a notoriously perilous task for programmers. It also brings up interesting (at least to some of us) questions about just what randomness is, a notoriously perilous task for philosophers.

It turns out (unsurprisingly) that the mathematician's perspective on randomness closely matches the gambler's perspective. “Random”, gamblers and mathematicians will tell you, is the opposite of “predictable”. That is, the extent to which we can predict anything about the outcome of some experiment ahead of time, those results are not random. Or even more toward the gambler's point of view, a random process is one on which we cannot—even in principle—make a bet in our favor.

This differs considerably from what most people informally think of as “random”. Most of us have an intuitive sense that random things are evenly distributed, which is true in the very very long run, but not true at all on the scales we generally experience things. This intuition is called the “gambler's fallacy”, because gamblers who bet expecting things to even out in the short run keep me employed. Just because that seven hasn't been rolled in a while, or we've just seen a run of black on the roulette wheel, that doesn't mean that upcoming rolls or spins will have any bias—any predictability—compared to previous rolls. Dice, it is often said, have no memory.

This leads to some unintuitive results: if we flipped a coin 10 times, and it landed heads all 10 times, we might rightly suspect that it wasn't a fair coin. But if we flipped it a million times, and there were not a single run of 10 heads in a row in the whole sequence, we would also reject that coin as non-random, because its distribution was too even. After billions of rolls of the dice, or billions of cards dealt, or billions of spins of the wheel, we would expect all the possible outcomes to be roughly—but not exactly—even. But in the short run, lopsided results are commonplace, and in fact failing to find such streaks is evidence of non-randomness. And here's an important clue: your lifetime is the short run. If it were true that things “evened out” in the short run, that would be a statistical bias that you could bet on and make money. People do bet on those feelings, but it's the casino who makes money, because they're betting on randomness.

There is one exception to note in casino games: blackjack. Cards don't have memory either, but the shoe from which cards are dealt does. If you've been watching 40 hands of blackjack and not seen a single face card, you can be sure that the cards remaining in that shoe, until it is reshuffled, are overly rich with face cards, and change your bets accordingly. The fact that you will see more face cards in the next few hands than you would expect from a fresh shoe is statistically predictable, and you can therefore make money from that fact, even if each individual card is still randomly chosen from those remaining. Card counting works because short-run clusters of events are predictable.

Randomness and science

People who misunderstand math and science get randomness wrong all the time. You will hear a silly argument from creationists that creating life from random processes is like a tornado in a junkyard randomly assembling a 747. The mistake here (well, one among many) is that they think a process that merely contains some random element is therefore a random process. Yes, random mutation is an important element in evolution, but less important than the process of natural selection, which is not random at all. Imagine, for example, your parents, and their parents, and their parents, going back thousands of generations. That's a lot of people. Now ask: how many of those people died at birth? How many died before puberty? How many were sterile? How many just didn't have the desire or opportunity to have children? These questions are easy: zero. 100% of those people, without exception, successfully reproduced. 100%. That's the exact opposite of random. That's why natural selection is such a powerful force. Even with random variation as input, its output is remarkably complex and and sometimes gives a magnificent illusion of design—and some things that clearly aren't designed.

500px-10_meter_air_rifle_target.svg.png

Even real scientists get randomness wrong. Doctors who find clusters of people with certain cancers, for example, often mistakenly jump to the conclusion that there's some environmental cause, when in fact a purely random distribution would inevitably produce such clusters. Only if the clusters are much worse than expected by randomness—as determined by a good mathematician—should we start looking for another cause. This is called the “Texas sharpshooter” fallacy, after a supposed shooter who fires shots randomly at the side of a barn, then walks over to the biggest clump of holes and draws a target there. Cancer clusters are just like our 10 heads in a row: if the numbers are big, we should expect them a certain number of times.

Even Dr. Steven Novella, host of the Skeptics Guide to the Universe podcast, has said on the air that the digits of pi are random. They are not. First of all, they are 100% predictable by calculating pi. But even outside of that, it is possible to find statistical biases in the digits of pi without actually calculating them out. For example, one can mathematically prove, say, that the 10 billionth digit of pi is a bit more likely to be a 7 than a 6 without calculating pi all the way out to 10 billion digits. Such statistical properties of the digits of pi are quite common, and it means that you could make a bet on one of those digits at fair odds and make money. If the digits were random, this would not be possible.

Programmers get randomness wrong all the time. This includes, unfortunately, the programmers who write the standard function libraries for most programming languages. When you learn to program, you will probably be given a programming exercise of some simple card or dice game, and be taught about the standard function in your language for producing random numbers. For a simple game run a few times, this is probably fine. But if you try to make real industrial-strength simulations of billions of hands or rolls, your built-in random function will probably fail in more than one way. It may be too evenly or too unevenly distributed, or it may repeat itself too soon.

So your language's built-in function is probably bad. And if you try to do it yourself, you'll probably make a worse one. So what should you do? Find an add-on library written by a serious math geek who you trust to get it right. And there's more: even if you have a perfect random number generator, you can use it the wrong way and still get bad results. Shuffling a deck of cards, for example, requires not only that you select cards at random with a good algorithm, but that you correctly rearrange the cards in such a way that each possible arrangement is equally likely, and this is easy to get wrong. There are even issues with the purpose for which the numbers are used: a good algorithm for choosing cards for a simulation might not be the best for choosing a cryptographic key, and vice versa. The well-known algorithms have fancy names: cryptographers use algorithms called RC4, Yarrow, and Fortuna. People writing simulations use algorithms called Mersenne Twister and CMWC.

The moral of this tale is short: if you think you know what randomness is, think again. Maybe consult a mathematician. If you're writing code that uses randomness, definitely consult a mathematician. It's much easier to get it wrong than you think. And test the code. I have as many lines of test code in my library to verify the random number generator as I have to generate the numbers. There's also a good program called “dieharder” that runs a suite of statistical tests on your generator to prove its quality (and which will, by the odd nature of the beast, occasionally—randomly—fail even when your code is perfect!)

Just to give you a glimpse of the level of complexity of the problem, this page shows the code from my OneJoker library that generates random numbers and shuffles cards, and does nothing else. Over 150 lines of code to ensure that the next card you get is, in fact, unpredictable.