To read my most recent article about chaotic systems, click here.
The logistic map is the most basic recurrence formula exhibiting various levels of chaos depending on its parameter. It has been used in population demographics to model chaotic behavior. Here we explore this model in the context of randomness simulation, and revisit a bizarre non-periodic random number generator discovered 70 years ago, based on the logistic map equation. We then discuss flaws and strengths in widely used random number generators, as well as how to reverse-engineer such algorithms. Finally, we discuss quantum algorithms, as they are appropriate in our context.
Source for animated picture: click here
Logistic map
The logistic map is defined by the following recursion
X(k) = r X(k-1) (1 – X(k-1))
with one positive parameter r less or equal to 4. The starting value X(0) is called the seed, and must be in [0, 1]. The higher r, the more chaotic the behavior. At r = 3.56995… is the onset of chaos. At this point, from almost all seeds, we no longer see oscillations of finite period. In other words, slight variations in the initial population yield dramatically different results over time, a prime characteristic of chaos.
When r = 4, an exact solution is known, see here. In that case, the explicit formula is
The case r = 4 was used to build a random number generator decades ago. The k-th number Z(k) produced by the random number generator in question, is equal to
The numbers Z(k)’s are uniformly distributed on [0, 1]. I checked whether they were correlated or not, and could not find any statistically significant auto-correlations. Interestingly, I initially found all this information in a military document published in 1992, still hosted here on the .mil domain, and of course, not classified (not sure if it was ever classified.) The original work is by S.M. Ulam and J. von Neumann, and was published in 1947. Very few seeds will result in periodicity — an infinite number of them actually, but they are extremely rare, just like rational numbers among all real numbers.
Source: 2-D logistic map, see here
The logistic map has been generalized, for instance in two dimensions. see here for an application to population dynamics. The 2-dimensional case has also been used in image encryption, see here. Also, in 2012, Liu Yan and Tong Xiao-Yung published a paper entitled A new pseudorandom number generator based on complex number chaotic equation. However, the simplest non-periodic good random number generator might just be defined in the complex plane (on the unit circle) by
A recurrence formula can easily be obtained for the real and imaginary parts separately (both are real-valued random number generators) using complex exponentiation. Just like with the logistic map generator, a transformation must be applied to map the non-uniform deviate X(k), to a uniform deviate Z(k). The case b = 2 is similar to the logistic map generator with r = 4. Such home-made generators are free from NSA backdoor, in contrast for instance with the Dual Elliptic Curve Deterministic Random Bit Generator.
The Mandelbrot set is produced by a resursion similar to the logistic map, in the complex plane
Big Flaws in Popular Random Number Generators
Many applications require very good random number generators. For instance, to produce secure cryptographic systems, or when simulating or testing a large number of stochastic processes to check whether they fit with particular statistical distributions, as in my recent article on self-correcting random walks. In another article about the six degrees of separation problem, I needed a random number generator capable of producing millions of distinct integer values, and that is how I discovered the flaws with the Perl rand() function still in use today.
This generator can only produce 32,768 distinct values, and when you multiply any generated value by 32,768, you obtain an integer value. These values are supposed to simulate uniform deviates on [0, 1]. Applications that are designed using this generator might not be secure. Indeed, the Perl documentation states that
“Rand() is not cryptographically secure. You should not rely on it in security-sensitive situations. As of this writing, a number of third-party CPAN modules offer random number generators intended by their authors to be cryptographically secure, including: Data::Entropy, Crypt::Random, Math::Random::Secure, and Math::TrulyRandom.”
Chances are that similar issues can be found in many random number generators still in use today. I also tested the Excel rand() function, but could not replicate these issues. It looks like Microsoft fixed glitches found in its previous versions of Excel, as documented in this report. The Microsoft document in question provides details about the Excel random number generator: Essentially, it is equivalent to the sum of three linear congruential generators of periods respectively equal to a = 30,269, b = 30,307, and c = 30,323, so its overall period is equal to the product N = abc, at best. Note that a, b, and c are prime numbers. Also, in Excel, N * rand() should always be an integer, if you have this version of the algorithm built in your spreadsheet. This is hard to check, because we are beyond Excel’s precision, limited to about 13 digits. To put it differently, Excel can not produce a uniform deviate smaller than 1/N, other than zero, but again, this is difficult to check for the same reason.
To test these flaws and indeed reverse-engineer a random number generator producing deviates on [0, 1], one can proceed as follows.
- Generate a few deviates. Multiple the deviates by N, testing all N‘s between 1 and 1,000,000,000, to see if some N always result in an integer. Note that if you try with too large values of N, say N = 10^14, you will always get an integer, but in this case it is due to machine precision, not because of the random number generator.
- Generate millions of deviates. When you find one that is identical to the first deviate, check whether you’ve reached periodicity, with all subsequent values repeating the numbers found at the beginning. If that is the case, you’ve discovered the period of your random number generator. Note that the time series of deviates might not be periodic immediately, and may enter into a periodic cycle only after a certain time, or it could display quasi-periodicity.
- Generate millions of deviates. Check how many distinct values you get. For instance, if you generate 10,000,000 deviates and find only 32,768 distinct values, your random number generator has a severe issue.
- Generate millions of deviates. At each iteration, update the minimum value found so far. Is this minimum approaching zero over time, or is it stuck somewhere, very close yet not close enough to zero?
Interestingly, unlike the random number generator produced by the logistic map and discovered in 1947 (described above in this article) or generators based on hardware or on the time function of your computer, many random number generators used today (for instance the one used in Java) are based on linear congruence and are thus periodic, and poor. But a few are excellent, and have such a large period that they are un-breakable for all practical purposes, such as the Mersenne twister, with a period of 2^19937 −1 . Note that 2^19937 – 1 is a Mersenne prime.
An advantage of the logistic map generator, over hardware-based generators, is that you can reproduce the same sequence of random numbers over and over in any programming language and on any computer — a critical issue in scientific research, with many tests published in scientific journals not being re-producible. Other non-periodic, high quality random numbers offering the same reproducibility feature, include those based on irrational numbers. For instance, a basic recursion no more complicated than the logistic map formula, produces all the binary digits of SQRT(2)/2, one per iteration, and these digits are known to have a random behavior, according to standard tests. See also here, for an application about building a lottery that would not be illegal. For a fast random number generator based on the decimals of Pi, click here.
Note that the Perl random number generator has a period larger than 2^23 according to my computations, despite producing only 32,768 distinct values. It could even be non-periodic, afterall the binary digits of SQRT(2) produce only two distinct values – 0 and 1 – yet they are not periodic, otherwise it would represent a rational number.
Finally, the standard Diehard tests to assess the randomness of these generators should be updated. It was published in 1995 when big data was not widespread, and I am not sure whether these tests would be able to detect departure from randomness in generators used in sensitive applications requiring extremely large quantities of pseudo-random numbers.
Quantum algorithms
Quantum algorithms work well for applications that require performing a large number of repetitive, simple computations in which no shortcut seems available. The most famous one today is probably Shor’s algorithm, to factor a product of integers. Traditional encryption keys use a product of two very large prime numbers, large enough that no computer is able to factor them. Factoring a number is still considered an intractable problem, requiring to test a bunch of numbers as potential factors, without any obvious pattern that could accelerate the search. It is relevant to our context, as much of our concern here is about cryptographic applications and security. Also, the issue with these encryption keys is that they also should appear as random as possible – and products of large primes mimic randomness well enough. Maybe one day a classical (non-quantum) but efficient factoring algorithm will be found (see my articles on the subject) but for now, quantum computing with Shor’s algorithm seems promising, and can theoretically solve this problem very efficiently, thus jeopardizing the security of systems ranging from credit card payments to national security. This is why there is so much hype about this topic. Yet in 2014, the largest integer factored with quantum computers was only 56,153, a big increase over the record 143 achieved in 2012. But because of security concerns, new post-quantum cryptographic systems are being designed.
Another problem that benefits from quantum computers is counting the number of distinct values in a large set of integers. This is the element distinctness problem (click on the link to see the gain obtained with a quantum version.) In particular, in my 6 degrees of separation problem, I had to make sure that my random number generator was able to produce a very large number of distinct integers, each one representing a human being. Distinctness had to be tested. Again, this is an example of algorithm where you need to test many large blocks of numbers – independent from each other – without obvious shortcut to increase efficiency. Quantum algorithms can solve it far more efficiently than classical algorithms. As an alternative, the logistic map is known to produce infinite sequences of pseudo-random numbers that are all distinct – thus no test is needed. The related problem of detecting a period in long sequences that may or may not be periodic, and if periodic, have a very large period, can probably benefit as well from quantum architectures.
For further, reading, I suggest searching for Simulation of a Quantum Algorithm on a Classical Computer. Since quantum algorithms are more complex than classical ones, for quantum computing to become popular, one will have to develop a high-level programming or algorithmic language that can translate classic tasks into quantum code.
For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn.
DSC Resources
- Services: Hire a Data Scientist | Search DSC | Classifieds | Find a Job
- Contributors: Post a Blog | Ask a Question
- Follow us: @DataScienceCtrl | @AnalyticBridge
Popular Articles