Back to the basics. Here we are dealing with the oldest data set, created billions of years ago — the set of integers — and mostly the set consisting of two numbers: 0 and 1. All of us have learned how to write numbers even before attending primary school. Yet, it is attached to the most challenging unsolved mathematical problems of all times, such as the distribution of the digits of Pi in the decimal system. The table below reflects this contrast, being a blend of rudimentary and deep results. It is a reference for statisticians, number theorists, data scientists, and computer scientists, with a focus on probabilistic results. You will not find it in any textbook. Some of the systems described here (logistic map of exponent p = 1/2, nested square roots, auto-correlations in continued fractions) are research results published here for the first time.
This material is described in this article, including how to derive all the results, and the equivalence between the base-2 system and the standard logistic map (with p = 1) mentioned in exercise 7. It includes applications to cryptography, random number generation, high performance computing (HPC), population growth modeling, financial markets, BlockChain, and more.
A higher resolution of the table below is available here.
Many other systems are not described in this table, including:
- The iterated exponential map, possibly one of the most mysterious chaotic systems. See exercise 4 here for details.
- The nested cubic root, a generalization of the nested square root.
- The generalized continued fraction of power p, especially p = 2, defined by
Other interesting facts:
For the standard logistic map (p = 1) a closed-form formula is available for x(n):
Also, if x(n) is the base-2 sequence for x, then sin^2(Pi x(n)) is the logistic map sequence for sin^2(Pi x).
In base b, if b is an integer, the exact formulas are as follows:
Note that x(1) = x, since x is between 0 and 1.
Thus, proving that x is a good seed in base b (b integer) amounts to proving that the sequence { b^n x } is equidistributed modulo 1 (see also here for a curious fact about x = Pi.)The case x = SQRT(2) in base-2 is studied here. It was proved that for this particular case, the frequencies of 0’s and 1’s in the base-2 representation, are both equal to 50%. See here. I am working on a more simple proof, using the simple recursion described here. A special formula for the digits of Pi in base-16 is also available, see here. See also this link.
Not all number representation systems are as well-behaved as those describe here. For examples of more complex systems, see here. Also, even in well-behaved systems, some numbers (a subset of bad seeds) do not have any kind of statistical distribution for their digits. An example in base-2 is as follows: the first digit is 1, the next 2 digits are 0, the next 4 digits are 1, the next 8 digits are 0, the next 16 digits are 1, the next 32 digits are 0, and so on.
Reverse-engineering number representation systems
Some systems, like the Logistic Map 1, or base-b where b is an integer, have an exact formula to compute x(n) for any n without having to compute x(n-1), x(n-2) and so on. Is it possible, in these systems, given a particular n and x(n), to retrieve the seed x (and thus all its digits) by solving the equation in question?
We tried in base-2 for x = SQRT(2) / 2 and n = 10, and the answer is positive (at least for a good seed x, which is the case here.) In this case, for a given n, one has to solve the equation
where x is the unknown, b = 2, and both n and x(n) are known. A dichotomic search algorithm provides a solution very efficiently, since we know that x is in [0, 1]. The solution is not unique though. This could be an issue for NSA and other organizations developing strong encryption systems. Or, to the contrary, it could help them in their reverse-engineering activities. Or better, it could help them explore systems with a natural, built-in backdoor, without being accused of manufacturing man-made, mandatory, artificial backdoor, in existing encryption schemes.
Imagine an encryption system where x is the message, and x(n) is the encrypted message, using (say) the Logistic map 1 with n = 500. No one is ever going to figure out that you can actually retrieve x from x(n). And computing x(n) itself is not easy, requiring industrial-strength high precision computing. Retrieving x from x(n) is even more difficult, despite the simple equation providing the solution: It is all about industrial strength HPC. But the fear of reversibility (successful attacks on cryptographic keys) has led to the development of quantum algorithms and quantum computers.
Related articles
For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn.
DSC Resources
- Services: Hire a Data Scientist | Search DSC | Classifieds | Find a Job
- Contributors: Post a Blog | Ask a Question
- Follow us: @DataScienceCtrl | @AnalyticBridge
Popular Articles