In some applications, the standard precision of your programming language of choice may not be enough and can lead to disastrous errors. Sometimes you work with a library that is supposed to provide very high precision, when in fact it does not work as advertised. Sometimes the lack of precision results in obvious problems that are easy to spot; other times everything seems to be working fine and you are unaware that your simulations are completely wrong after as few as 30 iterations. We explore this situation in this article, using a simple example that can be used to test the precision of your tool and of your results.
Such problems arise frequently with algorithms that do not converge to a fixed solution, but instead generate numbers that oscillate continuously in some interval, converging in distribution rather than in value, unlike traditional algorithms that aim to optimize some function. Examples abound in chaos theory; the simplest case is the recursion X(k + 1) = 4 X(k) (1 - X(k)), starting with a seed s = X(0) in [0, 1]. We will use this example, known as the logistic map, to benchmark various computing systems.
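To make the recursion concrete, here is a minimal sketch in Python that iterates the map with standard double-precision floats (the function name and iteration count are illustrative choices):

```python
def logistic_map(seed, iterations):
    """Iterate X(k+1) = 4 X(k) (1 - X(k)), the logistic map with parameter 4."""
    x = seed
    values = [x]
    for _ in range(iterations):
        x = 4.0 * x * (1.0 - x)
        values.append(x)
    return values

# first few iterates with seed s = 0.3, using standard (about 15-16 digit) floats
print(logistic_map(0.3, 5))
```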
Examples of algorithms that can be severely impacted by cumulative loss of precision, besides ill-conditioned problems, include:
- Markov Chain Monte Carlo (MCMC) simulations, a modern statistical method of estimation for complex problems and nested or hierarchical models, including Bayesian networks.
- Reflective stochastic processes, see here. This includes some types of Brownian or Wiener processes.
- Chaotic processes, see here (especially section 2). These include fractals.
- Continuous random number generators, see here.
The conclusions based on the faulty sequences generated are not necessarily invalid, as long as the focus is on the distribution being studied, rather than on the exact values from specific sequences. For random number generators, it may result in a much smaller period than expected, still acceptable for most applications except strong cryptography. An example of a problem where high precision values are needed is comparing two sequences generated by two different chaotic processes, to check whether you can map one onto the other with a smooth transformation. Another example is computing long-range autocorrelations associated with these sequences.
Besides, it is interesting to know how many decimals are correct in your computations, especially when using high precision libraries that fail to deliver on their promises.
Benchmarking
We use the sequence X(k) defined in the introduction, with the seed s = 3/10. All the values generated lie in [0, 1]. We define precision as the metric used to determine how many decimals are correct: if n decimals are correct, we say that the precision is n. We denote by X(k, n) the value of X(k) rounded to provide a precision equal to n. We simulate low precision using rounded high precision values, to show the impact of cumulative errors on the generated values as k increases, as well as to detect when (for which value of n) the high precision libraries become untrustworthy. Note that in most systems, including Excel, the standard precision is approximately n = 14, which is not enough here.
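Below is one way to simulate the rounded sequence X(k, n) with Python's decimal module. This is a sketch, not the author's Perl code; rounding each iterate to n decimals before the next iteration is my reading of the scheme described above.

```python
from decimal import Decimal, getcontext

def logistic_rounded(seed, iterations, n):
    """Iterate the logistic map, rounding every iterate to n decimals (X(k, n))."""
    getcontext().prec = max(2 * n, 50)        # working precision well above n
    quantum = Decimal(10) ** -n               # e.g. 0.001 for n = 3
    x = Decimal(seed).quantize(quantum)
    values = [x]
    for _ in range(iterations):
        x = (4 * x * (1 - x)).quantize(quantum)   # round to n decimals at each step
        values.append(x)
    return values

# X(k, 3) versus X(k, 10) for the seed s = 3/10: the two sequences soon diverge
low  = logistic_rounded("0.3", 50, 3)
high = logistic_rounded("0.3", 50, 10)
for k in (10, 20, 30, 40, 50):
    print(k, low[k], high[k])
```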
The table below shows the first 80 iterations or so, computed with various precision levels. The full table can be downloaded here (Excel) and contains all the pictures included in this article, with detailed computation.
As you can see, the number of iterations providing a rather accurate value of X(k) increases linearly as a function of the precision n. This linear relationship is easy to understand. Every now and then, almost periodically, we hit a value x = X(k) such that the error computed on g(x) = 4 x (1 - x) with n decimals reaches a peak, as shown in the picture below. Such values of x are easy to identify, and they result in an increased deterioration of accuracy each time we get too close to them. And because the sequence X(k) oscillates without ever converging, the peak errors, however small, keep showing up unabated every now and then, rather than decreasing to 0 as in standard algorithms. Note that the distribution of X(k) can still be computed accurately despite the massive cumulative error, but not the exact values after as few as 40 iterations.
Absolute (non cumulative) error for the first 170 iterations (n = 3)
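A figure of this kind can be reproduced, at least approximately, with the sketch below: it measures, at each step of a high precision reference sequence, the error introduced by a single application of g(x) = 4 x (1 - x) carried out with only n = 3 decimals. Treating 50 decimals as "exact" is an assumption.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80                     # generous working precision
EXACT = Decimal(10) ** -50                 # treat 50 decimals as "exact"
ROUNDED = Decimal(10) ** -3                # the low precision setting, n = 3

def g(x):
    return 4 * x * (1 - x)

errors = []
x = Decimal("0.3")
for k in range(170):
    # error of one application of g with 3 decimals, relative to full precision
    err = abs(g(x.quantize(ROUNDED)).quantize(ROUNDED) - g(x))
    errors.append((err, k))
    x = g(x).quantize(EXACT)               # advance the reference sequence

# iterations where the one-step error peaks
for err, k in sorted(errors, reverse=True)[:5]:
    print(k, err)
```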
The optimum precision to work with is a function of how many correct successive values you need, but also of the practical limitations of your high precision tool and of how fast the algorithm needs to run. The higher the precision, the slower the computations; if you generate a large number of values, this becomes an issue.
Testing How Reliable Your High Precision Tool Is
This linear relationship between the number of iterations generating a rather accurate value (say correct to two decimals) and the precision n can be used to test how good your high precision library is, and when it fails. In this sense, the sequence studied here is good for benchmarking purposes.
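Here is a sketch of such a test in Python: for each precision n, count how many initial iterates of the rounded sequence stay within 0.01 of a much higher precision reference. The 200-digit reference and the 0.01 threshold are my own proxies for "correct to two decimals".

```python
from decimal import Decimal, getcontext

def logistic_rounded(seed, iterations, n):
    """Iterate the logistic map, rounding every iterate to n decimals."""
    getcontext().prec = max(2 * n, 50)
    quantum = Decimal(10) ** -n
    x = Decimal(seed).quantize(quantum)
    values = [x]
    for _ in range(iterations):
        x = (4 * x * (1 - x)).quantize(quantum)
        values.append(x)
    return values

def correct_iterations(n, iterations=300, reference_digits=200):
    """Index of the first iterate of X(k, n) that drifts from a high precision
    reference by 0.01 or more (a simple proxy for 'correct to two decimals')."""
    ref = logistic_rounded("0.3", iterations, reference_digits)
    low = logistic_rounded("0.3", iterations, n)
    for k in range(iterations + 1):
        if abs(ref[k] - low[k]) >= Decimal("0.01"):
            return k
    return iterations + 1

# the count should grow roughly linearly with n as long as the library behaves
for n in (5, 10, 20, 30, 40):
    print(n, correct_iterations(n))
```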
Interpretation: with n = 20, only the first 67 iterations are correct up to two decimals; somewhere between n = 40 and n = 45, the gain in precision becomes an illusion.
Note that for Excel (n = 14) and for most programming languages using standard arithmetic, the Y-value in the above picture would have been close to 47.
In the above picture, somewhere between n = 40 and n = 45, the gain in precision becomes an illusion: it is impossible that we suddenly get so many iterates of X(k) correct up to 2 decimals. I actually computed 100,000 iterations with n as high as 100, without any subsequent improvement. Clearly, my BigNumbers library (in Perl) is not working as advertised beyond n = 45, or maybe some functions used in my code, possibly INT, have a lower limit on precision. I have read similar stories about the decimal library in Python. Getting results slightly better than mine is probably easy, but computing a correct value with two decimals for k = 1,000,000 is likely a very challenging problem, even though a simple direct formula exists for X(k). Such values could be used to produce secure cryptographic systems, though they might be breakable using quantum algorithms or high performance computing (HPC).
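The direct formula in question is presumably the well-known closed form of this map, X(k) = sin^2(2^k · arcsin(√s)). Evaluating it still requires roughly k·log10(2) correct digits of working precision, which is why k = 1,000,000 remains challenging. Here is a sketch using Python's mpmath library (my choice of tool, not the author's Perl setup):

```python
from mpmath import mp, mpf, sin, asin, sqrt

def logistic_closed_form(seed, k):
    """X(k) = sin(2^k * asin(sqrt(seed)))**2, the closed form of the map."""
    # 2^k * asin(...) must be known modulo pi, so we need roughly
    # k*log10(2) extra digits of working precision
    mp.dps = int(k * 0.302) + 30
    theta = asin(sqrt(mpf(seed)))
    return sin(mpf(2) ** k * theta) ** 2

# X(100) with the seed s = 3/10, correct to at least a few decimals
print(logistic_closed_form("0.3", 100))
```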
Application to Cryptography
The RSA encryption algorithm relies on the difficulty of factoring a product of two very large primes. If you only know the product, it is computationally infeasible to factor it and retrieve the two primes needed to decrypt the message. If you know one of the two primes, factoring is easy. These primes are associated with the public and private keys.
Instead of using the product of two primes, one could use the sum of two numbers X(k) and X(k+m) computed with 100 decimals, for large k and m. If you know the sum, it is computationally infeasible to retrieve the two terms. If you know X(k), you can easily retrieve X(k+m). Better yet, use transformed X(k)’s that have a uniform distribution on [0, 1]: a simple transformation exists to “uniformize” the sequence, making your system even stronger. You can also use two different seeds for X(k) and X(k+m). Note that X(k) and X(k+m) are not correlated, even when using the same seed. High precision is critical here.
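The article does not spell out the transformation, but a natural candidate is the classical arcsine transform U(k) = (2/π) · arcsin(√X(k)), which maps the invariant distribution of this logistic map to the uniform distribution on [0, 1]. A quick sanity check in Python (the bin counts are illustrative):

```python
from math import asin, sqrt, pi

def uniformize(x):
    """Map a logistic-map iterate to an (approximately) uniform value on [0, 1]."""
    return (2.0 / pi) * asin(sqrt(x))

# quick check: transformed iterates should spread roughly evenly over 10 bins
x, counts = 0.3, [0] * 10
for _ in range(100000):
    x = 4.0 * x * (1.0 - x)
    counts[min(int(uniformize(x) * 10), 9)] += 1
print(counts)   # roughly 10,000 per bin if the transform uniformizes the sequence
```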
Source Code
Source code in Perl, Python, and Java is available here.