It is hard to imagine that a data element could contain less information than a bit (a digit equal to either 0 or 1). Yet examples abound. Indeed, I wonder whether we should create a unit of information called the microbit or the nanobit.
The first examples that come to mind are irrational numbers such as Pi: its digits are widely believed to be indistinguishable from pure noise, thus carrying essentially no information. There is not enough data storage in the universe to hold all of them, even if you could put a trillion digits in each atom; yet in terms of information, all these digits contain far less than a simple yes or no answer to any meaningful question. To put it differently, if you compressed a big message with any standard data compression algorithm, and the bits after compression matched the first trillion digits of Pi in base 2, it would mean that your original message was pure gibberish, with no meaning and no extractable information.
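One operational reading of "indistinguishable from noise" is that a general-purpose compressor cannot shrink the data. The minimal sketch below illustrates this with Python's standard zlib module. It makes two assumptions that are not in the text above: pseudo-random bytes stand in for the digits of Pi (nothing here computes Pi), and the repetitive message is an arbitrary contrast case.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (zlib, level 9)."""
    return len(zlib.compress(data, 9)) / len(data)


# Pseudo-random bytes as a stand-in for noise; the seed is arbitrary.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

# A highly repetitive message of the same length, for contrast.
repetitive = (b"yes or no? " * 10_000)[:100_000]

noise_ratio = compression_ratio(noise)
repetitive_ratio = compression_ratio(repetitive)

print(f"noise:      {noise_ratio:.3f}")
print(f"repetitive: {repetitive_ratio:.3f}")
```

The noise should come out at a ratio of about 1 (no gain), while the repetitive message shrinks to a small fraction of its size. Data that already looks like noise leaves a compressor nothing to work with.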
Figure: DNA, an example of a nano-structure (though information-rich)
Communications based on Blockchain technology use a similar idea: when you mine bitcoins, the hash of a block, to be valid, must start with a pre-specified number of digits (all zeroes in this case). This is accomplished by adding noise (a nonce) to the original block of text to be transmitted, and trying again until you find some noise that produces these leading zeroes once the block is hashed. The task is similar in spirit to finding gibberish text that, once compressed, matches the digits of Pi.
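The search for a suitable piece of noise can be sketched in a few lines. This is a toy illustration, not Bitcoin's actual procedure: real mining applies SHA-256 twice to a binary block header and compares the result with a numeric target, whereas the sketch hashes a text string once and counts leading zero hex digits. The function name `mine`, the `|` separator, and the sample block text are all hypothetical.

```python
import hashlib
from typing import Tuple


def mine(block_text: str, zero_hex_digits: int) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the SHA-256 hex digest of
    'block_text|nonce' starts with the requested number of zeroes."""
    target_prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        candidate = f"{block_text}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


# Three leading zero hex digits take about 16**3 = 4096 attempts on average.
nonce, digest = mine("Alice pays Bob 1 coin", 3)
print(nonce, digest)
```

Each extra zero hex digit multiplies the expected number of attempts by 16, which is why the difficulty is easy to tune.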
Another example is steganography, a technique used to hide messages in images or videos for safe transmission. The image itself does not carry any valuable visible information, and if the actual message, encrypted and scattered randomly throughout the image, represents a small portion of the data, you can say that each pixel (assuming it is a black and white picture, with one bit per pixel) carries much less than one bit of information.
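A back-of-the-envelope calculation makes the "much less than one bit" claim concrete. The figures below are assumptions chosen for illustration only: a 1,000-byte hidden message and a 1920 x 1080 image. The helper name `payload_bits_per_pixel` is hypothetical.

```python
def payload_bits_per_pixel(message_bytes: int, width: int, height: int) -> float:
    """Average number of hidden-message bits carried by each pixel."""
    return (message_bytes * 8) / (width * height)


# Assumed example: 1,000-byte message hidden in a 1920 x 1080 image.
density = payload_bits_per_pixel(message_bytes=1_000, width=1920, height=1080)
print(f"{density:.5f} bits of hidden message per pixel")
```

With these numbers, each pixel carries roughly 0.004 bits of the message, a few millibits in the vocabulary proposed here.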
Finally, I first came up with this concept when researching numeration systems. I designed a system (like the decimal system) that produces highly correlated binary digits; even for the number 0, you would need to compute a lot of digits to get just a couple of correct decimals in base 10. It is a system with built-in redundancy. All number representation systems with a base smaller than 2 also have that feature, although it is less pronounced. In my new numeration system, even finding a set of digits that corresponds to an actual number is very hard, as the set of valid digit combinations is extremely sparse. In that sense, each digit in that system carries much less than one bit of information. See details here.
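The remark about bases smaller than 2 can be checked numerically. The sketch below is not the author's new numeration system, which is not described here. It is the standard greedy expansion of a number in a non-integer base, using an assumed base of 1.5 and an arbitrary test value of 0.3. After n digits the number is known only to within beta^-n, so each binary digit resolves at most log2(beta) bits, which is less than one bit when beta is below 2.

```python
import math
from typing import List


def beta_digits(x: float, beta: float, n: int) -> List[int]:
    """Greedy expansion of x in [0, 1) in base beta, with 1 < beta <= 2.
    Every digit is 0 or 1, and x is approximately sum(d_k * beta**-k)."""
    digits = []
    remainder = x
    for _ in range(n):
        remainder *= beta
        digit = int(remainder)  # 0 or 1, since remainder < beta <= 2
        digits.append(digit)
        remainder -= digit
    return digits


def beta_value(digits: List[int], beta: float) -> float:
    """Rebuild the number represented by the given digits in base beta."""
    return sum(d * beta ** -(k + 1) for k, d in enumerate(digits))


beta = 1.5  # assumed base, chosen only for illustration
digits = beta_digits(0.3, beta, 40)
approx = beta_value(digits, beta)

bits_per_digit = math.log2(beta)                    # upper bound per digit
digits_per_decimal = math.log(10) / math.log(beta)  # digits per base-10 decimal

print(digits)
print(f"reconstruction error: {abs(approx - 0.3):.2e}")
print(f"bits per digit (at most): {bits_per_digit:.3f}")
print(f"digits per base-10 decimal: {digits_per_decimal:.2f}")
```

In base 1.5, each digit is worth at most about 0.585 bits, and one decimal of precision costs between five and six digits instead of the 3.32 needed in base 2.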
To summarize, just as units of distance cover a big spectrum, from light years to nanometers, the same is true of units of information. Human beings are impressed by terabytes and petabytes, but at the other end of the spectrum, we have nanobits. Micro-information could become an interesting area of research for data scientists, with applications as described in this article. I did a Google search for the words microbits and nanobits, but found no interesting results. Both keywords are trademarked, but used in a different context.
For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn.