While everyone nowadays talks about machine learning, data science and AI, few mention statistical science. Its professional body, the American Statistical Association, founded in 1839, is the second oldest continuously operating professional society in the US, according to Wikipedia. The growth of this community was still strong a few decades ago.
A Bit of History
Of course, machine learning relies heavily on statistics. The disconnect started possibly 20 years ago. I was myself a statistician back in 1990, working on exciting projects such as processing satellite images. Over time, it became clear that the work of people like myself was becoming less and less relevant to the statistical community. That work became more and more computerized and automated, and less and less theoretical. Indeed, back then, the term used was computational statistics. But it was never absorbed into statistics. Instead, it became part of data mining, and then machine learning.
In the meantime, statistics became more and more associated with narrow fields, in particular biostatistics and the pharmaceutical industry. The drug industry was the major source of revenue for the American Statistical Association. In turn, the ASA started to heavily promote this field. Plenty of statistical methods were developed for tiny data sets (clinical trials). People working on big data became known as data scientists.
Besides epidemiology and clinical trials, statisticians also work on survey data and risk management, mainly with census data and for government agencies (Homeland Security). To this day, the Census Bureau gets over $2 billion in funding per year to survey every resident in the US every 10 years. Why they don’t use far less expensive sampling techniques, I don’t know. But it gives plenty of job opportunities to statisticians.
Statisticians still do a lot of experimental design and exploratory analysis. These tasks are increasingly being automated by data scientists. Some statisticians work on mathematical statistics, mostly in academia. There was a chance for the community to capitalize on supply chain optimization. But they missed the boat, and now this field is known as operations research and is well separated from statistics. Yet, it is pure statistics.
Winds of Change
I became an anomaly in the statistical community. I am now a machine learning scientist, though my line of work is still the same. For a while, I was the voice of data science. In 2012, I founded Data Science Central, acquired by TechTarget in 2020. I still develop new statistical theories, such as dual confidence regions, minimum contrast estimation, or perturbed lattice point processes. Recently, I worked on the optimal shape of confidence regions, the size of connected components in nearest neighbor graphs, and visualizations (data animations). I describe this applied research in my new book, available here. For visualizations, see my recent post on Data Science Central, here.
Back in 1992, another statistician, Peter Rousseeuw, was also something of an outlier. He actually worked on outlier and anomaly detection, as well as clustering. He was a member of my Ph.D. thesis committee. Somehow, he became wealthy working for arbitrage firms on Wall Street. After my thesis, I worked on MCMC and Bayesian hierarchical models at the stats lab at Cambridge University. This is now considered part of machine learning and related to neural networks. My work on image filtering (still ongoing) is also closely related to neural networks. The best-known paper that I wrote on the subject has hundreds of citations. I did not publish it in a statistical journal. Instead, you can find it in IEEE Transactions on Pattern Analysis and Machine Intelligence. While published in 1994, it became popular only very recently.
The Peter Rousseeuw Award
In 2021, Peter Rousseeuw decided to offer a biennial $1 million award for outstanding contributions with significant impact and wide application in statistical practice, with relevance to society. Peter still calls himself a statistician. There is no doubt that his intent is to revive statistical science, and I wish him success. As a Belgian citizen working on projects related to Peter’s research, I am not eligible for the award. It is interesting to note that the five areas considered for the prize are:
- General statistical methodology,
- Computational statistics and data science,
- Biostatistics and environmetrics,
- Statistics in the physical sciences and industry,
- Statistics in economics and humanities.
The deadline for the 2022 award is March 31. There is still time to apply. If you are interested or know someone likely to be considered a viable candidate, here is the link: The Rousseeuw Prize.
The dollar amount is the highest that I am aware of among the major scientific prizes. It matches the amount offered by the Clay Institute for solving each of the most famous mathematical conjectures.
About the Author
Vincent Granville is a machine learning scientist, author and publisher. He was the co-founder of Data Science Central (acquired by TechTarget) and, most recently, the founder of Machine Learning Recipes.