What a weird question! That is probably what you thought after reading the headline. Perhaps you even assumed the word “NOT” was a typo.
Over the past few years, many of us have come across articles like these:
- “Top 10 Machine Learning Algorithms every Data Scientist SHOULD KNOW”
- “Top 20 R packages every Data Scientist SHOULD KNOW”
- “Top 30 Python Libraries every Data Scientist SHOULD KNOW”
The list is endless. Many new Data Science aspirants wave the white flag at the mere sight of these “SHOULD KNOW” articles.
Faced with such an overwhelming amount of information, they do not know where to start.
That is the problem from an aspiring Data Scientist’s point of view.
Build Me An ML Model
The “SHOULD KNOW” articles create a bigger problem, and the ones who bear it are companies, both startups and large MNCs.
What problem, you ask?
Everybody wants a piece of the latest in-thing, “Data Science”.
Many companies want to do Data Science, and because the field is new to many of them, the job descriptions are often strange and the interview processes even stranger.
Some of these companies, influenced by the “SHOULD KNOW” articles, tell the job applicant:
“Here is our problem. Which Machine Learning algorithms can be applied?”
The newly minted Data Scientist quickly blurts out two or three ML algorithms, and the enamored company hires him or her. In due course the algorithms are implemented, and the Data Scientist impresses the company with high model accuracy. The models are put into production. But lo and behold, they do not earn the company the ROI it hoped for. What happened?
What happened was that the Data Scientist lacked business acumen and thought his or her KPI was simply building ‘good’ ML models. The company had business acumen but lacked Machine Learning and Statistics knowledge. The ideal marriage never happened.
The Ship Repair Man Story
We have all heard this story, or some variant of it.
A shipping company hired an Engineer to fix the engine of its ship. The Engineer had every tool in his toolkit. After some analysis, he took out a hammer and struck one of the engine’s components. The engine started working. The next day the Engineer sent the shipping company an invoice for a whopping $10,000 for barely five minutes of work.
The shipping company’s manager was taken aback and asked the Engineer to itemize the invoice. The bill read as follows:
Hitting with a hammer: $2
Knowing where to hit: $9,998
You may now suspect that I am emphasizing Domain Knowledge and Experience. You guessed right.
The Ship Repair Man – Data Scientist Analogy
The Engineer in the story had every tool in his toolkit, yet chose only the hammer (perhaps the simplest tool) to fix the engine. Most importantly, he knew where the problem was. Similarly, shouldn’t a Data Scientist first try to solve problems with basic Analytics, rather than implementing Machine Learning algorithms straightaway?
Minimizing Loss Function
“All models are wrong, but some are useful.” (George Box)
In most Machine Learning algorithms, we try to minimize a loss function.
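For readers who want the formal picture, training a model typically means choosing parameters that minimize the average loss over the training data. In the sketch below, the symbols θ (model parameters), f_θ (the model), L (the loss) and n (number of training examples) are my own notation, and squared error is shown only as one common choice of loss, not the only one.

```latex
\hat{\theta} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f_{\theta}(x_i)\big),
\qquad \text{e.g. } L(y, \hat{y}) = (y - \hat{y})^2
```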
Models are an abstraction of reality. The key word here is abstraction: a model is not reality itself.
If you think about it, the process of building Machine Learning models itself has a larger ‘Loss Function’: every model we build departs from reality.
So shouldn’t we build fewer models to minimize this larger ‘Loss Function’?
Hey Data Scientist, Think like a CEO
We Data Scientists often get pigeonholed into purely technical thinking. We think only about which ML algorithm can be applied to problem x, y or z, how to do feature selection, how to reduce the number of features, and how to improve model accuracy.
What we don’t think about is how the ML algorithms will benefit the company. How much money will my ML algorithm save or earn the company? Will the ROI be positive?
The most important question we forget to ask is: “Is a Machine Learning algorithm really required for this business problem?”
I know that last question will have set the cat among the pigeons. Many of you may be alarmed and might ask, “Are you trying to put us out of our jobs?”
No, quite the contrary.
Many business problems do require Machine Learning approaches, but not all of them. Most business problems can be solved through simple analytics or a baseline approach.
What will put us out of our jobs is Machine Learning overkill. I have seen Machine Learning algorithms applied to very frivolous problems, and worse still, companies have invested heavily in the idea. It is a ticking time bomb. The moment these companies realize that the ROI is negative, they will shun Data Science altogether. We all know how difficult it is to win back a disgruntled customer. No Data Science, no Data Scientist.
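To make “baseline approach” concrete, here is a minimal Python sketch, assuming a simple forecasting problem: the naive baseline just predicts the historical average, and we compare its mean absolute error with that of a model. The sales figures, the model’s predictions and the helper names are all hypothetical, made up purely for illustration.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def baseline_predictions(train_values, n):
    """Naive baseline: predict the historical average for every future point."""
    avg = mean(train_values)
    return [avg] * n


def relative_improvement(baseline_error, model_error):
    """Fraction by which the model reduces the baseline's error."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical weekly sales figures, for illustration only.
history = [100, 104, 98, 102, 97, 99]
actual_next = [103, 97, 100, 102]
# Hypothetical output of some ML model for the same four weeks.
model_next = [102, 98, 101, 101]

base_err = mean_absolute_error(
    actual_next, baseline_predictions(history, len(actual_next))
)
model_err = mean_absolute_error(actual_next, model_next)

print(
    f"baseline MAE={base_err:.2f}, model MAE={model_err:.2f}, "
    f"improvement={relative_improvement(base_err, model_err):.0%}"
)
# prints: baseline MAE=2.00, model MAE=1.00, improvement=50%
```

If a model barely beats the naive baseline, the gain may not justify the cost of building and maintaining it. Whether a given improvement is worth paying for is a business question, not a statistical one.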
Cometh The Hour, Cometh The Data Science Auditor
The industry is both excited and wary about the prospects of Data Science. Many who have implemented Data Science solutions have been left disenchanted by poor ROI.
Enter the Data Science Auditor
I foresee a new job role being created, “THE DATA SCIENCE AUDITOR”, in which companies hire experienced Data Scientists (statisticians or applied mathematicians) to audit their Data Science projects.
In one of my recent consulting projects, I felt exactly like an auditor. I was asked to improve the ML model built by a Data Scientist, but upon analysis I found that not only was the ML algorithm applied wrong, but no ML algorithm would work for the given business problem!
The client had simply been taken for a ride.
The repercussion: the client came away with a poor opinion of Data Scientists and felt cheated, both emotionally and financially.
Perhaps next time, don’t ask a Data Scientist, “How many ML algorithms have you built?”
Instead, ask:
“How Many ML Algorithms Have You NOT Built?”