The title of this blog is accurate.
There are indeed some caveats / disadvantages to automated machine learning.
This blog was motivated by a question in my University of Oxford class effectively asking:
Will automated machine learning take over all the data scientists’ tasks?
If you listen to the prevailing view in the industry, automated machine learning will be like the 1950s labour-saving devices: it will give us lots of leisure time, at no ‘cost’ to us.
Here are my views.
We need to break this question down a bit:
- Firstly, there are many initiatives (not just automated machine learning) that are making the data scientist’s job a lot easier. For example, the cloud providers are doing a great job in that regard, and this will continue.
- Secondly, don’t be sidetracked into thinking that automated machine learning is a premium feature of a tool and hard to implement. Many free tools already have automated machine learning built in, for example Hyperopt-Sklearn, Auto-Sklearn, and TPOT AutoML libraries for Sciki…
So, what is automated machine learning?
When we refer to the term ‘automated machine learning’, we are speaking of automating specific tasks in the data science pipeline.
These include:
- Data preparation and ingestion
- Feature engineering (Feature selection, Feature extraction)
- Detection and handling of skewed data and/or missing values
- Model selection
- Model validation
- Model tuning
- Hyperparameter optimization
- Selection of evaluation metrics and validation procedures
At the higher end, there are other elements that can be automated. These include:
- Model interpretability
- Visualization
- Data ingestion and tagging
- Target platform customization (for example, for GPUs or TPUs)
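To make the core tasks in the first list a little more concrete, here is a minimal sketch of what model selection, model validation and hyperparameter optimization boil down to: a loop that scores every candidate with cross-validation and keeps the best one. It uses only the Python standard library. Everything in it is an assumption made for illustration: the toy two-blob dataset, the two candidate ‘models’ (a majority-class baseline and k-nearest-neighbours), and the hypothetical function names `make_data`, `cross_val_accuracy` and `auto_select`. It is not how Hyperopt-Sklearn, Auto-Sklearn or TPOT work internally.

```python
import random
from statistics import mean


def make_data(n=120, seed=0):
    """Toy 2-D, two-class dataset: two Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data


def majority_fit(train):
    """Baseline 'model': always predict the most common training label."""
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner


def knn_fit(train, k):
    """k-nearest-neighbours classifier; k is the hyperparameter being tuned."""
    def predict(x):
        by_distance = sorted(
            train,
            key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
        )
        votes = [y for _, y in by_distance[:k]]
        return max(set(votes), key=votes.count)
    return predict


def cross_val_accuracy(fit, data, folds=5):
    """Model validation: mean accuracy over `folds` held-out splits."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        model = fit(train)
        scores.append(mean([1.0 if model(x) == y else 0.0 for x, y in test]))
    return mean(scores)


def auto_select(data):
    """Model selection plus hyperparameter search: score all candidates, keep the best."""
    candidates = {"majority": majority_fit}
    for k in (1, 3, 5, 7):
        # Bind k as a default argument so each candidate keeps its own value.
        candidates[f"knn(k={k})"] = lambda train, k=k: knn_fit(train, k)
    results = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = auto_select(make_data())
    for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {score:.3f}")
    print("selected:", best)
```

The loop itself is the easy part to automate. Deciding which candidates, which metric and which data belong in it is where the judgement still sits.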
So, what are the disadvantages of this approach?
I am old enough to remember CASE (computer-aided software engineering) tools.
In the CASE tools world, managers fantasised about ‘building the system at the push of a button’, sans developers!
Fast forward a few years: the managers are gone, while the developers are still there and even more valuable.
But new tools have been developed to make things easy for the developers.
So, coming back to automated machine learning, here are the caveats you should consider in order to take a balanced view:
- Automated machine learning has a cost: most products that highlight automated machine learning as their core feature are relatively expensive.
- Automated machine learning has a switching cost when implemented at a provider. The more you ‘automate’ your pipeline for a specific provider, the harder it is to switch.
- Value / differentiation: the AI / data science role at senior levels is all about intellectual property, differentiation and scale. These elements need customization. If features that can be easily automated are the core value proposition of your service, it could lack differentiation.
- The 80/20 rule: automated machine learning mostly automates the 80% that, in many cases, you could do just as well yourself. The remaining 20% will require a lot of work in any case, probably irrespective of whether you use automated machine learning or not.
- The 80/20 rule applied to industries: the same idea could apply to industries. Most data science work today is based on financial services, insurance and similar sectors. If your industry falls outside these, you may have fewer prebuilt components in any case.
Image source: thebestofhealth.co.uk
1950s labour-saving devices and the life of leisure: somehow, that did not quite happen.