How to Choose a Machine Learning Model – Some Guidelines

ajitjaokar
October 15, 2018 at 7:30 am

In this post, we explore some broad guidelines for selecting machine learning models.

The overall steps for Machine Learning/Deep Learning are:

Collect data
Check for anomalies and missing data, and clean the data
Perform statistical analysis and initial visualization
Build models
Check the accuracy
Present the results
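The steps above can be sketched end to end in a few lines. This is a minimal sketch, assuming scikit-learn and its bundled breast cancer dataset as a stand-in for real data collection. The missing values are injected artificially so that the cleaning step has something to do, and the visualization step is left out.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Collect data (a bundled toy dataset stands in for real collection)
X, y = load_breast_cancer(return_X_y=True)

# Simulate missing values (an assumption, purely for illustration)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# 2. Check for missing data
print("missing values:", int(np.isnan(X).sum()))

# 3. A simple statistical summary of the first few features
print("feature means:", np.nanmean(X[:, :3], axis=0).round(2))

# Hold out a test set before fitting anything
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 4. Build a model: impute missing values, scale, then fit a classifier
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# 5. Check the accuracy on held-out data, then present it
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping the cleaning and scaling steps in one pipeline means they are learned from the training split only. This keeps test-set information from leaking into the model.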

Machine learning tasks can be classified into:

Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning

PS: in this document, we do not focus on the last two.

Below are some approaches to choosing a model for Machine Learning/Deep Learning.

OVERALL APPROACHES

Dealing with unbalanced data: use resampling strategies, such as oversampling the minority class or undersampling the majority class.
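One of the simplest resampling strategies is random oversampling. It duplicates minority-class rows until all classes are the same size. The sketch below is a hand-rolled NumPy version, and the helper name `random_oversample` is hypothetical. In practice the imbalanced-learn library provides `RandomOverSampler` and `SMOTE` for this. Resample only the training split so that duplicated rows do not leak into the evaluation data.

```python
import numpy as np

def random_oversample(X, y, random_state=0):
    """Hypothetical helper: duplicate rows of smaller classes (sampling with
    replacement) until every class has as many rows as the largest one."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw extra indices only for classes below the target size
        extra = rng.choice(idx, size=target - count, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    X_res = np.concatenate(X_parts)
    y_res = np.concatenate(y_parts)
    # Shuffle so the classes are not grouped in blocks
    order = rng.permutation(len(y_res))
    return X_res[order], y_res[order]

# 95 negatives and 5 positives: a heavily unbalanced toy dataset
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # → [95 95]
```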

Creating new features: use principal component analysis (PCA) to reduce dimensionality, autoencoders to create a latent space, and possibly clustering to create new features.
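Two of these ideas, PCA and clustering, can be sketched with scikit-learn. Autoencoders need a deep learning framework and are left out here. The sketch makes some assumptions: the bundled wine dataset, a 90% explained-variance cutoff for PCA, and three k-means clusters. The distances from each sample to the cluster centroids serve as extra features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)             # 178 samples, 13 features
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

# PCA: keep enough components to explain at least 90% of the variance
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X_scaled)
print("components kept:", pca.n_components_)

# Clustering: distance to each k-means centroid becomes a new feature
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X_scaled)
centroid_dist = kmeans.transform(X_scaled)    # shape (n_samples, n_clusters)

X_new = np.hstack([X_pca, centroid_dist])
print("new feature matrix shape:", X_new.shape)
```

Centroid distances are used here rather than the raw cluster label. A label such as 0, 1 or 2 has no meaningful numeric order, so a model could misread it as a magnitude.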

To reduce overfitting to noise in linear regression, use regularization techniques such as lasso (L1 penalty) and ridge (L2 penalty). Note that regularization alone does not make the fit robust to outliers.
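A quick way to see the effect of regularization is to compare plain least squares, ridge and lasso on data with many irrelevant features. This sketch uses synthetic data: 30 features, of which only 3 matter. The `alpha` values are arbitrary illustrative choices, and in practice you would tune them, for example with `RidgeCV` or `LassoCV`.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data: 50 samples, 30 features, only 3 of them informative, plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))
true_coef = np.zeros(30)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=1.0, size=50)

models = {
    "ordinary least squares": LinearRegression(),
    "ridge (L2, alpha=1.0)": Ridge(alpha=1.0),
    "lasso (L1, alpha=0.2)": Lasso(alpha=0.2),
}
scores = {}
for name, model in models.items():
    # Cross-validated R^2 estimates how well each model generalises
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {scores[name]:.3f}")

# Lasso drives many irrelevant coefficients exactly to zero
lasso = Lasso(alpha=0.2).fit(X, y)
print("non-zero lasso coefficients:", np.count_nonzero(lasso.coef_))
```

Ridge shrinks all coefficients towards zero. Lasso can set some of them exactly to zero, so it also acts as a form of feature selection.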

Overcoming the black-box AI problem: consider strategies for building interpretable models.
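One common strategy is to pick a model whose logic can be read directly. The sketch below is a single illustration, not the only approach. It assumes scikit-learn and the bundled iris dataset, and it fits a depth-limited decision tree whose learned rules print as plain if/else text.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# Limiting the depth keeps the tree small enough for a person to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
print("training accuracy:", round(tree.score(iris.data, iris.target), 3))
```

The trade-off is that a shallow tree may be less accurate than a large ensemble. Interpretability is often bought with some predictive performance.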

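Algorithms not sensitive to outliers: there is some discussion on choosing random forest to cope with outliers. Tree splits depend on the ordering of feature values rather than on their magnitude, so extreme feature values have limited influence on the model.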
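The effect can be shown with a small experiment on synthetic data, which is an assumption for illustration. One feature value is inflated by a factor of 1000 while its position in the sort order stays the same. A random forest then makes the same predictions on ordinary inputs, whereas a linear regression fit is pulled badly off course. This robustness applies to outliers in the input features only. Outliers in the target still shift the leaf averages of a forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=200)

# Same data, except the largest feature value is blown up by a factor of 1000
X_outlier = X.copy()
X_outlier[np.argmax(X_outlier[:, 0]), 0] *= 1000

# Evaluate every fitted model on the same grid of ordinary inputs
X_eval = np.linspace(0, 9, 50).reshape(-1, 1)

def eval_predictions(model, X_train):
    """Fit on X_train (the targets y are shared) and predict on the grid."""
    return model.fit(X_train, y).predict(X_eval)

rf_clean = eval_predictions(RandomForestRegressor(n_estimators=50, random_state=0), X)
rf_outlier = eval_predictions(RandomForestRegressor(n_estimators=50, random_state=0), X_outlier)
lr_clean = eval_predictions(LinearRegression(), X)
lr_outlier = eval_predictions(LinearRegression(), X_outlier)

rf_change = np.abs(rf_clean - rf_outlier).max()
lr_change = np.abs(lr_clean - lr_outlier).max()
print(f"max change in random forest predictions: {rf_change:.4f}")
print(f"max change in linear regression predictions: {lr_change:.4f}")
```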

MACHINE LEARNING MODELS

Predicting continuous values: linear regression is generally a good first approach for predicting continuous values such as prices.
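As a baseline for price prediction, a linear regression takes only a few lines with scikit-learn. The housing data below is entirely synthetic and hypothetical. Price is generated as a linear function of floor area and number of rooms plus noise, so the fitted coefficients can be checked against the known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical housing data: price = 1500 * area + 10000 * rooms + noise
rng = np.random.default_rng(42)
area = rng.uniform(50, 200, size=300)   # square metres
rooms = rng.integers(1, 6, size=300)    # 1 to 5 rooms
price = 1500 * area + 10000 * rooms + rng.normal(0, 20000, size=300)

X = np.column_stack([area, rooms])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("coefficients:", model.coef_.round(1))
print("intercept:", round(model.intercept_, 1))
print("test MAE:", round(mean_absolute_error(y_test, model.predict(X_test)), 1))
```

If a simple baseline like this already performs well, more complex models need to beat it clearly to justify their extra cost.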

Binary classification: logistic regression is a good starting point for binary classification. Support vector machines (SVMs) are also a good choice for two-class classification.
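Both candidates can be compared with cross-validation. This sketch assumes scikit-learn, its bundled two-class breast cancer dataset, and default-ish hyperparameters (`C=1.0`, RBF kernel) chosen for illustration only. Features are standardized inside a pipeline because both models are sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # two classes: malignant / benign

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0),
}
scores = {}
for name, clf in candidates.items():
    # Scaling happens inside each cross-validation fold
    pipeline = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```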

Multi-class classification: random forest is one choice for multi-class classification. See SVM vs Random Forest usage.
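A random forest handles more than two classes natively, with no one-vs-rest wrapper needed. This is a minimal sketch assuming scikit-learn and its bundled three-class wine dataset. The number of trees (200) is an arbitrary choice.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)   # three wine cultivars
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Per-class precision and recall, then the class probabilities for one sample
print(classification_report(y_test, forest.predict(X_test)))
print("class probabilities:", forest.predict_proba(X_test[:1]).round(2))
```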

Is there a simplest or easiest model category to start with? Decision trees are often seen as simple to understand and use. They are also the building blocks of ensemble models such as random forest and gradient boosting.
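To see how the two ensembles build on a single tree, the sketch below compares all three with 5-fold cross-validation. It assumes scikit-learn and its bundled breast cancer dataset. The exact scores depend on the data, so treat the comparison as illustrative rather than as a general ranking.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    # Many deep trees on bootstrap samples; their predictions are averaged
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Shallow trees added one at a time, each fitted to the current errors
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```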

Which models are used on Kaggle? For supervised learning, random forest and XGBoost are popular choices. See the note on gradient boosted trees.
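The core idea behind gradient boosted trees fits in a short from-scratch sketch. For squared error, each new tree is fitted to the current residuals, which are the negative gradient of the loss, and is then added with a small learning rate. This sketch is an assumption-laden simplification: it uses one input feature, depth-1 trees ("stumps"), and hypothetical helper names. Libraries such as XGBoost add many refinements on top, such as regularization and efficient split finding.

```python
import numpy as np

def fit_stump(x, residual):
    """Hypothetical helper: find the threshold on a 1-D feature that best
    splits the residuals (least squares); returns (threshold, left, right)."""
    order = np.argsort(x)
    xs, rs = x[order], residual[order]
    best = (np.inf, None)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between equal values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean()))
    return best[1]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error boosting: start from the mean, then repeatedly fit a
    stump to the residuals and add a damped copy of its predictions."""
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, pred

def boosted_predict(base, stumps, x_new, learning_rate=0.1):
    """Replay the fitted stumps on new inputs."""
    pred = np.full(len(x_new), base)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, x_new)
    return pred

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

base, stumps, fitted = gradient_boost(x, y)
mse_baseline = np.mean((y - y.mean()) ** 2)
mse_boosted = np.mean((y - fitted) ** 2)
print(f"MSE of constant baseline: {mse_baseline:.3f}")
print(f"MSE after {len(stumps)} boosting rounds: {mse_boosted:.3f}")
```

The small learning rate means each stump corrects only part of the remaining error. This tends to generalise better than taking large steps, at the cost of needing more rounds.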

DEEP LEARNING MODELS

Complex features that cannot easily be specified by hand, with a large number of labelled examples available: multi-layer perceptrons.
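A small multi-layer perceptron can be trained with scikit-learn's `MLPClassifier`. This sketch assumes the bundled digits dataset: 1,797 labelled 8x8 images, which is a small stand-in for the large labelled datasets this setting usually involves. The layer sizes `(64, 32)` are arbitrary illustrative choices. The hidden layers learn intermediate representations of the pixels instead of relying on hand-crafted features.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)   # 8x8 images flattened to 64 pixel features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Two hidden layers of 64 and 32 units; weights are learned by backpropagation
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)
print(f"test accuracy: {mlp.score(X_test, y_test):.3f}")
```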

Vision-based machine learning (image classification, object detection, image segmentation): convolutional neural networks (CNNs).
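The operation at the heart of a CNN is a small filter slid across the image, computing a weighted sum over each local patch. The NumPy sketch below shows just that step for one channel, followed by a ReLU. It is not a full network. The edge-detecting kernel is hand-set for illustration, whereas a trained CNN learns many such filters and stacks them with pooling and further layers, typically in a framework such as PyTorch or Keras.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (strictly a cross-correlation,
    since the kernel is not flipped): single channel, stride 1, no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel is a weighted sum over one local patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

# 6x6 image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-set vertical-edge detector; a trained CNN would learn these weights
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d(image, kernel))
print(feature_map)  # every row is [0. 3. 3. 0.]
```

The filter responds only where the edge is, and the same weights are reused at every position. That weight sharing is why CNNs need far fewer parameters than a fully connected layer over the same image.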

Sequence modelling tasks: recurrent neural networks (RNNs), typically LSTMs, for tasks such as text classification or language translation.
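An LSTM processes a sequence one step at a time. At each step it updates a cell state through input, forget and output gates, and the final hidden state summarises the whole sequence. The NumPy sketch below runs the forward pass only, with random, untrained weights. The gate ordering inside the stacked weight matrices is an arbitrary choice made here. Training requires backpropagation through time, which in practice is handled by a framework layer such as PyTorch's `nn.LSTM` or Keras's `LSTM`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,).
    Stacked gate order (an assumption of this sketch): input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values to write
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

def lstm_encode(sequence, W, U, b, hidden_size):
    """Run the cell over a whole sequence; the final hidden state could feed
    a classifier, for example in text classification."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h

rng = np.random.default_rng(0)
D, H, T = 8, 16, 5   # input size, hidden size, sequence length
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

sequence = rng.normal(size=(T, D))   # stand-in for 5 word embeddings of size 8
h_final = lstm_encode(sequence, W, U, b, H)
print(h_final.shape)  # → (16,)
```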

Comments welcome

Image source: BMJ – what makes machine learning in healthcare so powerful
