Scikit-learn Classification Algorithms

This article was written by Matthew Mayo.

Scikit-learn is the de facto standard machine learning library in the Python ecosystem. As described on its official website, Scikit-learn offers:

Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable – BSD license

This tutorial is meant to serve as a demonstration of several machine learning classifiers. It is inspired by, references, and incorporates techniques from the following excellent works:

Randal Olson’s An Example Machine Learning Notebook
Analytics Vidhya’s Common Machine Learning Algorithms Cheat Sheet
Scikit-learn’s official Cross-validation Documentation
Scikit-learn’s official Iris Dataset Documentation
The various tutorials referenced in the KDnuggets Python Machine Learning article I recently wrote, which have likely influenced this one as well

We will use the well-known Iris and Digits datasets to build models with the following machine learning classification algorithms:

Logistic Regression
Decision Tree
Support Vector Machine
Naive Bayes
k-nearest Neighbors
Random Forests
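As a rough sketch of what building these models might look like (not the tutorial's own code), the snippet below fits one scikit-learn estimator per algorithm on the Iris data. The specific estimator classes (for example `GaussianNB` for Naive Bayes and `SVC` for the Support Vector Machine) and the hyperparameters are assumptions chosen for illustration. The score printed here is accuracy on the training data itself, so it overstates how well each model generalizes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load the Iris measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)

# One estimator per algorithm in the list above.
# The hyperparameters are illustrative assumptions, not tuned values.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "k-nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Forests": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Every scikit-learn classifier shares the same fit/score interface.
training_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X, y)
    training_accuracy[name] = clf.score(X, y)  # accuracy on the training data
    print(f"{name}: {training_accuracy[name]:.3f}")
```

Because all six estimators expose the same `fit` and `score` methods, swapping one algorithm for another only requires changing the entry in the dictionary.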

We also use different strategies for evaluating models:

Separate testing and training datasets
k-fold Cross-validation
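The two evaluation strategies can be sketched as follows, here on the Digits data with a k-nearest neighbors classifier. The 75/25 split, the choice of 10 folds, and the classifier settings are assumptions made for this example rather than values taken from the tutorial.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the handwritten digits: 8x8 images flattened to 64 features.
X, y = load_digits(return_X_y=True)

# Strategy 1: separate training and testing datasets.
# Hold out 25% of the samples, keeping class proportions similar (stratify).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
holdout_accuracy = clf.score(X_test, y_test)
print(f"Hold-out accuracy: {holdout_accuracy:.3f}")

# Strategy 2: k-fold cross-validation with k=10.
# For a classifier, an integer cv makes scikit-learn use stratified folds.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=10)
print(f"10-fold accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```

A single hold-out split is cheap but depends on which samples happen to land in the test set. Cross-validation trains the model k times and averages the results, which costs more compute but gives a steadier estimate.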

Some simple data investigation methods and tools will be used as well, including:

Plotting data with Matplotlib
Building and investigating data via Pandas DataFrames
Constructing and operating on multi-dimensional arrays and matrices with NumPy
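A minimal sketch of how these three tools might be combined on the Iris data is shown below. The column choices, the centering step, and the in-memory PNG buffer are assumptions made for illustration. The `Agg` backend is selected only so the example runs without a display; in a notebook you would normally leave the backend alone and call `plt.show()`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Pandas: build a DataFrame with one column per measurement plus the species name.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]
print(df.describe())                      # summary statistics per feature
class_counts = df["species"].value_counts()
print(class_counts)                       # number of samples per species

# NumPy: operate on the underlying 2-D array, here centering each feature.
features = df[iris.feature_names].to_numpy()
centered = features - features.mean(axis=0)

# Matplotlib: scatter plot of two features, colored by class label.
fig, ax = plt.subplots()
ax.scatter(df["petal length (cm)"], df["petal width (cm)"], c=iris.target)
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
ax.set_title("Iris petal measurements by species")

# Render to an in-memory PNG instead of writing a file to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```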

This tutorial is brief and to the point. Please alert me if you find inaccuracies. If you find it useful, please feel free to share it far and wide.

To read the tutorial, with the demonstration, click here.
