Lecture notes for the Statistical Machine Learning course taught at the Department of Information Technology, Uppsala University, Sweden. Updated in March 2019. Authors: Andreas Lindholm, Niklas Wahlström, Fredrik Lindsten, and Thomas B. Schön.
Source: page 61 in these lecture notes
Available as a PDF here (original) or here (mirror).
Contents
1 Introduction
1.1 What is machine learning all about?
1.2 Regression and classification
1.3 Overview of these lecture notes
1.4 Further reading
2 The regression problem and linear regression
2.1 The regression problem
2.2 The linear regression model
- Describe relationships — classical statistics
- Predicting future outputs — machine learning
2.3 Learning the model from training data
- Maximum likelihood
- Least squares and the normal equations
2.4 Nonlinear transformations of the inputs – creating more features
2.5 Qualitative input variables
2.6 Regularization
- Ridge regression
- LASSO
- General cost function regularization
2.7 Further reading
2.A Derivation of the normal equations
- A calculus approach
- A linear algebra approach
3 The classification problem and three parametric classifiers
3.1 The classification problem
3.2 Logistic regression
- Learning the logistic regression model from training data
- Decision boundaries for logistic regression
- Logistic regression for more than two classes
3.3 Linear and quadratic discriminant analysis (LDA & QDA)
- Using Gaussian approximations in Bayes’ theorem
- Using LDA and QDA in practice
3.4 Bayes’ classifier — a theoretical justification for turning p(y | x) into ŷ
- Bayes’ classifier
- Optimality of Bayes’ classifier
- Bayes’ classifier in practice: useless, but a source of inspiration
- Is it always good to predict according to Bayes’ classifier?
3.5 More on classification and classifiers
- Regularization
- Evaluating binary classifiers
4 Non-parametric methods for regression and classification: k-NN and trees
4.1 k-NN
- Decision boundaries for k-NN
- Choosing k
- Normalization
4.2 Trees
- Basics
- Training a classification tree
- Other splitting criteria
- Regression trees
5 How well does a method perform?
5.1 Expected new data error E_new: performance in production
5.2 Estimating E_new
- E_train ≉ E_new: We cannot estimate E_new from training data
- E_test ≈ E_new: We can estimate E_new from test data
- Cross-validation: E_val ≈ E_new without setting aside test data
5.3 Understanding E_new
- E_new = E_train + generalization error
- E_new = bias² + variance + irreducible error
6 Ensemble methods
6.1 Bagging
- Variance reduction by averaging
- The bootstrap
6.2 Random forests
6.3 Boosting
- The conceptual idea
- Binary classification, margins, and exponential loss
- AdaBoost
- Boosting vs. bagging: base models and ensemble size
- Robust loss functions and gradient boosting
6.A Classification loss functions
7 Neural networks and deep learning
7.1 Neural networks for regression
- Generalized linear regression
- Two-layer neural network
- Matrix notation
- Deep neural network
- Learning the network from data
7.2 Neural networks for classification
- Learning classification networks from data
7.3 Convolutional neural networks
- Data representation of an image
- The convolutional layer
- Condensing information with strides
- Multiple channels
- Full CNN architecture
7.4 Training a neural network
- Initialization
- Stochastic gradient descent
- Learning rate
- Dropout
7.5 Perspective and further reading
A Probability theory
A.1 Random variables
- Marginalization
- Conditioning
A.2 Approximating an integral with a sum
B Unconstrained numerical optimization
B.1 A general iterative solution
B.2 Commonly used search directions
- Steepest descent direction
- Newton direction
- Quasi-Newton
B.3 Further reading
Bibliography