Have you ever wondered why linear regression plays such an important role in statistics and machine learning? It is one of the most commonly used and best-understood algorithms.
Regression is a statistical method for calculating relationships among variables. Linear regression is one of the most popular and simplest regression techniques, and it is a very good way to understand your data. Note that regression techniques are not 100% accurate, even if you use higher-order (nonlinear) polynomials. The key with regression, as with most machine learning techniques, is to find a good-enough technique and model rather than a perfect one.
This article is an excerpt from the book Mastering Go – Second Edition by Mihalis Tsoukalos. Mihalis runs through the nuances of Go with deep guides to types and structures, packages, concurrency, network programming, and compiler design. In this article, we will look at building machine learning systems in Go, from simple statistical regression to complex neural networks.
Model your data
The idea behind linear regression is simple: you are trying to model your data using a first-degree equation. A first-degree equation can be represented as y = a x + b, where a is the slope of the line and b is its intercept.
Many methods can find the first-degree equation that models your data; whichever one you use, the result is a value for a and a value for b.
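One widely used way to calculate a and b is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances to the data points. The following is a minimal sketch of that method using only the standard library. The fitLine helper is a hypothetical name introduced for illustration and is not part of gonum.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine is a hypothetical helper (not part of gonum). It returns the
// slope a and intercept b of the least-squares line y = a x + b.
func fitLine(x, y []float64) (a, b float64, err error) {
	n := len(x)
	if n == 0 || n != len(y) {
		return 0, 0, errors.New("x and y must be non-empty and of equal length")
	}
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/float64(n), sumY/float64(n)

	// sxy and sxx are the sums of products of deviations from the means.
	var sxy, sxx float64
	for i := range x {
		dx := x[i] - meanX
		sxy += dx * (y[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("all x values are identical; the slope is undefined")
	}
	a = sxy / sxx
	b = meanY - a*meanX
	return a, b, nil
}

func main() {
	// Points chosen to lie exactly on y = 2x + 1.
	x := []float64{0, 1, 2, 3}
	y := []float64{1, 3, 5, 7}
	a, b, err := fitLine(x, y)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("a = %.4v b = %.4v\n", a, b)
}
```

For points that lie exactly on a line, as in main() above, the sketch recovers that line's slope and intercept.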
Linear regression
The Go code of this section will be saved in regression.go, which is going to be presented in three parts. The output of the program will be two floating-point numbers that define a and b in the first-degree equation.
The first part of regression.go contains the following code:
package main
import (
	"encoding/csv"
	"flag"
	"fmt"
	"gonum.org/v1/gonum/stat"
	"os"
	"strconv"
)
// xy holds the data points: x[i] and y[i] belong to the same observation.
type xy struct {
	x []float64
	y []float64
}
The xy structure is used to hold the data and should change according to your data format and values.
The second part of regression.go is as follows:
func main() {
	flag.Parse()
	if len(flag.Args()) == 0 {
		fmt.Printf("usage: regression filename\n")
		return
	}
	// Open the CSV file named by the first command-line argument.
	filename := flag.Args()[0]
	file, err := os.Open(filename)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer file.Close()
	// Read every record into memory.
	r := csv.NewReader(file)
	records, err := r.ReadAll()
	if err != nil {
		fmt.Println(err)
		return
	}
	// Allocate one x and one y slot per record.
	size := len(records)
	data := xy{
		x: make([]float64, size),
		y: make([]float64, size),
	}
The last part of regression.go is as follows:
	for i, v := range records {
		if len(v) != 2 {
			fmt.Println("Expected two elements")
			continue
		}
		// The first field of each record is stored as y, the second as x.
		if s, err := strconv.ParseFloat(v[0], 64); err == nil {
			data.y[i] = s
		}
		if s, err := strconv.ParseFloat(v[1], 64); err == nil {
			data.x[i] = s
		}
	}
	// LinearRegression returns the intercept (b) first, then the slope (a).
	b, a := stat.LinearRegression(data.x, data.y, nil, false)
	fmt.Printf("%.4v x + %.4v\n", a, b)
	fmt.Printf("a = %.4v b = %.4v\n", a, b)
}
The data from the data file is read into the data variable. The function that implements the linear regression is stat.LinearRegression() and it returns two numbers, which are b and a, in that particular order.
At this point, it would be a good time to download the gonum package:
$ go get -u gonum.org/v1/gonum/stat
Executing regression.go with the input data stored in reg_data.txt will generate the following output:
$ go run regression.go reg_data.txt
0.9463 x + -0.3985
a = 0.9463 b = -0.3985
The two numbers returned are a and b from the y = a x + b formula.
The contents of reg_data.txt are as follows:
$ cat reg_data.txt
1,2
3,4.0
2.1,3
4,4.2
5,5.1
-5,-5.1
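To get a feel for what a and b mean, you can plug them back into y = a x + b and compare the predicted values with the data. The sketch below does that for the rows of reg_data.txt, treating the first field as y and the second as x, which is how regression.go reads them. The predict helper is a hypothetical name introduced here for illustration.

```go
package main

import "fmt"

// predict is a hypothetical helper that evaluates the fitted line
// y = a x + b at the given x.
func predict(a, b, x float64) float64 {
	return a*x + b
}

func main() {
	// Values printed by regression.go for reg_data.txt.
	a, b := 0.9463, -0.3985

	// Rows of reg_data.txt: the first field is y, the second is x.
	rows := [][2]float64{{1, 2}, {3, 4.0}, {2.1, 3}, {4, 4.2}, {5, 5.1}, {-5, -5.1}}

	for _, r := range rows {
		y, x := r[0], r[1]
		est := predict(a, b, x)
		// The residual is the gap between the observed and the predicted y.
		fmt.Printf("x = %5.1f  y = %5.1f  predicted = %7.4f  residual = %7.4f\n",
			x, y, est, y-est)
	}
}
```

Small residuals indicate that the line follows the data closely at that point.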
Plotting data
It is now time to plot the results and the dataset in order to test how accurate the results from the linear regression technique are. For that purpose, we are going to use the Go code of plotLR.go, which will be presented in four parts. plotLR.go requires three command-line arguments: the file that contains the data points, followed by a and b from the y = a x + b formula. The fact that plotLR.go does not calculate a and b on its own gives you the opportunity to experiment with a and b using your own values or values that were calculated by another utility.
The first part of plotLR.go is as follows:
package main
import (
	"encoding/csv"
	"flag"
	"fmt"
	"gonum.org/v1/plot"
	"gonum.org/v1/plot/plotter"
	"gonum.org/v1/plot/vg"
	"image/color"
	"os"
	"strconv"
)
type xy struct {
	x []float64
	y []float64
}
// Len returns the number of data points.
func (d xy) Len() int {
	return len(d.x)
}
// XY returns the coordinates of the i-th data point.
func (d xy) XY(i int) (x, y float64) {
	x = d.x[i]
	y = d.y[i]
	return
}
The Len() and XY() methods are needed for the plotting part, whereas the image/color package is needed for changing the colors in the output.
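The reason these two methods matter is that the plotting code only needs to ask a dataset how many points it has and what the coordinates of point i are. The sketch below illustrates that idea with a local interface named xyer, which is a stand-in written for this example and assumed to mirror the method set that the plotter package expects. The xRange function is likewise a hypothetical helper.

```go
package main

import "fmt"

// xyer is a local stand-in (an assumption for this sketch) for the
// two-method interface that the plotting code relies on.
type xyer interface {
	Len() int
	XY(i int) (x, y float64)
}

type xy struct {
	x []float64
	y []float64
}

func (d xy) Len() int { return len(d.x) }

func (d xy) XY(i int) (x, y float64) { return d.x[i], d.y[i] }

// xRange is a hypothetical helper that walks any xyer and returns the
// smallest and largest x value; ok is false for an empty dataset.
func xRange(d xyer) (lo, hi float64, ok bool) {
	if d.Len() == 0 {
		return 0, 0, false
	}
	lo, _ = d.XY(0)
	hi = lo
	for i := 1; i < d.Len(); i++ {
		x, _ := d.XY(i)
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi, true
}

func main() {
	data := xy{x: []float64{2, 4, 3}, y: []float64{1, 3, 2.1}}
	lo, hi, ok := xRange(data)
	fmt.Println(lo, hi, ok)
}
```

Because xRange depends only on the interface, any type that provides Len() and XY() can be passed to it, which is the same property that lets the xy type be handed to the plotting functions.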
The second part of plotLR.go contains the following code:
func main() {
	flag.Parse()
	if len(flag.Args()) < 3 {
		fmt.Printf("usage: plotLR filename a b\n")
		return
	}
	filename := flag.Args()[0]
	file, err := os.Open(filename)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer file.Close()
	r := csv.NewReader(file)
	// Parse a and b; on failure, report the offending argument.
	a, err := strconv.ParseFloat(flag.Args()[1], 64)
	if err != nil {
		fmt.Println(flag.Args()[1], "not a valid float!")
		return
	}
	b, err := strconv.ParseFloat(flag.Args()[2], 64)
	if err != nil {
		fmt.Println(flag.Args()[2], "not a valid float!")
		return
	}
	records, err := r.ReadAll()
	if err != nil {
		fmt.Println(err)
		return
	}
This part of the program works with the command-line arguments and the reading of the data.
The third part of plotLR.go is as follows:
	size := len(records)
	data := xy{
		x: make([]float64, size),
		y: make([]float64, size),
	}
	for i, v := range records {
		if len(v) != 2 {
			fmt.Println("Expected two elements per line!")
			return
		}
		// As in regression.go, the first field is y and the second is x.
		s, err := strconv.ParseFloat(v[0], 64)
		if err == nil {
			data.y[i] = s
		}
		s, err = strconv.ParseFloat(v[1], 64)
		if err == nil {
			data.x[i] = s
		}
	}
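Note that the loop above silently leaves a zero in place whenever a field cannot be parsed, which can distort both the plot and the regression. As a variation, and not the approach used in the book, the sketch below reports the first bad record instead. The readXY and readXYString helpers are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

type xy struct {
	x []float64
	y []float64
}

// readXY is a hypothetical helper. It parses two-field CSV records and,
// like the article's code, treats the first field as y and the second as x.
// Unlike the article's code, it stops at the first invalid record.
func readXY(r io.Reader) (xy, error) {
	records, err := csv.NewReader(r).ReadAll()
	if err != nil {
		return xy{}, err
	}
	data := xy{
		x: make([]float64, 0, len(records)),
		y: make([]float64, 0, len(records)),
	}
	for i, rec := range records {
		if len(rec) != 2 {
			return xy{}, fmt.Errorf("record %d: expected 2 fields, got %d", i+1, len(rec))
		}
		y, err := strconv.ParseFloat(strings.TrimSpace(rec[0]), 64)
		if err != nil {
			return xy{}, fmt.Errorf("record %d: %w", i+1, err)
		}
		x, err := strconv.ParseFloat(strings.TrimSpace(rec[1]), 64)
		if err != nil {
			return xy{}, fmt.Errorf("record %d: %w", i+1, err)
		}
		data.x = append(data.x, x)
		data.y = append(data.y, y)
	}
	return data, nil
}

// readXYString is a convenience wrapper for in-memory input.
func readXYString(s string) (xy, error) {
	return readXY(strings.NewReader(s))
}

func main() {
	data, err := readXYString("1,2\n3,4.0\n2.1,3\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(data.x, data.y)
}
```

Whether to skip, zero, or reject bad records depends on your data; rejecting them is simply the most conservative choice.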
The last part of plotLR.go is as follows:
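	// The fitted line y = a x + b, drawn in blue.
	line := plotter.NewFunction(func(x float64) float64 { return a*x + b })
	line.Color = color.RGBA{B: 255, A: 255}
	// plot.New is called as in the book; newer gonum/plot releases return
	// only *plot.Plot, in which case the err check below must be dropped.
	p, err := plot.New()
	if err != nil {
		fmt.Println(err)
		return
	}
	plotter.DefaultLineStyle.Width = vg.Points(1)
	plotter.DefaultGlyphStyle.Radius = vg.Points(2)
	// The data points, drawn as a scatter plot.
	scatter, err := plotter.NewScatter(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	scatter.GlyphStyle.Color = color.RGBA{R: 255, B: 128, A: 255}
	p.Add(scatter, line)
	// Render a 300x300 SVG image and write it to standard output.
	w, err := p.WriterTo(300, 300, "svg")
	if err != nil {
		fmt.Println(err)
		return
	}
	_, err = w.WriteTo(os.Stdout)
	if err != nil {
		fmt.Println(err)
		return
	}
}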
The line that is going to be plotted is defined using the plotter.NewFunction() function, which wraps the a*x + b calculation.
At this point, you should download some external packages by executing the following commands:
$ go get -u gonum.org/v1/plot
$ go get -u gonum.org/v1/plot/plotter
$ go get -u gonum.org/v1/plot/vg
Executing plotLR.go will generate the following kind of output:
$ go run plotLR.go reg_data.txt
usage: plotLR filename a b
$ go run plotLR.go reg_data.txt 0.9463 -0.3985
<?xml version="1.0"?>
<!-- Generated by SVGo and Plotinum VG -->
<svg width="300pt" height="300pt" viewBox="0 0 300 300"
	xmlns="http://www.w3.org/2000/svg"
	xmlns:xlink="http://www.w3.org/1999/xlink">
<g transform="scale(1, -1) translate(0, -300)">
.
.
.
Because plotLR.go writes the SVG data to standard output, you should save the generated output in a file before using it:
$ go run plotLR.go reg_data.txt 0.9463 -0.3985 > output.svg
As the output is in Scalable Vector Graphics (SVG) format, you should load it into a web browser in order to see the results. The results from our data can be seen in the following figure.
Figure 1: The output of the plotLR.go program
The resulting image also shows how accurately the data can be modeled using a linear equation: the closer the points lie to the line, the better the fit.
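If you would like a number to go with the visual impression, one common measure is the coefficient of determination (R squared), which is 1 for a perfect fit and smaller for a worse one. This measure is not computed in the book's code; the sketch below is a standard-library illustration, and rSquared is a hypothetical helper name.

```go
package main

import "fmt"

// rSquared is a hypothetical helper that returns the coefficient of
// determination of the line y = a x + b for the given points.
// ok is false when the value is undefined (no data, mismatched lengths,
// or all y values identical).
func rSquared(x, y []float64, a, b float64) (r2 float64, ok bool) {
	if len(x) == 0 || len(x) != len(y) {
		return 0, false
	}
	var sumY float64
	for _, v := range y {
		sumY += v
	}
	meanY := sumY / float64(len(y))

	// ssRes: squared error of the line; ssTot: squared spread of y.
	var ssRes, ssTot float64
	for i := range x {
		res := y[i] - (a*x[i] + b)
		ssRes += res * res
		dev := y[i] - meanY
		ssTot += dev * dev
	}
	if ssTot == 0 {
		return 0, false
	}
	return 1 - ssRes/ssTot, true
}

func main() {
	// reg_data.txt, with the first field as y and the second as x.
	y := []float64{1, 3, 2.1, 4, 5, -5}
	x := []float64{2, 4.0, 3, 4.2, 5.1, -5.1}

	r2, ok := rSquared(x, y, 0.9463, -0.3985)
	if !ok {
		fmt.Println("R squared is undefined for this data")
		return
	}
	fmt.Printf("R^2 = %.4f\n", r2)
}
```

A value close to 1 suggests that a first-degree equation describes the data well, which you can then confirm by looking at the plot.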
Dive deep into machine learning in Go, from foundational statistical techniques through simple regression and clustering to classification, neural networks, and anomaly detection. Become an expert Go developer with our latest book, Mastering Go – Second Edition, written by Mihalis Tsoukalos.
About the Author
Mihalis Tsoukalos is an accomplished author. His previous books, Go Systems Programming and Mastering Go, have become must-reads for Unix and Linux systems professionals. When not writing books, he spends his working life as a Unix administrator, programmer, DBA, and mathematician who enjoys writing technical articles and learning new technologies. His research interests include programming languages, visualization, and databases.