This tutorial was written by Manish Saraswat.
Introduction
“The road to machine learning starts with Regression. Are you ready?”
If you aspire to become a data scientist, regression is the first algorithm you need to learn and master. Not just to clear job interviews, but to solve real-world problems. To this day, many consultancy firms continue to use regression techniques at scale to help their clients. It is certainly one of the easiest algorithms to learn, but it takes persistent effort to reach the master level.
Running a regression model is a no-brainer. A simple model <- lm(y ~ x) does the job. But optimizing this model for higher accuracy is a real challenge. Let’s say your model gives adjusted R² = 0.678; how will you improve it?
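As a rough illustration of the point above, here is a minimal sketch in R. It assumes R's built-in mtcars dataset rather than the data provided with this article, and the variables mpg, wt, and hp are illustrative choices only. It fits a one-predictor model, reads off its adjusted R², and then adds a second predictor to see whether the adjusted R² moves.

```r
# Illustrative only: mtcars ships with base R and is NOT the article's dataset.

# Simple linear regression: fuel economy (mpg) explained by weight (wt)
simple_model <- lm(mpg ~ wt, data = mtcars)

# summary() on an lm object exposes adjusted R-squared as adj.r.squared
adj_r2_simple <- summary(simple_model)$adj.r.squared

# Multiple regression: add horsepower (hp) as a second predictor
multi_model <- lm(mpg ~ wt + hp, data = mtcars)
adj_r2_multi <- summary(multi_model)$adj.r.squared

# Compare the two fits side by side
print(c(simple = adj_r2_simple, multiple = adj_r2_multi))
```

Adjusted R² penalizes extra predictors, so comparing it across the two models is a fairer check than comparing plain R², which can never decrease when a variable is added.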
In this article, I’ll introduce you to the crucial concepts of regression analysis, with practice in R. Data is given for download below. Once you have finished reading this article, you’ll be able to build, improve, and optimize regression models on your own. Regression has several types; however, in this article I’ll focus on simple and multiple linear regression.
Note: This article is best suited for people who are new to machine learning and have the requisite knowledge of statistics. You should have R installed on your laptop.
Table of Contents
- What is Regression? How does it work?
- What are the assumptions made in Regression?
- How do I know if these assumptions are violated in my data?
- How can I improve the accuracy of a Regression Model?
- How can I assess the fit of a Regression Model?
- Practice Time – Solving a Regression Problem
To check out all this information, click here. For other articles about regression analysis, click here.