This article was written by Ariful Mondal. Ariful is a senior manager, data science and big data analytics consultant at Tata Consultancy Services.
1. Introduction
This is an attempt to showcase some worked examples of machine learning (ML) using the German Credit Data. Although a credit scoring problem is the case study in this article, the same process applies to a wide range of classification and regression problems, such as “Response modeling”, “Risk Management”, “Attrition/Churn management”, “Cross-Sell/Up-Sell”, “Usage Patterns”, “Net Present Value”, “Lifetime Value”, “Predictive Maintenance and condition-based monitoring”, “Warranty”, “Reliability”, “Failure Prediction”, “Image/Video Processing”, “Crime”, “Medical Experiments” and “Hidden pattern recognition”, in Banking, Insurance, Finance, Telecom, Manufacturing, “Law Firms and Criminal Investigation”, “Surveillance”, “Catalogue”, “Travel, Transport and Hospitality”, “Healthcare”, “Utilities”, “Publishing”, “Education” and any other industry you may come across.
The basic difference between traditional modeling and machine learning is that “in traditional modeling we set up a modeling framework and try to establish relationships, while in machine learning we allow the model to learn from the data by uncovering hidden patterns”. Hence the former requires the analyst to have a solid understanding of statistical techniques and of the business, while the latter is more complex and computationally intensive, so it needs more computing power and a tech-savvy analyst.
Note that while traditional techniques perform well on small to large amounts of data, machine learning typically learns better from high-dimensional and complex data, such as in a Big Data setup.
Ariful has used the following machine learning techniques in this article:
- Logistic Regression
- Recursive partitioning for classification (Basic and Bayesian)
- Random Forest
- Conditional Inference Tree
- Bayesian Networks
- Unbiased Non-parametric Methods: Model Based (Logistic)
- Support Vector Machine
- Neural Network
- Lasso Regression
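Each of these classifiers assigns a score to every applicant, and the article compares the models using the Receiver Operating Characteristic (ROC) curve. The article itself works in R; the block below is only a minimal plain-Python sketch, not the author's code, of how an ROC curve and the area under it (AUC) can be computed from 0/1 labels and model scores. The function names `roc_points` and `roc_auc` and the toy labels and scores are hypothetical illustrations, not values from the German Credit Data.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold, from (0, 0) to (1, 1).

    labels: iterable of 0/1 outcomes (1 = positive case); scores: higher = more likely positive.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Walk the cases from highest to lowest score, lowering the cut-off step by step.
    pairs = sorted(zip(scores, labels), reverse=True)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Cases with tied scores cross the cut-off together, so they form one ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positive and three negative cases.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(labels, scores))
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly, about 0.889
```

The pairwise count equals the trapezoidal area under the points returned by `roc_points`, which is why a model with a higher AUC is preferred when comparing classifiers this way; a value of 0.5 corresponds to random ranking.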
What you will find in this article:
- 1. Introduction
- 2. Data Analysis and Variable Creation
- 3. Model Selection and Development
- 4. Model Performance Comparison
- 4.1 Receiver Operating Characteristic (ROC) Curve
- 5. Concluding Remarks
- Appendix A: R Packages Used in This Paper
- References
- Appendix B: About R Markdown
Check out all of this information here. For more articles about classification in R, click here.