A Practical Implementation Guide to Predictive Data Analytics Using Python
- Covers basic to advanced topics in an easy step-oriented manner
- Concise on theory, strong focus on practical and hands-on approach
- Explores advanced topics such as hyperparameter tuning, deep natural language processing, neural networks, and deep learning
- Describes state-of-the-art best practices for tuning models to improve accuracy
About The Book:
This book is your practical guide to going from novice to master in machine learning with Python in six steps. The six-step path is based on the “six degrees of separation” theory, which states that everyone and everything is at most six steps away from anyone or anything else. Note that the theory deals with the quality of connections rather than their existence. Great effort has therefore gone into designing six simple yet effective steps that move gradually from fundamentals to advanced topics, helping a beginner progress from little or no knowledge of machine learning in Python to becoming a master practitioner. The book is also useful for current machine learning practitioners who want to learn advanced topics such as hyperparameter tuning, various ensemble techniques, natural language processing (NLP), deep learning, and the basics of reinforcement learning.
Each topic has two parts: the first covers the theoretical concepts, and the second covers practical implementation with different Python packages. The traditional math-first approach to machine learning, i.e., learning all the mathematics and then working out how to apply it to solve problems, takes a great deal of time and effort, which has proven inefficient for working professionals looking to switch careers. Hence this book focuses on simplification: the theory and math behind the algorithms are covered only to the extent required to get you started.
I recommend that you work with the book rather than just read it. Real learning happens only through active participation. Hence, all the code presented in the book is available as IPython notebooks, so that you can try the examples yourself and extend them later to suit your own needs or interests.
What You’ll Learn:
- Examine the fundamentals of the Python programming language
- Review machine learning history and evolution
- Learn various machine learning system development frameworks
- Learn fundamentals to advanced text mining techniques
- Learn and implement deep learning frameworks
Who This Book Is For:
This book will serve as a great resource for learning machine learning concepts and implementation techniques for:
- Python developers or data engineers looking to expand their knowledge or career into the machine learning area.
- Current non-Python (R, SAS, SPSS, Matlab, or any other language) machine learning practitioners looking to extend their implementation skills to Python.
- Novice machine learning practitioners looking to learn advanced topics such as hyperparameter tuning, various ensemble techniques, Natural Language Processing (NLP), deep learning, and basics of reinforcement learning.
Contents at a Glance
- Introduction
- Chapter 1: Step 1 – Getting Started in Python
- Chapter 2: Step 2 – Introduction to Machine Learning
- Chapter 3: Step 3 – Fundamentals of Machine Learning
- Chapter 4: Step 4 – Model Diagnosis and Tuning
- Chapter 5: Step 5 – Text Mining and Recommender Systems
- Chapter 6: Step 6 – Deep and Reinforcement Learning
- Chapter 7: Conclusion
Table of Contents
INTRODUCTION
CHAPTER 1: STEP 1 – GETTING STARTED IN PYTHON
- The Best Things in Life Are Free
- The Rising Star
- Python 2.7.x or Python 3.4.x?
- Windows Installation
- OSX Installation
- Linux Installation
- Python from Official Website
- Running Python
- Key Concepts
- Python Identifiers
- Keywords
- My First Python Program
- Code Blocks (Indentation & Suites)
- Basic Object Types
- When to Use List vs. Tuples vs. Set vs. Dictionary
- Comments in Python
- Multiline Statement
- Basic Operators
- Control Structure
- Lists
- Tuple
- Sets
- Dictionary
- User-Defined Functions
- Module
- File Input/Output
- Exception Handling
- Endnotes
CHAPTER 2: STEP 2 – INTRODUCTION TO MACHINE LEARNING
- History and Evolution
- Artificial Intelligence Evolution
- Different Forms
- Statistics
- Data Mining
- Data Analytics
- Data Science
- Statistics vs. Data Mining vs. Data Analytics vs. Data Science
- Machine Learning Categories
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Frameworks for Building Machine Learning Systems
- Knowledge Discovery Databases (KDD)
- Cross-Industry Standard Process for Data Mining
- SEMMA (Sample, Explore, Modify, Model, Assess)
- KDD vs. CRISP-DM vs. SEMMA
- Machine Learning Python Packages
- Data Analysis Packages
- NumPy
- Pandas
- Matplotlib
- Machine Learning Core Libraries
- Data Analysis Packages
- Endnotes
CHAPTER 3: STEP 3 – FUNDAMENTALS OF MACHINE LEARNING
- Machine Learning Perspective of Data
- Scales of Measurement
- Nominal Scale of Measurement
- Ordinal Scale of Measurement
- Interval Scale of Measurement
- Ratio Scale of Measurement
- Feature Engineering
- Dealing with Missing Data
- Handling Categorical Data
- Normalizing Data
- Feature Construction or Generation
- Exploratory Data Analysis (EDA)
- Univariate Analysis
- Multivariate Analysis
- Supervised Learning – Regression
- Correlation and Causation
- Fitting a Slope
- How Good Is Your Model?
- Polynomial Regression
- Multivariate Regression
- Multicollinearity and Variance Inflation Factor (VIF)
- Interpreting the OLS Regression Results
- Regression Diagnosis
- Regularization
- Nonlinear Regression
- Supervised Learning – Classification
- Logistic Regression
- Evaluating a Classification Model Performance
- ROC Curve
- Fitting Line
- Stochastic Gradient Descent
- Regularization
- Multiclass Logistic Regression
- Generalized Linear Models
- Supervised Learning – Process Flow
- Decision Trees
- Support Vector Machine (SVM)
- k Nearest Neighbors (kNN)
- Time-Series Forecasting
- Unsupervised Learning Process Flow
- Clustering
- K-means
- Finding Value of k
- Hierarchical Clustering
- Principal Component Analysis (PCA)
- Endnotes
CHAPTER 4: STEP 4 – MODEL DIAGNOSIS AND TUNING
- Optimal Probability Cutoff Point
- Which Error Is Costly?
- Rare Event or Imbalanced Dataset
- Known Disadvantages
- Which Resampling Technique Is the Best?
- Bias and Variance
- Bias
- Variance
- K-Fold Cross-Validation
- Stratified K-Fold Cross-Validation
- Ensemble Methods
- Bagging
- Feature Importance
- RandomForest
- Extremely Randomized Trees (ExtraTree)
- How Does the Decision Boundary Look?
- Bagging – Essential Tuning Parameters
- Boosting
- Example Illustration for AdaBoost
- Gradient Boosting
- Boosting – Essential Tuning Parameters
- Xgboost (eXtreme Gradient Boosting)
- Ensemble Voting – Machine Learning’s Biggest Heroes United
- Hard Voting vs. Soft Voting
- Stacking
- Hyperparameter Tuning
- GridSearch
- RandomSearch
- Endnotes
CHAPTER 5: STEP 5 – TEXT MINING AND RECOMMENDER SYSTEMS
- Text Mining Process Overview
- Data Assemble (Text)
- Social Media
- Step 1 – Get Access Key (One-Time Activity)
- Step 2 – Fetching Tweets
- Data Preprocessing (Text)
- Convert to Lower Case and Tokenize
- Removing Noise
- Part of Speech (PoS) Tagging
- Stemming
- Lemmatization
- N-grams
- Bag of Words (BoW)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Data Exploration (Text)
- Frequency Chart
- Word Cloud
- Lexical Dispersion Plot
- Co-occurrence Matrix
- Model Building
- Text Similarity
- Text Clustering
- Latent Semantic Analysis (LSA)
- Topic Modeling
- Latent Dirichlet Allocation (LDA)
- Non-negative Matrix Factorization
- Text Classification
- Sentiment Analysis
- Deep Natural Language Processing (DNLP)
- Recommender Systems
- Content-Based Filtering
- Collaborative Filtering (CF)
- Endnotes
CHAPTER 6: STEP 6 – DEEP AND REINFORCEMENT LEARNING
- Artificial Neural Network (ANN)
- What Goes Behind, When Computers Look at an Image?
- Why Not a Simple Classification Model for Images?
- Perceptron – Single Artificial Neuron
- Multilayer Perceptrons (Feedforward Neural Network)
- Load MNIST Data
- Key Parameters for scikit-learn MLP
- Restricted Boltzmann Machines (RBM)
- MLP Using Keras
- Autoencoders
- Dimension Reduction Using Autoencoder
- De-noise Image Using Autoencoder
- Convolution Neural Network (CNN)
- CNN on CIFAR10 Dataset
- CNN on MNIST Dataset
- Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM)
- Transfer Learning
- Reinforcement Learning
- Endnotes
CHAPTER 7: CONCLUSION
- Summary
- Tips
- Start with Questions/Hypothesis Then Move to Data!
- Don’t Reinvent the Wheels from Scratch
- Start with Simple Models
- Focus on Feature Engineering
- Beware of Common ML Imposters
- Happy Machine Learning
Links:
- Apress Link: Click here!
- Amazon links by location: US, United Kingdom, India, Brazil, Canada, France, Germany, Italy, Japan, Mexico, Netherlands, Spain