These techniques cover most of what data scientists and related practitioners use in their daily work, whether they rely on solutions offered by a vendor or design proprietary tools. Clicking any of the 40 links below takes you to a selection of articles related to that entry. Most of these articles are hard to find with a Google search, so in some ways this list gives you access to the hidden literature on data science, machine learning, and statistical science. Many of them are fundamental to understanding the technique in question, and come with further references and source code.
Starred techniques (marked with a *) belong to what I call deep data science, a branch of data science that has little if any overlap with closely related fields such as machine learning, computer science, operations research, mathematics, or statistics. Even classical machine learning and statistical techniques, such as clustering, density estimation, or tests of hypotheses, have model-free, data-driven, robust versions designed for automated processing (as in machine-to-machine communications), and thus also belong to deep data science. However, these techniques are not starred here, because their standard versions are better known (and unfortunately more widely used) than their deep data science equivalents.
To learn more about deep data science, click here. Note that unlike deep learning, deep data science is not the intersection of data science and artificial intelligence. Still, the analogy between deep data science and deep learning is not entirely meaningless, since both deal with automation.
Also, to discover in which contexts and applications the 40 techniques below are used, I invite you to read the following articles:
Finally, whenever you use a technique, you need to test its performance. Read this article: 11 Important Model Evaluation Techniques Everyone Should Know.
The 40 data science techniques
- Linear Regression
- Logistic Regression
- Jackknife Regression *
- Density Estimation
- Confidence Interval
- Test of Hypotheses
- Pattern Recognition
- Clustering – (aka Unsupervised Learning)
- Supervised Learning
- Time Series
- Decision Trees
- Random Numbers
- Monte-Carlo Simulation
- Bayesian Statistics
- Naive Bayes
- Principal Component Analysis – (PCA)
- Ensembles
- Neural Networks
- Support Vector Machine – (SVM)
- Nearest Neighbors – (k-NN)
- Feature Selection – (aka Variable Reduction)
- Indexation / Cataloguing *
- (Geo-) Spatial Modeling
- Recommendation Engine *
- Search Engine *
- Attribution Modeling *
- Collaborative Filtering *
- Rule System
- Linkage Analysis
- Association Rules
- Scoring Engine
- Segmentation
- Predictive Modeling
- Graphs
- Deep Learning
- Game Theory
- Imputation
- Survival Analysis
- Arbitrage
- Lift Modeling
- Yield Optimization
- Cross-Validation
- Model Fitting
- Relevancy Algorithm *
- Experimental Design
The list contains more than 40 techniques because we updated the article and added new ones.