What are the advantages of different classification algorithms?
For instance, if we have large training data set with approx more than 10,000 instances and more than 100,000 features, then which classifier will be best to choose for classification?
Xavier Amatriain, PhD in CS, former Professor and coder has answered the question:
There are a number of dimensions you can look at to give you a sense of what will be a reasonable algorithm to start with, namely:
- Number of training examples
- Dimensionality of the feature space
- Do I expect the problem to be linearly separable?
- Are features independent?
- Are features expected to linearly dependent with the target variable?
- Is overfitting expected to be a problem?
- What are the system’s requirement in terms of speed/performance/memory usage..?
This list may seem a bit daunting because there are many issues that are not straightforward to answer. The good news though is, that as many problems in life, you can address this question by following the Occam’s Razor principle: use the least complicated algorithm that can address your needs and only go for something more complicated if strictly necessary.
– Logistic Regression
– Support Vector Machines
– Tree Ensembles
– Deep Learning
To read the full article (posted as a Quora question, including 22 answers), click here. For more articles on classification, click here.
DSC Resources
- Career: Training | Books | Cheat Sheet | Apprenticeship | Certification | Salary Surveys | Jobs
- Knowledge: Research | Competitions | Webinars | Our Book | Members Only | Search DSC
- Buzz: Business News | Announcements | Events | RSS Feeds
- Misc: Top Links | Code Snippets | External Resources | Best Blogs | Subscribe | For Bloggers
Additional Reading
- What statisticians think about data scientists
- Data Science Compared to 16 Analytic Disciplines
- 10 types of data scientists
- 91 job interview questions for data scientists
- 50 Questions to Test True Data Science Knowledge
- 24 Uses of Statistical Modeling
- 21 data science systems used by Amazon to operate its business
- Top 20 Big Data Experts to Follow (Includes Scoring Algorithm)
- 5 Data Science Leaders Share their Predictions for 2016 and Beyond
- 50 Articles about Hadoop and Related Topics
- 10 Modern Statistical Concepts Discovered by Data Scientists
- Top data science keywords on DSC
- 4 easy steps to becoming a data scientist
- 22 tips for better data science
- How to detect spurious correlations, and how to find the real ones
- 17 short tutorials all data scientists should read (and practice)
- High versus low-level data science
Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge