The alphabet of data science…
- Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. Companies such as BMW, Tesla and Google use AI for self-driving cars. AI should be applied to tough real-world problems, from climate modeling to disease analysis, and to the betterment of humanity.
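- Boosting and Bagging: ensemble techniques that produce more accurate models by combining multiple models. Bagging (bootstrap aggregating) trains models independently on random bootstrap samples of the data and averages or votes on their predictions, which mainly reduces variance. Boosting trains models sequentially, with each new model focusing on the examples its predecessors got wrong, which mainly reduces bias.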
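  A minimal sketch of boosting, assuming one-dimensional data with labels -1/+1 and decision stumps (single-threshold rules) as the weak models. It follows the classic AdaBoost recipe, swapped in here as one concrete boosting algorithm; the names `fit_stump`, `stump_predict`, `adaboost` and `boosted_predict` are hypothetical. Each round fits a stump on the current example weights, then up-weights the points that stump got wrong so the next stump concentrates on them.

  ```python
  import math

  def stump_predict(t, polarity, x):
      """A stump predicts `polarity` when x >= t and -polarity otherwise."""
      return polarity if x >= t else -polarity

  def fit_stump(xs, ys, weights):
      """Return (threshold, polarity, weighted_error) of the best 1-D stump."""
      best = None
      for t in sorted(set(xs)):
          for polarity in (1, -1):
              err = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(t, polarity, x) != y)
              if best is None or err < best[2]:
                  best = (t, polarity, err)
      return best

  def adaboost(xs, ys, rounds=3):
      """Train stumps sequentially; returns a list of (alpha, t, polarity)."""
      n = len(xs)
      weights = [1.0 / n] * n          # start with uniform example weights
      ensemble = []
      for _ in range(rounds):
          t, polarity, err = fit_stump(xs, ys, weights)
          if err >= 0.5:               # weak learner no better than chance
              break
          err = max(err, 1e-10)        # guard against log(1/0) for a perfect stump
          alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this stump
          ensemble.append((alpha, t, polarity))
          # Up-weight misclassified points, down-weight correct ones, renormalize
          weights = [w * math.exp(-alpha * y * stump_predict(t, polarity, x))
                     for x, y, w in zip(xs, ys, weights)]
          total = sum(weights)
          weights = [w / total for w in weights]
      return ensemble

  def boosted_predict(ensemble, x):
      """Sign of the alpha-weighted vote of all stumps."""
      score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
      return 1 if score >= 0 else -1

  # Toy data: +1 at both ends, -1 in the middle, so no single stump fits it
  xs = [1, 2, 3, 4, 5, 6]
  ys = [1, 1, -1, -1, 1, 1]
  model = adaboost(xs, ys, rounds=3)
  print([boosted_predict(model, x) for x in xs])
  ```

  On this toy data the best single stump still misclassifies two of the six points, while the weighted vote of three boosted stumps classifies all of them correctly.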
- CRISP-DM: the Cross-Industry Standard Process for Data Mining. It was developed in 1997 by a consortium of companies including SPSS, Teradata, Daimler and NCR Corporation to bring order to the development of analytics models. Its six major phases are business understanding, data understanding, data preparation, modeling, evaluation and deployment.
- Data preparation: in analytics deployments, more than 60% of the time is typically spent on data preparation. The general rule is "garbage in, garbage out", so it is important to cleanse and normalize the data and make it available for consumption by the model.
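  A minimal sketch of two common preparation steps, assuming the records are Python dicts that all share the same numeric fields, with `None` marking a missing value: dropping incomplete rows (cleansing) and min-max scaling each field to [0, 1] (normalization). The function name `prepare` and the field names are hypothetical; real pipelines often impute missing values rather than drop them.

  ```python
  def prepare(rows):
      """Drop rows with missing values, then min-max scale every field to [0, 1]."""
      # Cleansing: keep only complete records
      clean = [r for r in rows if all(v is not None for v in r.values())]
      if not clean:
          return []
      fields = clean[0].keys()
      lo = {f: min(r[f] for r in clean) for f in fields}
      hi = {f: max(r[f] for r in clean) for f in fields}
      # Normalization: (v - min) / (max - min); a constant field maps to 0.0
      return [
          {f: (r[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
           for f in fields}
          for r in clean
      ]

  raw = [
      {"age": 20, "income": 1000},
      {"age": None, "income": 500},   # incomplete record, will be dropped
      {"age": 40, "income": 3000},
      {"age": 30, "income": 2000},
  ]
  print(prepare(raw))
  ```

  Scaling puts fields measured in very different units (years versus currency here) on a common range, so no single field dominates distance- or gradient-based models.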
- Ensembling: the technique of combining two or more models to get more robust predictions. It is like combining the marks from all our exams to arrive at a final overall score. Random Forest is one such example: it combines the predictions of many decision trees.
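  A minimal sketch of the bagging idea behind Random Forest, assuming one-dimensional data with labels -1/+1 and decision stumps in place of full decision trees. A real Random Forest also grows deeper trees and samples a random subset of features at each split, which this sketch omits. Each stump is trained on its own bootstrap sample (drawn with replacement) and the ensemble predicts by majority vote; the names `bagged_stumps` and `vote` are hypothetical.

  ```python
  import random

  def fit_stump(xs, ys):
      """Best 1-D stump (threshold, polarity) by misclassification count."""
      best, best_err = None, None
      for t in sorted(set(xs)):
          for polarity in (1, -1):
              err = sum(1 for x, y in zip(xs, ys)
                        if (polarity if x >= t else -polarity) != y)
              if best_err is None or err < best_err:
                  best, best_err = (t, polarity), err
      return best

  def bagged_stumps(xs, ys, n_models=25, seed=0):
      """Train each stump on its own bootstrap sample of the data."""
      rng = random.Random(seed)
      models = []
      for _ in range(n_models):
          # Sample len(xs) indices with replacement
          idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
          models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
      return models

  def vote(models, x):
      """Majority vote of all stumps (ties go to +1)."""
      total = sum(p if x >= t else -p for t, p in models)
      return 1 if total >= 0 else -1

  xs = [1, 2, 3, 4, 5, 6, 7, 8]
  ys = [-1, -1, -1, -1, 1, 1, 1, 1]
  forest = bagged_stumps(xs, ys)
  print([vote(forest, x) for x in xs])
  ```

  Because each stump sees a slightly different sample, their individual errors differ, and the majority vote smooths them out; an odd number of models avoids ties.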
- Feature selection: simply put, this means keeping only those features (variables) in the data that are genuinely relevant to the prediction and removing the irrelevant ones. This can improve model accuracy, reduce overfitting and speed up training.
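  A minimal sketch of one simple filter-style approach, assuming numeric features: score each feature by its absolute Pearson correlation with the target and keep those at or above a threshold. The function names and the 0.5 threshold are hypothetical choices for illustration; wrapper methods (such as recursive feature elimination) and embedded methods (such as L1 regularization) are common alternatives, and correlation only captures linear relationships.

  ```python
  import math

  def pearson(a, b):
      """Pearson correlation; returns 0.0 when either input has zero variance."""
      n = len(a)
      ma, mb = sum(a) / n, sum(b) / n
      cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
      va = sum((x - ma) ** 2 for x in a)
      vb = sum((y - mb) ** 2 for y in b)
      if va == 0 or vb == 0:
          return 0.0
      return cov / math.sqrt(va * vb)

  def select_features(features, target, threshold=0.5):
      """Keep features whose |correlation| with the target is at least `threshold`."""
      return [name for name, values in features.items()
              if abs(pearson(values, target)) >= threshold]

  target = [1, 2, 3, 4, 5]
  features = {
      "x1": [2, 4, 6, 8, 10],   # moves with the target
      "x2": [5, 5, 5, 5, 5],    # constant, carries no information
      "x3": [1, -1, 1, -1, 1],  # unrelated to the target
  }
  print(select_features(features, target))
  ```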
- Gini Coefficient: a measure of a model's predictive power, typically used in credit