As we all know CRISP DM stands for Cross Industry Standard Process for Data Mining is a process model that outlines the most common approach to tackle data driven problems. Per the poll conducted by KDNuggets in 2014 this was and “is” one of the most popular and widest used methodology. This method of gleaning insights out of the data is very dear to the industry experts and data miners.
As the title suggest I will align some of the most useful R packages with this most popular and simplistic data processing model and before getting into specific packages, there is one GUI based R Package named Rattle which is very much based around these steps.
Rattle’s tab based interface provides a step by step flow which is a replica of the CRISP DM method. Rattle by itself may suffice the user needs for introducing data mining. However, it also provides stepping stone to a more robust and sophisticated data processing and modelling.
Let’s look at the typical data mining process per the CRISP DM relative to the Rattle Package:
- Problem Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
Below is the Rattle Tab that is setup based on the CRISP DM Method of data mining.
Now let’s look at some standalone R packages based on the CRISP DM data processing methodology.
Happy Data Munging to All !!!