What is Data Mining?
Data mining is typically an integrated application within a data warehouse. It describes a systematic process of pattern recognition in large data sets, with the aim of identifying relationships and drawing conclusions. Using statistical methods or genetic algorithms, data files can be searched automatically for statistical anomalies, patterns, or rules.
Wikipedia defines data mining as follows: “Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.”
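As a small illustration of searching data automatically for statistical anomalies, the following sketch flags values that lie far from the mean. It is only one simple way to do this, not a definitive method: the function name `find_anomalies`, the z-score threshold of 2.0, and the order figures are assumptions made up for this example.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily order counts; the last day is clearly unusual.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250]
print(find_anomalies(daily_orders))  # [250]
```

Real data mining tools use more robust techniques, since a single extreme value also distorts the mean and standard deviation it is measured against.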
Data mining is a new approach to data:
- Data Mining is not a simple use of statistical formulas.
- Data Mining is part of a key process to collect and use data.
- Data Mining is not just Excel spreadsheets with simple fields.
- Data Mining is the extraction of information from data by means of computer and statistical techniques.
What are the practical applications of Data Mining?
- Automated prediction of trends and behaviors
- Automated discovery of previously unknown patterns
Data Mining is said to be one of the 8 Data Analysis Techniques Every Manager Should Understand:
- Correlation Analysis
- Regression Analysis
- Data Visualization
- Scenario Analysis
- Data Mining
- Monte Carlo Simulation
- Neural Networks
- A/B Testing
What are the data mining parameters?
- Association – the search for patterns in which one event is connected to another event;
- Sequence or path analysis – the search for patterns in which one event leads to another, later event;
- Classification – the search for new patterns (which may eventually change the way the data is organized);
- Clustering – finding and visually documenting previously unknown groups of facts;
- Prediction – discovering patterns in data that can lead to meaningful predictions about the future (this area of data mining is also referred to as predictive analytics).
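To make the first parameter, association, more concrete, here is a minimal sketch that measures how strongly two events are connected. It uses the two standard measures from association rule mining, support and confidence. The function names and the shopping baskets are hypothetical and chosen only for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate how often `consequent` occurs in transactions containing `antecedent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule cannot be evaluated
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical shopping baskets (invented data).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))       # 0.5
print(confidence(baskets, {"bread"}, {"butter"}))  # about 0.67
```

In this toy data, bread and butter appear together in half of all baskets, and two out of three baskets with bread also contain butter.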
What is Predictive Analytics?
According to Wikipedia, “Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.”
What’s behind Predictive Analytics?
A prerequisite for Predictive Analytics is the collection of large, partly unstructured data sets from different sources. Combining different external data sources, such as weather, traffic, and social media data, and enriching them with internal data is particularly important.
Predictive Analytics processes this data using different statistical methods, such as extrapolation, regression, neural networks, or machine learning, to detect patterns in the data and derive algorithms. These algorithms are then reviewed against test data and optimized. In general, the more data is available, the more accurate the resulting algorithms tend to be. Once the optimization process is finished, the algorithm and the model can be applied to data whose classification is unknown.
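The workflow described above (derive a model from data, review it against test data, then apply it to data whose classification is unknown) can be sketched in a few lines. The example below uses a nearest-centroid classifier as a deliberately simple stand-in for the statistical methods named in the text, and it leaves out the optimization step. All function names and the weather and demand figures are assumptions invented for this illustration.

```python
from math import dist

def train_centroids(samples):
    """Compute one mean point (centroid) per class from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

def accuracy(centroids, samples):
    """Share of labeled samples that the model classifies correctly."""
    correct = sum(1 for f, label in samples if predict(centroids, f) == label)
    return correct / len(samples)

# Hypothetical data: (temperature in °C, rainfall in mm) -> demand for ice cream.
training_data = [
    ((30, 0), "high"), ((28, 1), "high"), ((32, 0), "high"),
    ((12, 8), "low"), ((10, 9), "low"), ((14, 7), "low"),
]
test_data = [((29, 1), "high"), ((11, 8), "low")]

model = train_centroids(training_data)  # 1. derive the model from known data
print(accuracy(model, test_data))       # 2. review it against held-out test data
print(predict(model, (27, 2)))          # 3. apply it to data with unknown classification
```

Keeping the test data separate from the training data is the important design choice here: it shows how the model behaves on data it has not seen before.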
Data Mining vs. Predictive Analytics – Are They the Same?
“Data mining and predictive analytics are often used interchangeably. In fact, the methods and tools of data mining play an essential role in predictive analytics solutions, but predictive analytics goes beyond data mining. For example, predictive analytics also uses text mining, an algorithm-based analysis method for unstructured content such as articles, blogs, tweets, and Facebook posts.”
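As a rough idea of what text mining on unstructured content involves, the sketch below counts how often terms occur across a few short posts. It shows only the most basic step, term frequency counting. The stopword list, the function name `term_frequencies`, and the sample posts are assumptions made up for this example.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "and", "to", "of", "was", "but"}

def term_frequencies(documents):
    """Count how often each non-stopword term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Hypothetical customer posts (invented data).
posts = [
    "The delivery was late and the package was damaged",
    "Great service, fast delivery",
    "Late delivery again, but the support was helpful",
]

freq = term_frequencies(posts)
print(freq.most_common(2))  # [('delivery', 3), ('late', 2)]
```

Frequent terms such as "delivery" and "late" could then serve as additional input for a predictive model, alongside structured data.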