One of the hot topics in Machine Learning is, without a doubt, feature engineering. In fact, it predates the current buzz: it was already central back when we simply talked about Data Mining. Recalling the CRISP-DM process, feature engineering (and, consequently, feature selection) is the core of a great data mining project. It comes to life in the Data Preparation phase, which covers constructive data preparation operations such as the production of derived attributes, entire new records, or transformed values for existing attributes.
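To make "derived attributes" and "transformed values" concrete, here is a minimal sketch in pandas. The DataFrame and the column names (price, quantity, signup_date) are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data for illustration only
df = pd.DataFrame({
    "price": [10.0, 25.0, 7.5],
    "quantity": [2, 1, 4],
    "signup_date": pd.to_datetime(["2020-01-05", "2019-11-20", "2021-03-14"]),
})

# Derived attribute: total spend per record
df["total_spend"] = df["price"] * df["quantity"]

# Transformed value: log-scale a skewed attribute
df["log_price"] = np.log1p(df["price"])

# Derived attribute from a date: tenure in days relative to a reference date
df["tenure_days"] = (pd.Timestamp("2021-06-01") - df["signup_date"]).dt.days
```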
A very good definition, elegant in its simplicity, is that feature engineering is the process of creating features that make machine learning algorithms work. And what makes it so important? Simple: feature engineering, not just how good you are at statistical or computational techniques, is what determines whether your project will succeed. A quotation I particularly like is:
“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used. If you have many independent features that each correlate well with the class, learning is easy. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are. This is typically where most of the effort in a machine learning project goes.”
This is from A Few Useful Things to Know about Machine Learning, a paper by Pedro Domingos. This quote has many important elements. Let’s focus on two of them: “if the class is a very complex function of the features” and “this is typically where most of the effort in a machine learning project goes”.
Good features, less complex models
The better your features are, the less complex the models you need to build; those models will also run faster, be easier to understand and easier to maintain. That’s the golden goose! You need the best possible representation of the sample data to learn a solution to your problem. A good feature, then, captures most of the pattern to be recognized, where a feature is an individual measurable property of a phenomenon being observed (Bishop, Christopher – Pattern Recognition and Machine Learning, 2006).
It’s deep. But there is a simple flow of topics to guide you as you start with feature engineering: data understanding, univariate and bivariate analysis, missing values and outlier treatment. At the end of the day, this is a summary of phase 3 of CRISP-DM: select data, clean data, construct data and attribute selection (a minimal sketch of this flow follows below). Data preparation tasks are likely to be performed multiple times, which leads us to the next topic.
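Before moving on, here is a short sketch of that flow: summary statistics for data understanding and univariate analysis, a correlation check for bivariate analysis, median imputation for missing values and percentile clipping for outliers. The DataFrame, the column names and the chosen thresholds are illustrative assumptions, not a prescription.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values and an outlier (age = 120)
df = pd.DataFrame({
    "age": [23, 45, np.nan, 31, 120, 38],
    "income": [2500, 5200, 3100, np.nan, 4000, 3600],
    "defaulted": [0, 1, 0, 0, 1, 0],
})

# Data understanding / univariate analysis: summary statistics per column
print(df.describe())

# Bivariate analysis: correlation of each feature with the target
print(df.corr(numeric_only=True)["defaulted"])

# Missing values: impute numeric columns with the median
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Outlier treatment: clip values outside the 1st-99th percentile range
low, high = df["age"].quantile([0.01, 0.99])
df["age"] = df["age"].clip(lower=low, upper=high)
```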
Where most of the effort in a machine learning project goes
Experienced data modellers say that feature engineering can take almost 70% of the time in a data mining project. By now you can understand why. Another problem also emerges in this phase: “feature explosion”. And that brings us to feature selection, as mentioned at the beginning of this article.
Feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. It is important because it helps you simplify models, shorten training times and reduce overfitting (see the sketch below). A great starting point on this subject is the paper An Introduction to Variable and Feature Selection, by Isabelle Guyon and André Elisseeff.
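As a quick illustration, here is a minimal feature selection sketch with scikit-learn, keeping the k features that score best against the target. The synthetic dataset and the choice of k = 5 are arbitrary assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 20 candidate features, only 5 of which are informative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=42)

# Keep the 5 features with the highest mutual information with the target
selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                     # (500, 5): only the selected subset remains
print(selector.get_support(indices=True))   # indices of the retained features
```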
This introduction to feature engineering should help you better understand the importance of the data preparation phase and increase the success rate of your predictive models. It is not a trivial topic, but it is one of the key factors for success in machine learning.