In recent years, Artificial Intelligence has appeared regularly in news coverage on TV, radio and the internet. Terms like Big Data, Data Science, Machine Learning and Deep Learning are quickly being incorporated into the vocabulary of the business world.
Bearing in mind that many companies would like to apply these technologies in their businesses, I have selected 3 tips for managers and executives who are getting started with Machine Learning in their companies and want to boost their chances of success.
Tip 1: Differentiate false expectations from reality.
The truth is that applying Machine Learning successfully requires more than technical knowledge of how to extract useful patterns from data. Above all, it requires knowing how to formulate a good business problem to solve. It also requires a company culture that looks after data through its complete cycle, from correct selection and collection to the delivery of results that bring real value to the company’s strategy, a value generally measured by the satisfaction of its managers and customers.
For this to happen, it usually takes three critical skills:
- A deep understanding of the company’s business model and products / services;
- Knowledge of data analysis techniques and pattern recognition algorithms;
- Knowledge of Information Technology.
Note that these skills are difficult to find in one person, since they belong to different roles, so a culture of teamwork is required. There is little point in hiring the best data scientist on the market if the company lacks the synergy to align business managers with the scientist, the minimum IT infrastructure to make the project feasible, and the appetite for taking risks.
Tip 2: Garbage In, Garbage Out!
The quality of a Machine Learning solution is directly related to the quality of its data. The results rest on what the solution (model, algorithm …) has learned from that data, so they are only useful when the solution is able to faithfully generalize the reality of the business. The most exciting part of working with Machine Learning is, without a doubt, creating models, running algorithms and showing results, but all of it depends on the quality and relevance of the data that serves as input for these tasks. In other words, do the groundwork well.
In fact, most of the time in a Machine Learning project is spent organizing, transforming and cleaning data. Some points to check:
- Is the data really relevant to solving the company’s problem?
  - Example: income data in a product recommendation solution;
- Where will the data come from, and how? Is there a guarantee that it will be available and updated as often as the solution requires?
  - Example: data in real time, daily, weekly or monthly, retrieved automatically or extracted manually;
- Does the data contain many blank or null fields?
  - Should missing values be filled with the mean / median / mode, or should the records with missing data be deleted?
- Are the values entered reliable?
  - Example: at points of sale, the client ID field is mostly filled with a standard number such as 0000001.
- Is the data in the format the solution requires, and is it up to date?
  - Example: we have a person’s age, but it refers to the time when they registered at the store.
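Some of the checks above can be automated early in a project. The sketch below is only an illustration in plain Python: the records, the field names and the helper functions are hypothetical, and 0000001 is assumed to be the default client ID typed at the point of sale. It measures how many values are missing or suspicious, and shows the two options for missing data side by side (filling with the median or deleting the records).

```python
from statistics import median

# Hypothetical customer records; None marks a blank field.
records = [
    {"client_id": "0000001", "age": 34, "income": 4200.0},
    {"client_id": "0000001", "age": None, "income": 3100.0},
    {"client_id": "0048213", "age": 51, "income": None},
    {"client_id": "0000001", "age": 27, "income": 2800.0},
    {"client_id": "0093377", "age": 45, "income": 5000.0},
]

# Assumption: this is the default value entered when no real ID is known.
PLACEHOLDER_ID = "0000001"


def missing_ratio(rows, field):
    """Fraction of rows in which the field is blank (None)."""
    return sum(1 for r in rows if r[field] is None) / len(rows)


def placeholder_ratio(rows, field, placeholder):
    """Fraction of rows whose field holds the suspicious default value."""
    return sum(1 for r in rows if r[field] == placeholder) / len(rows)


def impute_median(rows, field):
    """Return copies of the rows with blanks replaced by the median."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = median(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def drop_missing(rows, field):
    """Alternative: discard the records in which the field is blank."""
    return [r for r in rows if r[field] is not None]


age_missing = missing_ratio(records, "age")
id_placeholder = placeholder_ratio(records, "client_id", PLACEHOLDER_ID)
imputed = impute_median(records, "age")
dropped = drop_missing(records, "age")

print(f"missing age: {age_missing:.0%}")
print(f"placeholder client_id: {id_placeholder:.0%}")
print(f"rows after imputation: {len(imputed)}, after dropping: {len(dropped)}")
```

Neither option is right in general: filling keeps every record but invents values, while deleting keeps only real values but shrinks the data. Which one fits depends on the business problem.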
It doesn’t matter whether the data is structured, like Excel spreadsheets or database tables, or unstructured, like images, videos and audio, or whether it is obtained automatically from the internet or extracted manually. What matters is that it is relevant to solving the business problem (generally increasing revenue or reducing expenses) and that it is cleaned and transformed in the way that best suits the problem, because if garbage comes in, garbage will certainly come out. There is no recycling.
Tip 3: Learn to deal with the uncertainties of this type of project
When we work on software development projects, we are used to a certain kind of predictability. Most of the time there are features to be developed, expressed as requirements, and a well-defined schedule, cost and scope. When problems arise, adjustments such as adding staff, increasing the budget, purchasing better resources, working overtime or reprioritizing the scope can effectively solve them.
When starting a Machine Learning project, it is not possible to promise a perfectly accurate result in the way we tend to promise results in software development projects. Imagine a project to recognize animals in images taken by a farm camera. When an animal approaches the camera, a photo is taken and the Machine Learning software classifies the animal as a chicken, cattle or wolf, for example. Depending on the type of animal detected, a different action must be taken, such as counting chickens or sounding an alarm when a wolf is detected. An initial question could be: what percentage of animals will be classified correctly?
Will it be 40%, 75% or 90%? How many images are needed to reach good accuracy? One hundred? Five hundred? A thousand? Ten thousand? Two hundred of each animal? When new types of animals appear, how can they be included? What is the impact of this inclusion on the results? Will many animals be identified incorrectly? What is the impact of incorrectly detecting a wolf?
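To make these questions concrete, here is a minimal sketch in plain Python. The labels and predictions are invented for illustration and the helper functions are hypothetical, not part of any specific library. It computes the overall percentage of correct classifications and then looks at the wolf class on its own, since a missed wolf costs more than a miscounted chicken.

```python
from collections import Counter

# Hypothetical true labels and model predictions for 10 camera photos.
actual = ["chicken", "chicken", "cattle", "wolf", "chicken",
          "cattle", "wolf", "chicken", "cattle", "wolf"]
predicted = ["chicken", "chicken", "cattle", "wolf", "cattle",
             "cattle", "chicken", "chicken", "cattle", "wolf"]


def accuracy(y_true, y_pred):
    """Share of photos whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


def recall(y_true, y_pred, label):
    """Of the photos that really show `label`, the share that was detected."""
    total = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total if total else 0.0


overall = accuracy(actual, predicted)
wolf_recall = recall(actual, predicted, "wolf")
confusion = confusion_counts(actual, predicted)

print(f"overall accuracy: {overall:.0%}")
print(f"wolves detected: {wolf_recall:.0%}")
print(f"wolves mistaken for chickens: {confusion[('wolf', 'chicken')]}")
```

In this invented sample the overall accuracy looks acceptable, yet one of the three wolves goes undetected. A single percentage can hide exactly the error that matters most to the business.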
Before starting a Machine Learning project, we must always clearly define the success indicators, such as achieving 80% accuracy or tolerating at most 10% error. However, it is not possible to say with certainty that a given standard of success will be reached. We can only estimate the results, based on experience. And if the technical team is inexperienced and does not make the risks clear, the results may frustrate project managers and sponsors. In innovation projects of this type, risk and uncertainty are defining factors.
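One way to make this uncertainty visible to managers and sponsors is to report a measured accuracy together with a margin of error. The sketch below is an illustration under stated assumptions rather than a prescribed method: it uses the Wilson score interval, a standard statistical formula for proportions, and the counts (80 correct out of 100 test images, 800 out of 1000) are invented.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% confidence interval (z = 1.96) for an accuracy
    measured as `correct` right answers out of `total` test images."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


TARGET = 0.80  # success indicator agreed before the project starts

# Invented results: the same measured accuracy on two test set sizes.
low_100, high_100 = wilson_interval(80, 100)
low_1000, high_1000 = wilson_interval(800, 1000)

print(f"100 images:  {low_100:.1%} to {high_100:.1%}")
print(f"1000 images: {low_1000:.1%} to {high_1000:.1%}")
print(f"target safely reached with 100 images? {low_100 >= TARGET}")
```

The same measured 80% is much less certain on a small test set than on a large one, so a result that only just meets the target on a few images is an estimate, not a guarantee.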