Solving the Data Science Mystery
Data Science has become an inescapable part of our everyday lives, where every action of ours is measured, plotted, classified and logged. We leave traces of who we are while driving a car, when visiting a place, after watching a movie or when shopping for what we want. These traces of data, captured from assorted sources, have led to the greatest age of innovation, where Data Science is enabling us to reveal the deepest of secrets hidden within the data. Have we ever thought of making self-discoveries that would forever change the way we live and behave? Data Science is the catalyst behind our current evolution and it is growing at a staggering pace.
Businesses have realized that they should adopt & embrace these changes now or risk being left behind. Data Monetization is the new paradigm for Organizations and, slowly but steadily, Data is becoming their currency of trade. Through Data Science, Organizations have figured out ways to influence Customer buying behaviors, analyze buying patterns, target anonymous or probable customers, track Customer social activity and unrest, predict risk and much more. It is equally important for them to realize that the cumulative power, biases & assumptions derived from applying Data Science methodologies will have a profound impact on the business, security & daily lives of everyone involved. It is the responsibility of Data Science practitioners, leaders & visionaries to guide us on this journey, as deep-seated & radical ways of consuming data are changing the face of the world.
It feels like solving a mystery case!
Growing up on mystery shows, movies and the books of Dashiell Hammett and Sir Arthur Conan Doyle instilled in me a great love for solving mysteries, and naturally I drew a parallel with solving Data Science mysteries. Is it really a mystery? To most of us, yes!
Data Science is more like the art of turning data into actionable insights. We are exposed to a multitude of Data Science Applications in our day-to-day endeavors. Though we consume them regularly, we rarely care to look behind the scenes at the rigorous processes, data preparation and machine learning algorithms that give us accurate data to devour. Weather Forecasting, Stock Market Predictions, Targeted Advertisements, Book, Music & Movie Recommendations, Health Prognosis and more are all woven into our busy schedules, helping us make decisions on the fly.
Practicing Data Science is also an addiction, as you need new and exciting challenges to thrive. Ask a true Data Scientist and they will confess how powerful they feel with access to an abundance of data and the secrets that come with it.
Behind the scenes of a Data Science Application
The turning of Data into actionable insights is achieved through the creation of Data Science Products or Applications (Apps). These applications provide business users and decision makers with actionable metrics without exposing the underlying data or algorithms. Behind every Data App there is a demanding process of extracting timely data from multiple data sources, performing significance & quality checks, running exploratory analysis to identify patterns & developing the machine learning or deep learning algorithms that drive the recommendations and automation of Data Apps.
Building a Data Science App is a complex task. You need the right set of data, tools, techniques, talent and the mindset to pursue a working model. The cycle involves five main activities:
- Data Acquisition
- Data Preparation
- Exploratory Analysis
- Build Analytical Models
- Act & Advise on the Output
Most of these activities involve tasks that are highly iterative in nature and often end up back where it all started. The underlying principle is to fail often but learn faster!
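To make the cycle concrete, here is a minimal sketch of the five activities as plain Python functions. Everything in it is an assumption made for illustration: the toy records, the function names and the choice of a simple least-squares line as the "analytical model". A real Data App would use far richer data and tooling.

```python
from statistics import mean

# Hypothetical toy records: (monthly_visits, monthly_spend); None marks a missing value.
RAW_RECORDS = [(2, 40.0), (4, 80.0), (None, 55.0), (6, 120.0), (8, 160.0), (5, None)]


def acquire():
    """Data Acquisition: in practice this would pull from files, APIs or databases."""
    return list(RAW_RECORDS)


def prepare(records):
    """Data Preparation: drop rows that contain missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def explore(records):
    """Exploratory Analysis: a few summary statistics."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    return {"rows": len(records), "mean_visits": mean(xs), "mean_spend": mean(ys)}


def build_model(records):
    """Build Analytical Models: least-squares fit of spend on visits."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in records) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def advise(model, visits):
    """Act & Advise: turn the model into a number a decision maker can use."""
    slope, intercept = model
    return slope * visits + intercept


clean = prepare(acquire())
summary = explore(clean)
model = build_model(clean)
forecast = advise(model, visits=10)
print(summary)
print(f"Expected spend at 10 visits: {forecast:.2f}")
```

In practice each stage feeds back into the earlier ones: a surprising summary statistic sends you back to data preparation, and a weak model sends you back to acquisition.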
Unraveling the mysteries through Data Apps
You will have realized by now that all of these Data Science Apps arise out of a defined business problem or need. Consider the sample list below for any product or service organization:
- Who are the customers most likely to switch to a competitive brand in the next 2-3 months?
- Which of my products or services should be advertised more heavily to increase sales & profitability?
- What is the margin of promotion to get the best possible sales & profit?
- What manufacturing process-change will allow the organization to build a better quality product at a lower cost?
- When is a customer most likely to make the next purchase and what could be the range of cumulative $ spend for the year?
The key to answering these questions is to understand the underlying data & what the data precisely communicates. Another critical component for building a Data Science Application is to have the right set of minds with diverse skill sets, working together to achieve the same goal. First, the environment & computer science skills necessary to manipulate data, process data & test data-driven hypotheses. Second, rich experience in statistics, calculus & algebra to convert the business problem into a mathematical structure and solution. Finally, the domain expertise that helps decipher the actual problem statements and map them to the data. All three components, in the right mix, contribute to the accuracy and richness of the application.
All seasoned data scientists will have encountered similar business cases and problem statements. The fact is, no model or application is good enough on the first attempt. It evolves over a period of time through multiple attempts at fine-tuning, error handling, variable mapping, choosing predictors & recording interactions. Every model you run tells you a story. Stop in between and listen to it, whether it is good or bad.
To solve the mystery, we need to comprehend the universe of analytic techniques and their fit to specific business problems. We need to find our way through the maze by understanding the data and the transformations required. Then we identify the right data discovery technique to figure out the relationships (Hypothesis Testing, Regression or Clustering), nominate the best predictive technique to derive an expected outcome & list the best course of action for optimization & automation. Together, these guidelines act like a process flow for determining the analytical techniques involved in solving the problem statement.
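As an illustration of one of the data discovery techniques named above, here is a minimal sketch of Clustering: a one-dimensional k-means that splits customers into groups by annual spend. The spend figures, the function name `kmeans_1d` and the choice of starting centers are all assumptions made for this example, not a prescribed method.

```python
from statistics import mean


def kmeans_1d(values, centers, max_iter=100):
    """Group 1-D values around the given starting centers using k-means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # Converged: assignments can no longer change.
        centers = new_centers
    return centers, clusters


# Hypothetical annual spend per customer, in dollars.
annual_spend = [120, 150, 130, 900, 950, 1000, 140, 980]
centers, clusters = kmeans_1d(
    annual_spend, centers=[min(annual_spend), max(annual_spend)]
)
print("Cluster centers:", centers)
print("Low spenders:", clusters[0])
print("High spenders:", clusters[1])
```

The two groups that emerge can then feed the next step in the flow, for example a separate predictive model or promotion strategy per segment.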
The Final Pass: Feature Engineering & Deep Learning
Machine Learning has always been at the core of Data Science applications. These applications can now teach themselves to grow and change when exposed to new data or environments. The resurgent interest in machine learning is due to the growing volumes and varieties of available data, cheaper computational processing and affordable data storage. It is now possible to quickly and automatically produce Data Science Applications that can analyze bigger, more complex data and deliver faster, more accurate results on a very large scale.
To top this, many practical applications of Machine Learning are enabled by Deep Learning, which extends the overall field of Artificial Intelligence. Deep Learning breaks down tasks in ways that make all kinds of machine assistance seem likely. Deep learning has now moved beyond academic applications and is finding its way into our daily lives. Driverless cars, better preventive healthcare and even better movie recommendations are all here today and will only improve given the rapid rate of advancement. Deep learning reduces the need for human-coded features by folding feature engineering, feature selection and model fitting into one step. Even so, feature engineering & selection remain fundamental to any application of Deep Learning that you can think of. Feature engineering is the process of using domain knowledge of the data to create features that make Machine/Deep learning algorithms work. Without it, it would not be possible to understand and represent the business problem through a mathematical model. Feature engineering is an art that is influenced significantly by the Data Scientist’s capabilities, perceptions and understanding of the domain. Feature selection, meanwhile, is the process of determining the set of features with the highest information value to the model. Both contribute immensely to building Cognitive Deep Learning Applications.
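The difference between the two steps can be sketched in a few lines of Python. The customer records, the feature names and the helper functions below are hypothetical, and absolute Pearson correlation with the target is used only as a simple stand-in for "information value"; real projects often rely on other measures.

```python
from math import sqrt
from statistics import mean

# Hypothetical raw data: purchase amounts, recency in days, and whether the
# customer bought again (the target we would like to predict).
CUSTOMERS = [
    {"purchases": [20.0, 35.0, 25.0], "days_since_last": 5, "repurchased": 1},
    {"purchases": [15.0], "days_since_last": 60, "repurchased": 0},
    {"purchases": [50.0, 45.0, 55.0, 60.0], "days_since_last": 3, "repurchased": 1},
    {"purchases": [10.0, 12.0], "days_since_last": 45, "repurchased": 0},
    {"purchases": [30.0, 40.0, 35.0], "days_since_last": 10, "repurchased": 1},
]


def engineer_features(customer):
    """Feature engineering: use domain knowledge to derive model inputs."""
    purchases = customer["purchases"]
    return {
        "purchase_count": len(purchases),
        "avg_spend": mean(purchases),
        "days_since_last": customer["days_since_last"],
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(rows, target, top_k):
    """Feature selection: keep the top_k features most correlated with the target."""
    scores = {
        name: abs(pearson([row[name] for row in rows], target)) for name in rows[0]
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


rows = [engineer_features(c) for c in CUSTOMERS]
target = [c["repurchased"] for c in CUSTOMERS]
selected, scores = select_features(rows, target, top_k=2)
print("Scores:", scores)
print("Selected features:", selected)
```

Note how the engineered features encode a judgment about the domain (recency and frequency matter for repeat purchases), while the selection step is purely mechanical.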
Closing thoughts
Data Science is evolving at a rapid pace, touching every aspect of our lives and impacting every decision we make. This trend will continue and will have a profound effect on how organizations drive growth. We can anticipate great advancement in this field, where cognitive computing & artificial intelligence will enable self-sufficient algorithms that exploit the content, context and semantic meaning of the data to reveal the right questions that you should be asking of your data.
The mysteries that we spoke about will only grow faster with the growth of data types such as unstructured text and data emitted by IoT devices and sensors. All of these technical advancements sit on top of the hottest, most critical domains, which require bigger, faster and more complex data sets and super-fast real-time analytics.
I hope I have helped in some way to unearth the true potential of data. All of these mysteries will only lead us to become extraordinary thinkers who ask the right questions. I will continue to drive forward the discipline and art of Data Science with more articles focusing specifically on how to tackle analytical use cases and the taxonomy of solving business problems.