By David March.
Data science has been around since humankind first ran experiments and recorded data; it is only since the advent of big, heterogeneous data that the term “Data Science” was coined. With such a long and varied history, the field should benefit from the great diversity of perspectives brought by practitioners from different fields. My own path started with signal analysis. I was building high-speed interferometric photon-counting systems, where my “data science” was dominated by signal-to-noise ratio and information encoding. The key aspect was that data science was applied to extend or modify our knowledge (understanding) of the physical system. Later my data science efforts focused on stochastic dynamical systems. While the techniques and tools were different from those used in signal analysis, the objective remained the same: to extend or modify our knowledge of a system.
Today, what is popularly referred to as Data Science is a 180-degree shift from my experience. Rather than extending or modifying knowledge of a system, Data Science is used to infer a potential system, or family of potential systems, directly from the data. The difference is often described as exploratory vs. confirmatory, but I prefer the terms learning vs. knowledge. The key epistemological difference is that knowledge is the process of modifying or enhancing understanding, while learning is the process of acquiring new, or modifying existing, behavior or preferences. There is a natural order here: learning must precede knowledge. Deep Learning can lead to Deep Knowledge. Unfortunately, without general AI, this step is neither straightforward nor achievable solely with the current tools of data science. The lack of deep knowledge may place the firm at considerable risk. Not understanding the details of the underlying system severely limits the firm’s ability to predict how the inferred system will react to real-world changes, or how to respond to changes once they occur. In particular, the “learned system” may only be valid for a limited set of real-world conditions. Some important factors may not be identified because, at the time of sampling, they are constrained by external market forces that make them appear unimportant. The scary reality is that the underlying system is likely to be non-linear with feedback loops, and when a constrained factor is released, the model might simply blow up.
To illustrate, I will use a simple thought experiment. Imagine the domain of customer satisfaction. Applying current machine learning techniques, we find the domain is best represented by three clusters. We assume that all customers within a cluster are the same because they currently occupy the same area in the domain. Notice the word currently in the previous sentence. This is important because, at the time of data sampling, the individuals were confined to the cluster by some unidentified market force. We do not know (knowledge, understanding) how they are confined. While cluster cohesion will naturally degrade over time, I am primarily interested in the immediate impact on cluster membership when the market-force constraint is removed. To anticipate and exploit these abrupt changes, we need deeper knowledge of the dynamics of the individual and how they are affected by changes in previously constrained market forces. A minimal sketch of the thought experiment follows.
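The sketch below assumes scikit-learn and synthetic data; the feature names (price_sensitivity, service_quality, rate_sensitivity) and all numbers are hypothetical, chosen only to illustrate the mechanism. A third feature is pinned nearly constant at sampling time, so it contributes nothing to the cluster geometry; once the constraint is released, the partition shifts:

```python
# Hypothetical illustration of the thought experiment (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)
n = 300

# Two observable satisfaction features spread across three groups.
price_sensitivity = np.concatenate([rng.normal(m, 0.3, n) for m in (0, 2, 4)])
service_quality = np.concatenate([rng.normal(m, 0.3, n) for m in (4, 2, 0)])

# A third feature constrained by an external market force at sampling time:
# it barely varies, so it plays no role in the cluster geometry.
rate_sensitivity = rng.normal(0.0, 0.01, 3 * n)

X = np.column_stack([price_sensitivity, service_quality, rate_sensitivity])
labels_before = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Release the constraint: the hidden feature now varies per individual.
X_released = X.copy()
X_released[:, 2] = rng.normal(0.0, 3.0, 3 * n)
labels_after = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_released)

# Labels are permutation-ambiguous, so compare partition agreement rather
# than raw labels; a score well below 1 indicates heavy membership churn.
print("partition agreement:", adjusted_rand_score(labels_before, labels_after))
```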
Assume an individual is represented by a multidimensional utility function that maps to the customer satisfaction domain. This function contains non-linear features and numerous feedback loops, which may be negative, positive, or either depending on market conditions. To illustrate, consider hyperbolic discounting, a well-established non-linear feature from behavioral economics. Because the discount function is hyperbolic rather than exponential, small changes in the market interest rate can cause large changes in value perception. Each individual will have a different response, ranging from almost none to dramatic changes in consumption and investment behavior. A change in interest rates could therefore dramatically alter cluster membership. Under normal macroeconomic circumstances, hyperbolic discounting would be a principal component, and widely different response functions would not be contained within the same cluster. However, what are normal circumstances? The financial crisis resulted in extremely low and stable interest rates for a statistically long period relative to our sampling window. Without interest-rate change and volatility, hyperbolic discounting would appear stable, have little or no impact, and be ignored by machine learning. A vitally important feature is excluded from the learning algorithm because the algorithm cannot identify it as important. Since interest rates and hyperbolic discounting are not contained in the model, it will be very difficult for the data scientist to understand what occurred, or how to tweak the model, when market rates change. The sketch below makes this concrete.
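The following is a hedged illustration, not a model from the article: it assumes each individual’s hyperbolic discount parameter scales with the market rate, k_i = alpha_i * r (alpha_i being a hypothetical individual sensitivity), and uses the standard hyperbolic discount function V = A / (1 + kD). With rates pinned near zero, perceived values are nearly uniform across a heterogeneous population, so the feature carries almost no variance; at a normal rate, the same population spreads widely:

```python
# Hypothetical sketch: individual discount parameter k_i = alpha_i * r.
import numpy as np

def hyperbolic_value(amount, delay, k):
    """Perceived present value under hyperbolic discounting: V = A / (1 + k*D)."""
    return amount / (1.0 + k * delay)

rng = np.random.default_rng(7)
alpha = rng.uniform(0.1, 5.0, 1000)   # heterogeneous individual sensitivities
amount, delay = 100.0, 12.0           # e.g. $100 received in 12 months

for r in (0.001, 0.002, 0.05):        # pinned-near-zero rates vs. a normal rate
    v = hyperbolic_value(amount, delay, alpha * r)
    print(f"r={r:.3f}  perceived value: mean={v.mean():6.2f}  std={v.std():5.2f}")

# In the pinned-rate regime the feature has almost no variance, so a
# variance- or importance-based feature selector would discard it; once
# rates normalize, the "ignored" feature spreads the population widely
# and reshapes the clusters.
```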
The best way forward toward developing Deep Knowledge is to first consider the ML model as an artifact of emergent behavior emanating from a large number of interacting customers. This means approaching the understanding of the ML model from the opposite direction: the individual customer. This perspective is the domain of Agent-Based Modeling. The strategy is to iteratively manipulate the parameters and equations that govern agent behavior until the emergent behavior reproduces the same ML patterns. The large number of features and the multiplicity of governing equations will result in a potentially large number of agent configurations that generate the same ML artifact. Analyzing these configurations will provide great insight into the dynamic nature of the ML model; however, it is imperative to identify the configuration closest to reality. I believe the best discriminator for choosing among the alternatives is maximum entropy. Once the best Agent-Based Model is identified, sensitivity analysis will provide tremendous insight into the potential effects of changes in market forces. These insights will give the organization a considerable competitive advantage. The sketch below illustrates the loop in miniature.
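Here is a conceptual sketch of the proposed loop, under strong simplifying assumptions of my own: agents are one-dimensional, “matching the ML artifact” means reproducing a target set of three cluster centers, and entropy is estimated from a histogram of simulated outcomes. All names and parameters are hypothetical:

```python
# Hypothetical calibration loop: find agent configs that reproduce the ML
# artifact, then discriminate among them by maximum entropy.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
TARGET_CENTERS = np.array([0.0, 2.0, 4.0])   # the observed "ML artifact"

def simulate_agents(config, n_agents=900):
    """Toy agent model: each agent's outcome is its group mean plus noise,
    with group means and noise scale taken from the candidate config."""
    means = np.asarray(config["means"], dtype=float)
    groups = rng.integers(0, len(means), n_agents)
    return means[groups] + rng.normal(0.0, config["noise"], n_agents)

def matches_artifact(outcomes, tol=0.25):
    """Does the emergent behavior reproduce the observed cluster centers?"""
    km = KMeans(n_clusters=3, n_init=10, random_state=0)
    centers = np.sort(km.fit(outcomes.reshape(-1, 1)).cluster_centers_.ravel())
    return bool(np.all(np.abs(centers - TARGET_CENTERS) < tol))

def entropy(outcomes, bins=30):
    """Histogram estimate of the Shannon entropy of the outcome distribution."""
    counts, _ = np.histogram(outcomes, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

# Several distinct configurations can generate the same artifact; keep the
# matches and pick the maximum-entropy one.
candidates = [{"means": [0, 2, 4], "noise": s} for s in (0.05, 0.2, 0.4)]
matching = []
for config in candidates:
    outcomes = simulate_agents(config)
    if matches_artifact(outcomes):
        matching.append((config, entropy(outcomes)))

best_config, best_entropy = max(matching, key=lambda pair: pair[1])
print("selected config:", best_config, "entropy:", round(best_entropy, 3))
```

In a real application, the agent equations would encode the utility functions and feedback loops described above, and the matching test would compare full cluster structure rather than centers alone.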