Summary: As we become ever more enamored with DNNs, whose accuracy and utility have been matched only by their growing complexity, we will need to answer the question of whether we will ever really be able to explain what goes on inside them.
As we become ever more enamored with DNNs, whose accuracy and utility have been matched only by their growing complexity, we will need to answer the question of whether we will ever really be able to explain what goes on inside them.
“Transparency” is the umbrella term used for this explainability. It has been the rule rather than the exception in all sorts of ML modeling, particularly in finance, lending, and insurance. The public, driven by the plaintiff’s bar, caught on early that there was the potential, not necessarily the actuality, that an ML model might disadvantage some more than others.
So even though more accurate DNN models might better predict outcomes like creditworthiness or the best risk-adjusted interest rates, that economic efficiency was treated by regulators as potentially unjust, and the use of modeling techniques like DNNs, along with some specific types of data, was proscribed.
Those heavily regulated industries have learned to adapt. For example, modeling may still take place in DNNs, but the result is then ‘distilled’ by using the DNN’s output to train explainable techniques like decision trees. For those relatively straightforward classification models, based mostly on structured and some semi-structured data, this has come down to a sort of economic stalemate, a cost of doing business.
But increasingly the high-value applications of AI/ML fall in the realm of classification or regression based almost solely on unstructured data: image, video, text, speech, and analog or digital signals with no human-understandable correlates. Here our CNN, RNN, LSTM, BERT, and other DNN architectures are the only reliable tools. And wherever there is a risk of inequitable treatment or damage to a human’s life or well-being, the demand for transparency will emerge.
The field of deep learning explainability is becoming a research focus in its own right. Hundreds of academic papers have been written on how to determine whether actions taken on the basis of DNN decision support systems, or taken automatically, are legally and ethically defensible.
Recently, researchers Ning Xie, Gabrielle Ras, Marcel van Gerven, and Derek Doran attempted to provide an overall structure to this field and published on arXiv “Explainable Deep Learning: A Field Guide for the Uninitiated”. Much of the material here is drawn from their paper, along with my own observations.
Is Transparency of DNN Models Really Necessary?
Is there really risk of damage to humans from DNNs? If you’re using CNNs to spot cosmetic defects in sheet metal on an assembly line, or RNNs to caption photos or translate simple text and speech, probably very little.
Facial recognition and its sister technologies, body and activity recognition, however, are already in the crosshairs. If police use them to identify potential criminals or criminal acts, then errors can have consequences.
Reinforcement learning and computer vision or augmented control systems used in autonomous vehicles have obvious risk.
Military personnel taking action based on perceived risk identified by DNNs may well be held accountable for their actions, and the same applies to medical personnel who act on DNN-guided decision support systems which, like all models, will have some false positives and false negatives.
Any time you must ask the question, ‘who will we hold accountable?’ you have identified a risk area where transparency must ultimately be addressed.
What Type of Transparency Do We Want?
We would wish that all our applications of DNNs were decision support systems, meaning the system makes a recommendation but a human must decide whether to take the action. Models that could explain themselves in real time might be the Holy Grail, but mostly we must be satisfied with some sort of ongoing testing or evaluation in which humans continuously judge the recommendations to be correct.
This is particularly true in control systems where the reaction time is too short for human intervention. Autonomous vehicles and even some types of medical intervention function in this space.
Xie et al suggest that the traits of a satisfactory explanation would include:
Confidence: This may be a statistical measure, but it is more broadly understood to mean that a human decision maker could observe that, given the same set of inputs, the system reaches the same outputs they themselves would.
Trust: The model’s performance on test data should approximate its performance in practice. Put another way, the user does not actually need to validate the model if its long-term performance matches their own experience and expectations. Here is where we encounter issues of bias in the training data and/or performance drift over time, as the training data no longer matches physical reality.
Safety: Perhaps a subset of trust, but one that focuses on features of the model that cause it to bias its responses in favor of doing no harm. Medical models in particular are an example in this area.
Ethics: The feature of explainability most in doubt may be whether the model adheres to common moral and ethical guidelines. The complication is that cultural norms vary widely, and behaviors accepted as ethical in one community may not be so in another. An example here is the class of models used to guide judges in sentencing convicted felons based on their perceived risk and likelihood of reoffending.
What Level of Explainability Can We Actually Achieve?
There are several techniques currently in use that can explain the output of DNN models or can be used to test for potential bias and harm.
Distillation is broadly defined as a group of procedures used to create external validation that the output is reasonable. These are ‘white box’ ML procedures that are inherently explainable, like decision trees, applied to the inputs and outputs of the DNN model in order to mimic its behavior.
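As a rough illustration of the distillation idea (my own sketch, not taken from the field guide), the snippet below fits a shallow scikit-learn decision tree to a trained DNN’s predictions on tabular data. The names `dnn_model`, `X`, and `feature_names` are hypothetical placeholders, and the DNN is assumed to output class probabilities.

```python
# Minimal distillation sketch: fit an interpretable surrogate (a decision tree)
# to mimic a trained DNN's predictions on tabular data.
# Assumes `dnn_model` exposes predict() returning class probabilities and that
# X is a 2-D NumPy array of the same features the DNN was trained on.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def distill_to_tree(dnn_model, X, feature_names, max_depth=4):
    # Use the DNN's own predicted labels (not the ground truth) as targets,
    # so the tree approximates the DNN's decision function.
    dnn_labels = np.argmax(dnn_model.predict(X), axis=1)

    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    surrogate.fit(X, dnn_labels)

    # Fidelity: how often the tree agrees with the DNN on the same inputs.
    fidelity = surrogate.score(X, dnn_labels)
    print(f"Surrogate fidelity vs. DNN: {fidelity:.3f}")

    # Human-readable rules that approximate the DNN's behavior.
    print(export_text(surrogate, feature_names=feature_names))
    return surrogate
```

The surrogate is only as trustworthy as its fidelity score; a shallow tree that agrees with the DNN most of the time gives a readable approximation, not a guarantee.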
Saliency, or input-impact, modeling uses different techniques to evaluate the degree to which each input variable affects the output. Backprop methods evaluate feature relevance based on the magnitude of the gradient passed back through the network layers. Perturbation methods, similar to leave-one-out evaluation, measure the impact on accuracy of altering each variable and assign it a score.
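A minimal sketch of the perturbation approach (the names `model`, `X`, and `y` are placeholders): each feature’s score is simply the average drop in held-out accuracy when its column is shuffled, assuming `model.predict` returns class probabilities for a 2-D feature matrix.

```python
# Perturbation-style saliency sketch: score each input feature by how much
# accuracy drops when that feature's column is shuffled.
import numpy as np

def perturbation_importance(model, X, y, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    base_acc = np.mean(np.argmax(model.predict(X), axis=1) == y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perturbed = X.copy()
            # Shuffling one column breaks its relationship with the output
            # while leaving its marginal distribution intact.
            X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
            acc = np.mean(np.argmax(model.predict(X_perturbed), axis=1) == y)
            drops.append(base_acc - acc)
        scores[j] = np.mean(drops)  # larger drop => more influential feature
    return scores
```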
It’s also important to evaluate with CNNs whether the portions of the image that are driving the convolution are in fact the portions of the image you intended to classify. There are a variety of techniques in this area, starting with LIME (Local Interpretable Model-Agnostic Explanations), designed to ensure that it’s not some artifact of the background, unrelated to the target objects, that’s being ‘seen’.
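Here is a hedged sketch of using the `lime` package’s image explainer for that check; it assumes a Keras-style `model.predict` that accepts a batch of RGB arrays and returns class probabilities, and the function name is my own.

```python
# LIME sketch for checking what a CNN is actually "looking at" in an image.
# Assumes the `lime` package is installed.
import numpy as np
from lime import lime_image

def explain_image(model, image, num_samples=1000):
    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(
        image.astype("double"),   # one RGB image as a NumPy array
        model.predict,            # batch of images -> class probabilities
        top_labels=1,
        hide_color=0,
        num_samples=num_samples,  # number of perturbed copies LIME generates
    )
    # The mask marks the superpixels that most support the top predicted class.
    # If they sit on the background rather than the object, that is a red flag.
    img, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True,
        num_features=5, hide_rest=False
    )
    return img, mask
```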
Finally, in a long list of technical approaches, it’s also possible to test for vulnerability to adversarial attack. Early on we discovered that CNNs could easily be misled by the insertion of small amounts of random noise. Now more sophisticated attacks are possible, with generative adversarial networks (GANs) spoofing models with false but realistic data.
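As an illustration of the simplest noise-based probe (a fast-gradient-sign-style sketch of my own, not a method from the field guide), the snippet below assumes a TensorFlow/Keras classifier with a softmax output and a batch of inputs scaled to [0, 1] with integer labels.

```python
# Minimal FGSM-style probe for sensitivity to adversarial noise.
import tensorflow as tf

def fgsm_perturb(model, x, y, epsilon=0.01):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    # Step each input value a tiny amount in the direction that most increases the loss.
    gradient = tape.gradient(loss, x)
    x_adv = x + epsilon * tf.sign(gradient)
    return tf.clip_by_value(x_adv, 0.0, 1.0)  # keep inputs in the assumed [0, 1] range

# If predictions on x and on fgsm_perturb(model, x, y) disagree even for a tiny
# epsilon, the model is being misled by noise a human would never notice.
```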
The reality, however, is that with every hidden layer the mathematical vectors become averaged and summarized in a way that makes DNNs fundamentally unexplainable. If we want to continue to use DNNs, and we certainly do, all of these explainability techniques require significant extra human labor. This imposes a cost in terms of resources and time that is not always available. The ideal of a general-purpose explainability procedure that can be applied broadly across many DNNs is a goal that is not yet close to being achieved.
Other articles by Bill Vorhies
About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2.1 million times.