Inside the Black Box: Developing Explainable AI Models Using SHAP


AI is increasingly used to make important decisions that affect our lives, such as predicting the onset of a disease, the suitability of a job candidate, the likelihood that a convicted criminal will reoffend, or the next president of the United States. In such cases, having AI model outcomes that are explainable, or interpretable, is often just as important as having model outcomes that are accurate. In this article, we explore model interpretability and discuss why you should care.

We highlight a generalizable state-of-the-art approach, known as SHAP, and illustrate how it works behind the scenes to realize explainable AI models. Let's start by asking: what exactly is explainable AI and why is it important?

Explainable AI refers to the ability to interpret model outcomes in such a way that is easily understood by human beings.

When creating an AI model, the accuracy of the model is assessed on test (holdout) data, and test accuracy is taken as a proxy for how the model will perform in production. While model accuracy is important, it is often equally important to understand why a model has made its decisions. For example, consider the following scenarios, where having explainable AI would be helpful or necessary:

  • healthcare: AI is used to predict a cancer diagnosis, but a doctor is unlikely to make an intervention without a full understanding of the contributing factors that led to the decision.

  • aviation: A pilot is asked to perform an emergency maneuver based on an AI decision, but they need to understand the basis of that decision before taking action that may jeopardize the safety of the crew and passengers. For example, an explainable model shows that a decision to descend quickly is based on a perceived sudden drop in cabin pressure. However, the pilot concludes that no such cabin pressure drop actually occurred and that the perceived decrease is due to a faulty sensor. Hence, the pilot sensibly chooses not to execute the suggested maneuver.

  • genetics: An AI model is trained to predict which DNA mutations will cause a disease. If the model performs well, it means the model has discovered patterns that we humans would like to understand. In this example, without an interpretable model, scientists have little chance of explaining the underlying patterns that the model has found.

  • team management: Moneyball tactics are implemented by a baseball team where, in part, AI is used to forecast a pitcher's future performance based on current game statistics. Controversially, the AI model suggests the removal of a highly successful pitcher during the World Series, much to the disappointment of the pitcher, fans, and commentators. Given the opportunity to understand the basis of the prediction, the manager might have made a different decision, possibly leading to a different outcome.

  • real estate: A realtor's AI model is used to predict home values, but a particular homeowner is upset with the realtor because they feel the value is far too low and likely to result in them losing money in an upcoming sale. Another unhappy customer feels that their estimated value is far too high, potentially hurting their appeal to lower their property taxes. In either case, the realtor is on the hook to defend the model's decision, and understanding the features that contributed to that decision would be very helpful in communicating with the unhappy customers.

  • fraud detection: A customer's transaction is flagged as potentially fraudulent, but the confidence in the prediction is low, say at 51%. A monitoring agent will need to understand the reasons behind the suspected fraud in order to take appropriate action. For example, the explanation may reveal that the model has homed in on the fact that the purchase is highly inconsistent with prior transactions and that it was made far from the customer's home. The monitoring agent concludes that the customer is likely traveling and contacts them to confirm the transaction prior to freezing their account.

  • finance: A highly accurate AI model is used to assess the risk of lending money to banking customers and predicts that a particular loan applicant has a 65% chance of missing multiple payments, which will likely result in their application being denied. The lender will need to break the bad news and explain to the customer why their loan application is slated to be denied. Telling them, "Trust me, my AI model is super accurate!" is not helpful and unlikely to be well received. It would be better to tell them something along the lines of "It looks like your high debt-to-income ratio of 3.5 and your two recently missed payments in 2019Q1 have increased the probability that you will miss future payments." That is useful information and allows the customer the opportunity to explain circumstances that may potentially alter the final lending decision.

Being able to explain AI model outcomes also provides the opportunity to assess the fairness, equity, and legality of those decisions. 

For example, let's revisit the finance scenario above and say that the applicant's age turned out to have a strong negative impact on the model's predicted outcome. By law, banks cannot discriminate on the basis of age, so the lender must take that into consideration in making a final decision. Otherwise, the bank may find itself in legal hot water. Such insight would prompt the removal of age-related bias from the model and possibly a broader audit to identify other sources of societal bias. For more information, see our article: How to detect and mitigate harmful societal bias in your organization's AI. Societal bias in AI is a rising concern and many tech companies, such as Google, are developing frameworks that facilitate auditable AI. Having model outcomes that are easily explainable supplements the audit process by providing transparency into the training data and the behavior of the model.

Given the importance of AI transparency, why has it taken so long to come to fruition? After all, AI research was founded as an academic discipline in 1956, and here we are in 2020 still seeking explanations of AI model output. The primary reason has to do with the intricacy of the internal circuitry of AI models, that is,

the complexity of a typical AI model makes it difficult to interpret predicted outcomes.

This is because the input-output pathway for making a decision becomes more circuitous as model complexity increases. It is common for deep learning models to contain thousands of interconnected neurons with millions of parameters defining the strength of those connections. Ensemble models, where multiple individual models are trained and their outputs aggregated to form decisions, add yet another layer of complexity. For example, while individual decision trees are very easy to explain, they are notoriously prone to overfitting. Tree-based ensemble algorithms, such as random forest, offer more stability and generalizability, but at the cost of explainability, as typically hundreds of trees of varying depth are grown. Such model complexity is often described colloquially as a "black box", where: data goes in, magic happens, and decisions are made.

What we need are tools to help shine a light inside the proverbial AI "black box" with the ability to not just understand feature importance at the population level, but to actually quantify feature importance on a per-outcome basis. 

In recent years, progress has been made in the ability to explain complex AI model predictions, highlighted by the following approaches:

  • LIME: interprets individual model predictions based on local linear approximations to the model around a given prediction. LIME is model-agnostic in the sense that it can be applied to any machine learning model.

  • DeepLIFT: a recursive prediction explanation method for deep learning models, which assigns feature importance by comparing each neuron's activation to a reference activation and propagating the resulting differences back through the network.

  • SHAP: a unified, model-agnostic approach for quantifying the impact of features contributing to a model decision. This approach leads to SHAP values, which interpret the impact of having a certain value for a given feature in comparison to the prediction a model would make if that feature took on some baseline value.
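
To give a sense of what this looks like in code, here is a minimal sketch built around the shap Python package. The housing data, feature names, and random-forest model below are entirely synthetic and chosen for illustration; only the calls to shap.TreeExplainer, shap_values, and summary_plot reflect the package's documented interface.

```python
# A minimal sketch of computing per-outcome SHAP values with the shap package.
# The data below is synthetic and purely illustrative; any trained
# scikit-learn-style model and feature matrix could be swapped in.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "square_footage": rng.normal(1800, 400, n),
    "num_bedrooms": rng.integers(1, 6, n),
    "years_old": rng.integers(0, 80, n),
})
# Hypothetical sale price, loosely driven by size and age plus noise.
y = 100 * X["square_footage"] - 1500 * X["years_old"] + rng.normal(0, 20000, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles;
# shap.KernelExplainer is the slower, fully model-agnostic alternative.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)

# Explain a single prediction: baseline plus per-feature contributions.
print("baseline (expected model output):", explainer.expected_value)
print("SHAP values for the first home:  ", shap_values[0])
print("model prediction for that home:  ", model.predict(X.iloc[[0]])[0])

# Optional global view: which features matter most across all predictions.
shap.summary_plot(shap_values, X)
```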

It is outside the technical scope of this article to delve into each of these methodologies, but we briefly explore the SHAP approach, which has gained considerable attention in AI-related literature and media, to gain a sense of how such algorithms work.


SHAP values: one approach to illuminating the AI black box

The SHAP package was developed by Lundberg et al. SHAP stands for SHapley Additive exPlanations and represents a game theoretic approach to interpret the output of any machine learning model. SHAP builds on the work of Lloyd Shapley, an American mathematician and economist who won the 2012 Nobel Prize in economics for his research in cooperative game theory. Shapley was interested in solving the problem of fair credit allocation in a cooperative game, i.e., given a coalition of players who cooperate to achieve an overall gain, how do we fairly assess the contribution of each player and reward them commensurately under the assumption that each player's contribution is different? His work led to a methodology for calculating a player's fair share of the overall gain, a quantity commonly referred to as the Shapley value (a toy calculation is sketched just after the mapping below). The SHAP package extends the economics-inspired work of Shapley by recasting it into a machine learning context:

  • Player → a feature in our data, e.g., square_footage or credit_score

  • Coalition →  the set of features used for training a model

  • Game →  a single predicted outcome of a trained model

  • Shapley value → SHAP value, representing the average marginal contribution of a feature relative to a specified baseline.
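
To make Shapley's original idea concrete before returning to machine learning, the sketch below computes exact Shapley values for a hypothetical three-player game by averaging each player's marginal contribution over every possible order in which the players could join the coalition. The payoff numbers are invented purely for illustration.

```python
# Exact Shapley values for a toy three-player cooperative game.
# The characteristic function v maps each coalition to the payoff it can
# secure on its own; the payoff numbers here are invented for illustration.
from itertools import permutations

players = ["A", "B", "C"]
v = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

def shapley_values(players, v):
    """Average each player's marginal contribution over all arrival orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += v[with_p] - v[coalition]   # marginal contribution of p
            coalition = with_p
    return {p: totals[p] / len(orders) for p in players}

print(shapley_values(players, v))
# -> {'A': 20.0, 'B': 30.0, 'C': 40.0}, which sums to v(ABC) = 90 as required.
```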


Figure 1: Diagrammatic representation of SHAP values. Consider a model f trained on four features x = {x1, x2, x3, x4}. The SHAP values φ = {φ1, φ2, φ3, φ4} quantify the contribution of each feature to a predicted outcome, f(x). Pictorially, the SHAP values are represented by the arrows in the diagram, where the red arrows indicate a positive contribution and the blue arrows a negative contribution. The length of an arrow is proportional to the magnitude of the corresponding SHAP value. The SHAP values are designed to be additive and explain the marginal contribution of each feature relative to features that have already been assessed. Generally, the SHAP values are sensitive to the order in which the features are introduced, and the diagram shows only one such sequence, namely x1, x2, x3, x4. To calculate the final SHAP values for a prediction, an average is taken over all feature orderings. See the text for more details via a worked example.

To solidify the concept of SHAP values, let's consider a specific example related to the diagram shown in Fig. 1. Let's say we train a model f to predict the likelihood of a person being able to complete a marathon using the features: age, does_not_smoke, VO2_max, and likes_running. Our model predicts that Sarah (age=24, does_not_smoke=True, VO2_max=42, and likes_running=False) has a 25% chance of finishing a marathon, and she would like to know how we arrived at our conclusion. If we knew nothing about Sarah, we might assume that she is a typical person in our training data, and therefore a reasonable prediction would be the expected value of our model, e.g., the mean predicted value in our training data, which is, say, 10%: φ0 = +10%. The goal of the SHAP values is to map how each of Sarah's features individually contributes to the difference between her prediction of 25% and the baseline of 10%. In other words, how do Sarah's features explain her 15-percentage-point greater predicted chance of crossing the finish line compared to the average person? We might start by introducing the knowledge that Sarah is 24 years old and then reassess the expected value of the model conditioned on that knowledge, i.e., E[f(x) | age=24]. In doing so, we find Sarah's age to have a positive contribution relative to the mean, bumping up her prediction of marathon completion to 18%. The corresponding SHAP value for age=24 is φ1 = +8%. We then introduce the knowledge that Sarah has a VO2_max=42 and find that this additional knowledge increases her predicted value to 30%, equating to a SHAP value of φ2 = +12%. We continue by introducing the fact that Sarah doesn't smoke, which brings her prediction to 50%, corresponding to a SHAP value of φ3 = +20%. But then we introduce the fact that Sarah actually does not like to run, which greatly reduces her chances of finishing a marathon and brings the prediction back down to 25%, corresponding to a SHAP value of φ4 = -25%.
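
The arithmetic in Sarah's walkthrough illustrates the additivity property of SHAP values: the baseline plus the individual contributions reconstructs her predicted probability exactly. A quick check of the numbers above, for this particular feature ordering:

```python
# Additivity check for Sarah's example: the baseline plus the SHAP values
# should reconstruct her predicted probability of finishing the marathon.
baseline = 0.10                      # φ0: mean prediction in the training data
shap_values = {
    "age=24": +0.08,                 # φ1
    "VO2_max=42": +0.12,             # φ2
    "does_not_smoke=True": +0.20,    # φ3
    "likes_running=False": -0.25,    # φ4
}
prediction = baseline + sum(shap_values.values())
print(f"{prediction:.0%}")           # -> 25%, matching the model's output for Sarah
```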

Generally, in calculating SHAP values we first establish a baseline and then quantify the impact of different features in moving away from that baseline and arriving at a final decision. The order in which we assess the features is important, and a change in the order of the features examined will (likely) affect the magnitude and sign of the SHAP values. Therefore, an average over all such orderings is taken. However, given that the number of orderings grows factorially with the number of features, approximations are used to estimate the mean SHAP values for a given outcome. The details of how SHAP values are calculated in practice are outside the scope of this article, but for those who are interested we encourage you to read the original paper.
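
As a toy illustration of that averaging, the sketch below approximates SHAP values for one prediction of a simple, made-up model by sampling random feature orderings rather than enumerating them all. Features not yet introduced keep values drawn from background data, which is one common way to approximate the conditional expectations; this is a didactic sketch, not how the shap package is implemented internally.

```python
# Approximate SHAP values for a single prediction by sampling feature orderings.
# For each sampled ordering, features introduced so far take the instance's
# values and the rest keep values drawn from background data.
import numpy as np

rng = np.random.default_rng(0)

def sampled_shap(f, x, background, n_orderings=2000):
    n_features = x.shape[0]
    phi = np.zeros(n_features)
    for _ in range(n_orderings):
        order = rng.permutation(n_features)
        z = background[rng.integers(len(background))].copy()  # baseline instance
        prev = f(z)
        for j in order:
            z[j] = x[j]                # introduce feature j
            curr = f(z)
            phi[j] += curr - prev      # marginal contribution of feature j
            prev = curr
    return phi / n_orderings

# Hypothetical model and data, purely for illustration.
f = lambda z: 3.0 * z[0] - 2.0 * z[1] + 0.5 * z[0] * z[2]
background = rng.normal(size=(500, 3))
x = np.array([1.0, 2.0, 3.0])

phi = sampled_shap(f, x, background)
baseline = np.mean([f(b) for b in background])
print("baseline:", baseline)
print("approximate SHAP values:", phi)
# Additivity (up to sampling noise): baseline + phi.sum() is close to f(x).
print("baseline + sum(phi) =", baseline + phi.sum(), "vs f(x) =", f(x))
```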

Summary 

AI model architectures are often highly complex, and consequently their predicted outcomes are difficult to explain because the paths from input to output are complicated and highly dependent on the magnitudes of the inputs. However, recent advances have made it possible to understand complex AI models by quantifying the influence of features on a per-outcome basis. Some approaches are model-agnostic in the sense that they may be applied to any machine learning model; this is very important, as it is quite common for data scientists to consider several model variants prior to release. We now have at our disposal several techniques that help shine a light on the proverbial AI black box, allowing us to explain why a model has made a given decision. In many domains, including healthcare, aviation, and finance, auditable, and thus explainable, AI is a must. Explainable AI also allows us to assess the fairness of model decisions and may help reveal harmful societal bias baked into your models.