How to Detect and Mitigate Harmful Societal Bias in Your Organization’s AI

Encouraged by AI’s success in business and science over the last decade, companies are now using AI models to make life-altering decisions in other areas such as healthcare, hiring, lending, university admissions, and criminal justice.

The stakes for AI models making fair decisions have never been higher, and yet there are many high-profile examples where AI has been shown to produce unfair, inequitable, and exclusionary decisions.

Such cases illustrate that AI can have a profoundly negative impact on society and business, particularly when the decisions your models make are discriminatory. In this article, we explore how discriminatory societal bias can infiltrate AI models and what your company can do to detect, monitor, and mitigate the problem.

Societal bias in AI models

Artificial Intelligence is regularly used to make decisions more quickly, accurately, and cost-effectively than humans by discovering patterns in data that may be difficult to spot, particularly when data volumes are large. A shining example of this is IBM's Watson system, which was shown to do significantly better at detecting and diagnosing certain cancers than human doctors. Watson did so by ingesting more than 600,000 pieces of medical evidence, two million pages from medical journal articles, and 1.5 million patient records: a feat that a highly trained specialist is unlikely to achieve in a reasonable time frame. Such examples embolden smaller companies to use AI to tackle tough and transformative business problems, and we at Xyonix are here to help your company accelerate its use of AI.

However, our decades of experience have taught us some harsh lessons along the way. For example, AI should not be seen as a "black box" tool where you can feed in vast quantities of data and expect goodness to come out the other end. Even with lots of AI experience, researchers and companies have produced nightmarish scenarios of "AI gone wrong", including:

  • rampant racism in software used by US hospitals to allocate healthcare for 200 million patients each year

  • bias against women in Amazon's initial attempt at building a resume filtering tool

  • a misogynistic, racist, and Nazi-sympathizing Twitter bot developed by Microsoft

  • racism in the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm, used in US court systems to predict the likelihood that a defendant will reoffend

  • gender bias in standard natural language processing (NLP) models

  • gender and race bias in face detection

  • racial bias in face recognition algorithms, where Asian and African American people were up to 100 times more likely to be misidentified than white men

In these examples, there was undoubtedly no premeditated intent to develop an evil robot. Instead, highly scrutinized models that performed well overall were found to perform poorly for certain demographics. The damage to a company's reputation in these situations is difficult to quantify, but the impact is certainly negative. In many of these cases, and to the credit of the companies that produced them, the offending application was quickly pulled down and a public apology issued. The moral of the story is clear:

Negative societal bias can work its way into AI models and, if left unchecked, can cause considerable harm or insult by promoting offensive stereotypes, lessening equality, and potentially placing people's reputations, health, and safety at risk.

Successful models you have in place today that target a particular demographic may generalize poorly to future populations of interest. Worse than that, generalizing to other groups may reveal that the models that have historically served you well are inherently racist, sexist, ageist, or downright illegal. The fiascos cited above serve as a reminder to take seriously the implications of building a model that contains harmful societal bias. Many of those examples were built by experts in the field at high-powered tech companies that have been building AI systems for a long time. The point being: if it can happen to them, it most certainly can happen to you.

How can you prevent harmful AI?

It is important to recognize that a well-performing model will likely contain biases because, in a certain sense, these biases are at the heart of machine learning. The trouble starts when those biases are unwanted, such as biases that are offensive or discriminatory in nature. There are many reasons why societally harmful and offensive AI systems get built. Fortunately, there are clear steps you can take to prevent this from happening in your company:

  1. identify relevant sources of bias

  2. maintain provenance and record potential problems

  3. integrate fairness metrics and mitigation strategies into your ML pipeline

  4. consider alternatives when using pre-trained models

In what follows, we explore each of these in depth to give you a clear understanding of the importance of each step.

Identify relevant sources of bias

There are many sources of bias in machine learning. Here are some types of bias that you should pay particular attention to when building your models, as they have the potential to promote unwanted discriminatory bias:

  • sample bias - occurs when one population is over- or under-represented in the training data, so that the data does not accurately represent the distribution in the environment in which the model will operate, e.g., a facial recognition model trained mostly on white people that is therefore very inaccurate at identifying faces of color (see the sketch following this list).

  • prejudice bias - occurs when negative cultural or other stereotypes are (unknowingly) applied by human annotators, who label data for subsequent use in training classification models. It is preferable that annotators come from a diverse set of locations, ethnicities, languages, ages, and genders, so that one person's views or opinions do not dominate the resulting label set. This is particularly true when the labels are naturally subjective, e.g., how beautiful, trustworthy, intelligent, or nice someone is based on their appearance in a photo. Otherwise, implicit personal biases may lead to labels that are skewed in ways that yield systematic disadvantages to certain groups.

  • outcome proxy bias - occurs when proxy features are inappropriately used to train a model to achieve a desired outcome. For example, consider a model whose desired outcome is to be able to accurately predict the likelihood of a person committing a crime. Using localized arrest rates as a proxy is biased because arrest rates are greater in neighborhoods with a higher police presence and, from a legal perspective, being arrested does not imply guilt. 
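To make the sample bias check concrete, here is a minimal sketch of comparing group representation in a training set against the population a model is meant to serve. The column name, group labels, and reference proportions are hypothetical placeholders, not drawn from any particular dataset.

```python
import pandas as pd

# Hypothetical training data with a demographic column named "group".
train_df = pd.DataFrame({
    "group": ["A"] * 120 + ["B"] * 880,
    "label": [0, 1] * 500,
})

# Hypothetical proportions for the population the deployed model will serve.
reference = {"A": 0.35, "B": 0.65}

# Share of each group actually present in the training data.
observed = train_df["group"].value_counts(normalize=True)

# Flag groups whose representation differs from the reference by more than 10 points.
for group, expected in reference.items():
    actual = observed.get(group, 0.0)
    if abs(actual - expected) > 0.10:
        print(f"Possible sample bias: group {group} is {actual:.0%} of the training data "
              f"but roughly {expected:.0%} of the target population.")
```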

Maintain provenance and record potential problems

With every data set you collect to help train your model, consider storing provenance information that describes the source of the data. In addition, record characterizations of bias in the data, including known and understood biases as well as potential areas of bias that might require future investigation.

For example, consider the case where someone is building a model to predict the future time, location, and severity of crimes based on past criminal behavior. In a first pass, a data scientist might use crime type, time, and location information. Later, the data scientist might wish to incorporate another promising data source; for example, they might find that including demographic information like the criminal’s race, age, and gender increases the algorithm’s efficacy. Now, if the organization is committed to recording key provenance information, it might find that, in the first case, the crime type, time, and location information was only provided for high-crime areas in large industrial cities. A diligent data scientist should flag potential concerns about applying any model based on this limited data outside of similar regions, as racial bias might be introduced. In the latter case, with demographic information explicitly introduced, the risk becomes markedly elevated.

The simple act of recording societal bias information may seem insufficient but, at a minimum, it creates a record trail that reminds the team to think through the issues prior to subsequent deployments.
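There is no single required format for such a record; even a lightweight, structured note kept alongside the data is enough to trigger the right conversations before the next deployment. The sketch below shows one hypothetical way to do this in code, using the crime-prediction scenario above; the field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetProvenance:
    """Lightweight provenance record stored alongside a training data set."""
    name: str
    source: str                       # where the data came from
    collection_period: str            # when it was collected
    known_biases: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)  # potential biases to investigate later

# Example record for the crime-prediction scenario described above.
crime_data_record = DatasetProvenance(
    name="crime_incidents_v1",
    source="municipal police reports, large industrial cities only",
    collection_period="2015-2019",
    known_biases=[
        "coverage limited to high-crime areas in large industrial cities",
    ],
    open_questions=[
        "arrest-derived features may proxy for police presence rather than actual crime",
        "review before applying the model outside similar regions",
    ],
)
```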

Integrate fairness metrics & mitigation strategies into your ML pipeline

Researchers in academia and industry are making steady progress in developing algorithms to help mitigate unwanted bias in training data and models. Fairness metrics are used to quantify bias and to measure the efficacy of these mitigation strategies. To get a sense of these metrics, consider a simple example: a bank creates a model to predict whether a person will be approved for a loan, and the model is trained on historical lending data. If a loan is approved, we call that a favorable outcome and, if the loan is denied, an unfavorable outcome.

One measure of fairness is to quantify the difference in the model's favorable outcome rate between groups of interest, say groups A and B. Let's consider group A to be an unprivileged group and group B a privileged group, under the assumption that a person in group B has an unfair advantage over a person in group A in getting a loan. The statistical parity difference (SPD) is a fairness metric defined as the difference between the favorable outcome rate of the unprivileged group and that of the privileged group. If the SPD is zero, there is parity between the two groups, indicating that the model is fair (at least between groups A and B). If the SPD is negative, it implies a bias against the unprivileged group. Figure 1 illustrates a sample calculation of SPD using our hypothetical banking scenario.

Figure 1: Illustration of the statistical parity difference (SPD) calculation. Using historic training data, a banking model predicts whether a person's loan will be approved (a favorable outcome) or denied (an unfavorable outcome). Statistically, the model delivers a favorable outcome for group B six out of ten times (on average) and a favorable outcome for group A five out of ten times (on average). In this case, the SPD = -10% (50% - 60%), suggesting that the model is unfairly biased against group A.
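The arithmetic from Figure 1 is simple enough to express directly in code; the minimal sketch below just restates the figure's hypothetical approval rates.

```python
def statistical_parity_difference(unpriv_favorable_rate, priv_favorable_rate):
    """SPD = favorable outcome rate of the unprivileged group minus that of the privileged group."""
    return unpriv_favorable_rate - priv_favorable_rate

# Numbers from Figure 1: group A (unprivileged) is approved 5 out of 10 times,
# group B (privileged) 6 out of 10 times.
rate_a = 5 / 10
rate_b = 6 / 10

spd = statistical_parity_difference(rate_a, rate_b)
print(f"SPD = {spd:+.0%}")  # SPD = -10%, suggesting bias against group A
```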

While fairness metrics like SPD can help you detect unwanted bias in your model, the next step is to mitigate societal bias. A good place to start is IBM's AI Fairness 360 open source toolkit, available in Python and R, which contains 70+ fairness metrics and 10 state-of-the-art bias mitigation algorithms developed by experts in academia and industry around the world.

Each algorithm in the suite is designed to mitigate unwanted bias in one of three model training stages: (1) the pre-processing stage, where training data and labels may be transformed, modified, or weighted to promote fairness during subsequent training, (2) the in-processing stage, where the model itself is adapted to encourage fairness, e.g., by adding a discrimination-aware regularization term to the loss function or by introducing a fairness metric as an input to the classifier, and (3) the post-processing stage, where predictions are altered to make them more fair.
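As a rough sketch of what this looks like in practice, the snippet below measures SPD on a small hypothetical lending data set and applies a pre-processing mitigation (reweighing) with AI Fairness 360. The data, column names, and group encodings are invented for illustration, and exact API details may differ across toolkit versions.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Hypothetical lending data: "group" encodes the protected attribute
# (0 = unprivileged group A, 1 = privileged group B); "approved" is the label.
df = pd.DataFrame({
    "group":    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "income":   [40, 55, 30, 45, 60, 80, 52, 75, 66, 58],
    "approved": [0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["approved"],
    protected_attribute_names=["group"],
    favorable_label=1,
    unfavorable_label=0,
)

unprivileged = [{"group": 0}]
privileged = [{"group": 1}]

# Measure statistical parity difference on the raw training data.
metric = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("SPD before mitigation:", metric.statistical_parity_difference())

# Pre-processing mitigation: reweight examples so that favorable outcomes
# are balanced across groups before a model is trained on the data.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
reweighted = rw.fit_transform(dataset)

metric_rw = BinaryLabelDatasetMetric(
    reweighted, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("SPD after reweighing:", metric_rw.statistical_parity_difference())
```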

We recommend using fairness metrics that are relevant to your application to detect and monitor bias, and integrating bias mitigation algorithms into your ML pipeline so that you can continuously monitor and detect drifts in societal bias over time. At a minimum, we recommend tracking the representation of demographic groups in your training data. Even if you cannot go as far as defining and configuring a fairness metric, knowing and tracking the quantities of various demographic groups in your training data over time will force you to confront potential problems before they surface in your production environment.

Consider alternatives when using pre-trained models

One way of quickly getting your application into production is to download and use a publicly available pre-trained model and apply it to your data. However, be aware that the data used to train those models may contain offensive biases.

For example, in the natural language processing world, there are a few popular pre-trained word vector models that you can download for free (e.g., word2vec or GloVe) that are trained on large corpora scraped from websites known to contain human stereotypes and prejudices, e.g., Common Crawl and Twitter. As an alternative, you could use ConceptNet Numberbatch to train your model, which was built (in part) to help mitigate unwanted societal bias.
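If you do swap in an alternative embedding set, it is worth probing it for the associations you care about before building on it. The sketch below assumes a locally downloaded Numberbatch file in word2vec text format and uses gensim to load it; the file name and the crude gender-association probe are illustrative only, and the exact vocabulary formatting may vary by release.

```python
from gensim.models import KeyedVectors

# Hypothetical local path to an English-only Numberbatch release in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("numberbatch-en.txt.gz", binary=False)

# A crude probe of gender association: how much more strongly a word
# associates with "man" than with "woman" in the embedding space.
def gender_association(kv, word):
    return kv.similarity(word, "man") - kv.similarity(word, "woman")

for occupation in ["doctor", "nurse", "engineer", "teacher"]:
    print(occupation, round(gender_association(vectors, occupation), 3))
```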

The corpora that you use for training should ultimately depend on the purpose of your model. For example, if you are building an AI model to automatically flag inappropriate or bullying conversation in a chatroom then it is desirable to train your model with corpora containing offensive and discriminatory language. If instead you want to train a chatbot to converse civilly with human beings, you likely want to steer clear of training with corpora containing inherent negative societal bias so that you don't unintentionally offend your users.

AI for good

We at XYONIX work by the credo "AI for Good". Part of that effort is developing models that are fair and equitable, that do not discriminate, and that generally make the world a better place. There are other organizations that share our beliefs, and we feel it is important to point out their contributions. Notable examples include:

  • blind charging: prosecutors in the criminal justice system make charging decisions after examining free-text narratives describing alleged criminal events, and those decisions can be influenced by race. Machine learning is used to mitigate bias in charging decisions via automated race redaction in eyewitness text statements.

  • ethics and human rights consortium: SHERPA is an EU-funded project that analyzes how AI and big data analytics impact ethics and human rights. As part of a public education effort, SHERPA developed a website to illustrate how face recognition is being used today to infer various characteristics about you as a person, e.g., how "good looking" you are as well as estimates of your age, body mass index, life expectancy, and gender. Far from mere entertainment, such technology is being used today by dating sites and insurance agencies, and the underlying models often have strong societal bias.

  • addressing racial profiling: racial profiling has been shown to be prevalent in police traffic stops, including a heavy bias against Black people. Widespread publication of statistics on unnecessary traffic stops has been shown to reduce the stop percentage by 75% without a corresponding increase in crime.

  • promoting equity in automated speech recognition: five state-of-the-art commercial automated speech recognition (ASR) systems were shown to misunderstand Black speakers twice as often as white speakers, highlighting the need for more diverse training data.

  • debunking voter fraud claims: work done to debunk claims of double voting, a form of voter fraud. The conclusion is that "there continues to be simply no proof that U.S. elections are rigged."

These positive efforts illustrate how AI can be both highly beneficial and fair in making very important decisions that affect our daily lives. 

With a growing awareness of the issues we face in using AI, we are also seeing more legal standards being set at the city and state levels to protect U.S. citizens from unethical uses of AI. For example, Illinois was the first state in the nation to define a statute that imposes transparency, consent, and data destruction duties on employers using AI to screen applicants for Illinois-based positions. Maryland now prohibits the use of facial recognition during job interviews without the applicant's consent, San Francisco has banned the use of facial recognition by police, and the New York City Council has proposed legislation to regulate the use of AI in making hiring decisions. Legislation is also being introduced at the federal level, notably surrounding the issuance of an Executive Order regarding the regulation of AI along with responses from the Department of Defense and the National Institute of Standards and Technology.

Summary

In this article, we outlined clear steps that you can take to mitigate societal bias in AI models. Machine learning, by its very nature, is statistically discriminative. However, this discrimination becomes undesirable when a particular group is put at a systemic and distinct disadvantage. The damage done by a racist, sexist, ageist, or otherwise offensive AI may be substantial, particularly in scenarios where AI is being used to make life-altering decisions in areas like healthcare, hiring, lending, university admissions, and criminal justice. As a result, equity, fairness, and inclusivity in AI have never been more important. Yet there are many recent high-profile examples of “AI gone wrong”, where highly offensive and damaging decisions were made, in some cases adversely affecting millions of people. These examples serve to remind us of the importance of detecting, monitoring, and mitigating societal bias in AI models. The future may even hold legal action for companies that use AI known to discriminate against protected classes. Such legislation is important to protect minority classes and to ensure that AI promotes fairness and equity in society.