Algorithmic bias

In computing, algorithmic bias occurs when algorithms repeatedly produce systematic errors, inaccurate results, or imbalanced outcomes as a result of prejudicial or unfair assumptions introduced at the programming level. It is a persistent concern in machine learning (ML) and artificial intelligence (AI), both of which depend on large bodies of objective, reliable data. If the data used to create the algorithms that power ML and AI systems is flawed, discriminatory, or incomplete, the system will generate results that reflect those shortcomings.

Algorithmic bias is rooted in the perspectives of the human computer scientists who design, test, and refine ML and AI technologies. It usually results from designers inadvertently or deliberately transferring their own cognitive or subconscious biases onto algorithms, or from basing algorithms on flawed or deficient datasets. Algorithms can reproduce many forms of bias, including stereotyping, selective perception, and confirmation bias.


Background

An algorithm is a purpose-built set of instructions that tells a computer system how to solve a problem or accomplish a particular task. Algorithms guide computerized decision-making processes, leading the system from a task or problem through distinct steps to a solution or outcome. For example, computer applications that give users driving directions to a specified destination use algorithms to compare possible routes, consider traffic conditions, and compute the fastest or most direct way to go.
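The sketch below illustrates the idea with a minimal Python implementation of a shortest-path search in the style of Dijkstra's algorithm. The road network, place names, and travel times are invented for illustration and are not drawn from any particular navigation application.

```python
import heapq

def fastest_route(graph, start, goal):
    """Return (total minutes, path) for the quickest route through a weighted road graph."""
    queue = [(0, start, [start])]   # (travel time so far, current place, path taken)
    visited = set()
    while queue:
        minutes, place, path = heapq.heappop(queue)
        if place == goal:
            return minutes, path
        if place in visited:
            continue
        visited.add(place)
        for neighbor, travel_time in graph.get(place, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + travel_time, neighbor, path + [neighbor]))
    return None  # no route exists

# Hypothetical road network; edge weights are current travel times in minutes.
roads = {
    "Home":    {"MainSt": 5, "Highway": 2},
    "MainSt":  {"Office": 10},
    "Highway": {"Office": 12},
}
print(fastest_route(roads, "Home", "Office"))  # (14, ['Home', 'Highway', 'Office'])
```

Given the same map and inputs, the procedure follows the same distinct steps and arrives at the same answer every time, which is what makes it an algorithm rather than an ad hoc decision.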

Machine learning makes extensive use of computer algorithms, which are designed to evolve and improve automatically, without additional programming, as the system encounters and works through new situations and information. Machine learning is a subset of artificial intelligence, a term that refers to a broad set of computer functions that simulate human behavior and decision-making. AI enables computer systems to solve complex problems in much the same way a person would when faced with the same task.
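As a rough illustration of how such a system can improve without any change to its program, the sketch below applies a simple perceptron-style update each time a new labeled example arrives. The features, labels, and learning rate are hypothetical; real ML systems use far more sophisticated training procedures.

```python
def train_step(weights, bias, features, label, lr=0.1):
    """One online-learning update: nudge the parameters toward the correct answer."""
    prediction = 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0
    error = label - prediction                        # -1, 0, or +1
    new_weights = [w + lr * error * x for w, x in zip(weights, features)]
    return new_weights, bias + lr * error

weights, bias = [0.0, 0.0], 0.0
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]  # invented labeled examples
for features, label in stream:                        # parameters adjust with each example
    weights, bias = train_step(weights, bias, features, label)
print(weights, bias)                                  # learned parameters after three updates
```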

ML and AI projects rely on inputs known as training data, which define the models computer systems use to predict outcomes. As its name implies, training data “trains” the system to achieve the results intended by its designers. It is organized and annotated with labels or tags that tell the system how to classify each input, distinguish it from other classes of input, and store the information for later retrieval or reference. If the training data is inaccurate or incomplete, or if it reflects the biased thinking of the humans who collected and organized it, algorithmic bias will result, and the ML or AI system will be unable to produce correct or impartial outcomes.
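The sketch below shows, in simplified form, what annotated training data might look like and how a trivial nearest-match classifier leans entirely on the human-supplied labels. The features and labels are invented; any error or bias in the annotations would flow directly into the predictions.

```python
# Each training item pairs a set of features with a human-supplied label (annotation).
training_data = [
    ({"has_fur": 1, "barks": 1}, "dog"),
    ({"has_fur": 1, "barks": 0}, "cat"),
    ({"has_fur": 0, "barks": 0}, "fish"),
]

def classify(example):
    """Return the label of the training item whose features best match the example."""
    def similarity(features):
        return sum(example.get(k) == v for k, v in features.items())
    return max(training_data, key=lambda item: similarity(item[0]))[1]

print(classify({"has_fur": 1, "barks": 1}))  # "dog" -- only as reliable as the annotations
```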

Algorithmic bias is one of several distinct types of bias seen in ML and AI systems. Others include sample bias, prejudice bias, measurement bias, and exclusion bias, all of which reference specific flaws in training datasets. Algorithmic bias is specific to the algorithms that actually perform calculations and make decisions based on such dataset flaws.

Overview

The training datasets that define the ways algorithms classify information and make decisions are said to be affected by sample bias when they are based on information that is too narrow in scope or not representative of all possible cases. For example, a computer system will conclude that all soldiers are male if it is trained using datasets that only include examples of male soldiers.
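A minimal sketch of this effect, using an invented dataset, appears below: a majority-label predictor trained only on examples of male soldiers can never answer anything other than "male."

```python
from collections import Counter

# Sample bias: the dataset covers only one kind of case.
biased_training_set = [("soldier", "male")] * 50      # no female soldiers are represented

def predict_gender(occupation, dataset):
    """Predict the most common label seen for this occupation in the training data."""
    labels = [label for occ, label in dataset if occ == occupation]
    return Counter(labels).most_common(1)[0][0] if labels else "unknown"

print(predict_gender("soldier", biased_training_set))  # always "male"
```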

Prejudice bias imports real-world bias and discrimination into ML and AI systems through datasets that include stereotypes, false assumptions, preconceptions, and other subjective distortions. As a result, the ML or AI system adopts those same stereotypes and assumptions, or draws on inaccurate preconceptions and subjective perceptions, and arrives at flawed conclusions. Computer science educators often use outdated stereotypes about the medical profession as an example. Datasets that present doctors as exclusively male and nurses as exclusively female will train the computer to conclude that all doctors are male and all nurses are female.
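The sketch below makes the same point with invented records: when every doctor in the dataset is labeled male and every nurse female, the conditional probabilities the system estimates are absolute, and the stereotype is reproduced exactly.

```python
# Prejudice bias: a stereotyped dataset (records invented for illustration).
records = [("doctor", "male")] * 40 + [("nurse", "female")] * 40

def p_gender_given_role(role, gender, data):
    """Estimate P(gender | role) directly from the dataset."""
    genders_for_role = [g for r, g in data if r == role]
    return genders_for_role.count(gender) / len(genders_for_role)

print(p_gender_given_role("doctor", "male", records))   # 1.0 -- "all doctors are male"
print(p_gender_given_role("nurse", "female", records))  # 1.0 -- "all nurses are female"
```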

Measurement bias occurs when the methodologies used to collect the information in a dataset fail to assess or measure all inputs correctly. Any algorithm that draws on a training dataset tainted by measurement bias will consistently and systematically understate or overstate results. Measurement bias can also occur when leading or loaded questions are used to prompt responses from human subjects and those responses then form the basis of the dataset. Similarly, datasets that rely on human observation can introduce measurement bias if the observer is partial or fails to correct for known inaccuracies.
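The following sketch, using invented numbers, shows how such an error propagates: if every reading used to build the dataset is two units too high, even a simple model fit to those readings overstates its estimate by the same amount.

```python
# Measurement bias: the instrument used to collect the data reads 2.0 units high.
true_values = [10.0, 12.0, 14.0, 16.0]
measured    = [v + 2.0 for v in true_values]           # every recorded value is inflated

# The "model" here is just a mean estimator fit to the measured data.
model_estimate = sum(measured) / len(measured)
true_mean      = sum(true_values) / len(true_values)
print(model_estimate - true_mean)                      # 2.0 -- the bias appears in every output
```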

Exclusion bias describes situations in which important omissions leave a dataset incomplete. For example, a programmer attempting to train an ML or AI system to recognize individuals of different races commits exclusion bias if they exclude all people of a specific race from the dataset. The system will erroneously conclude that the incomplete set of races featured in the dataset encompasses all possibilities when, in actuality, it does not.
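The brief sketch below, with invented labels, shows why: a model's possible outputs are limited to the classes present in its training data, so an excluded group can never appear in its predictions.

```python
# Exclusion bias: "group_c" was omitted from the training data entirely (hypothetical labels).
training_examples = [
    ({"feature": 0.2}, "group_a"),
    ({"feature": 0.8}, "group_b"),
]

possible_outputs = {label for _, label in training_examples}
print("group_c" in possible_outputs)  # False -- the omitted group can never be predicted
```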

Algorithmic bias is becoming an increasingly salient topic within computing as human society continues to integrate ML and AI technologies into everyday life. It particularly impacts the inferences and conclusions computer systems make about individuals or groups of people, especially with regard to their identifying characteristics, demographic qualities, personal or consumer preferences, and predictions about their future choices or behaviors. The AI boom of the early twenty-first century, including the advent of ChatGPT in 2022, led many to further examine the impact of algorithmic bias.

The Brookings Institution identifies the criminal justice system as an example of how algorithmic bias can perpetuate real-world prejudice and discrimination. In the United States, algorithmic tools are now used in judicial risk assessments that evaluate criminal defendants for bail eligibility or sentencing. These tools draw on training datasets built solely from raw statistical data, which generally replicate assumptions about the racial or demographic groups considered most likely to jump bail or reoffend, without weighing qualitative or contextual factors. As a result, they tend to produce systematic errors with an aggregated impact on certain population groups, reinforcing a tendency to impose higher bail amounts and lengthier sentences on people of color and perpetuating racism in the criminal justice system.

Bibliography

Baer, Tobias. Understand, Manage, and Prevent Algorithmic Bias: A Guide for Business Users and Data Scientists. Apress, 2019.

Cogito Tech LLC. “Understanding the Importance of Training Data in Machine Learning.” Medium, 26 Aug. 2019, cogitotech.medium.com/understanding-the-importance-of-training-data-in-machine-learning-da4235332904. Accessed 8 June 2021.

Jonker, Alexandra, and Julie Rogers. "What Is Algorithmic Bias?" IBM, 20 Sept. 2024, www.ibm.com/think/topics/algorithmic-bias. Accessed 22 Nov. 2024.

Lee, Nicol Turner, Paul Resnick, and Genie Barton. “Algorithmic Bias Detection and Mitigation: Best Practices and Policies to Reduce Consumer Harms.” Brookings Institution, 22 May 2019, www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/. Accessed 8 June 2021.

Photopoulos, Julianna. “Fighting Algorithmic Bias in Artificial Intelligence.” Physics World, 4 May 2021, physicsworld.com/a/fighting-algorithmic-bias-in-artificial-intelligence/. Accessed 8 June 2021.

Pratt, Mary K. “Machine Learning Bias (AI Bias).” TechTarget, July 2020, searchenterpriseai.techtarget.com/definition/machine-learning-bias-algorithm-bias-or-AI-bias. Accessed 8 June 2021.

“Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System.” Partnership on AI, 2020, www.partnershiponai.org/report-on-machine-learning-in-risk-assessment-tools-in-the-u-s-criminal-justice-system/. Accessed 8 June 2021.

Sarker, Iqbal H. “Machine Learning: Algorithms, Real-World Applications and Research Directions.” SN Computer Science, vol. 2, no. 160, 2021, doi.org/10.1007/s42979-021-00592-x. Accessed 8 June 2021.

“Understanding Algorithmic Bias and How to Build Trust in AI.” PricewaterhouseCoopers, 2021, www.pwc.com/us/en/services/consulting/library/artificial-intelligence-predictions-2021/algorithmic-bias-and-trust-in-ai.html. Accessed 8 June 2021.