Ethical AI: Bias and Fairness — Definitions, Sources, and Challenges
This is part 2 of a blog series on Ethical AI. For context on this series and why we’re writing it, see our introduction in part 1, Ethical AI: Privacy and Security.
The content was adapted from an internal learning and development session developed by the Machine Learning Engineers at HealthEdge®, focused on educating our organization on ethical use of artificial intelligence (AI). At HealthEdge, our belief in safe and responsible AI use shapes how we use these tools internally and how we build AI-powered solutions for our customers.
Addressing Bias and Fairness in AI
Bias and fairness are foundational components of ethical AI, and among the most difficult to address in practice. Most people using an AI agent wouldn’t know that factors like someone’s name, insurance provider, or demographic information can drastically alter a response—even with an otherwise identical prompt.
Bias in AI systems is rarely visible in a single output, and without deliberate measurement it often goes undetected.
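One way to make this kind of difference visible is a counterfactual prompt test: hold the prompt constant, vary only the attribute of interest, and compare the responses side by side. The sketch below is illustrative only. The template, the placeholder names and plans, and the `stub_model` function are all assumptions, and `stub_model` stands in for whatever model call a team actually uses.

```python
from itertools import product
from typing import Callable

def counterfactual_prompts(template: str, variations: dict[str, list[str]]):
    """Fill the template with every combination of the varied attributes."""
    keys = list(variations)
    prompts = []
    for values in product(*(variations[k] for k in keys)):
        attrs = dict(zip(keys, values))
        prompts.append((attrs, template.format(**attrs)))
    return prompts

def collect_responses(model: Callable[[str], str], prompts):
    """Run each prompt through the model, keeping the attributes alongside the reply."""
    return [(attrs, model(prompt)) for attrs, prompt in prompts]

# Hypothetical template and attribute values, chosen only for illustration.
TEMPLATE = "Member {name}, insured by {insurer}, asks whether physical therapy is covered."
VARIATIONS = {
    "name": ["Emily Walsh", "Lakisha Washington"],
    "insurer": ["Plan A", "Plan B"],
}

def stub_model(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same text.
    return "Physical therapy is covered with prior authorization."

prompts = counterfactual_prompts(TEMPLATE, VARIATIONS)
responses = collect_responses(stub_model, prompts)

# More than one distinct reply means the varied attributes changed the output.
distinct = {text for _, text in responses}
print(len(prompts), "prompts,", len(distinct), "distinct response(s)")
```

With a real model, replies will rarely match word for word, so a comparison on length, tone, or the recommended action is usually more informative than exact string equality.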
Understanding AI Bias
In AI, bias refers to systems producing outputs that unfairly favor or disadvantage certain groups.
Bias is not a hypothetical scenario. Production AI systems have demonstrated measurable bias across a range of industries and use cases, often without intention on the part of the developers. In each case, these systems appeared to function as expected until the outputs were measured across demographic groups.
Dimensions of Demographic Risk
Bias does not affect all people equally, and understanding its scope requires looking at the range of characteristics that can influence the output of an AI tool. These characteristics can include:
- Legally protected categories (e.g. age, disability, religion, nationality)
- Health-related attributes (e.g. mental health diagnoses or chronic conditions)
- Socioeconomic factors (e.g. literacy, immigration status, living conditions)
Many people belong to more than one of these categories, and bias can compound across them. For AI tools, this means a single output can carry the weight of multiple intersecting inequities.
4 Ways to Recognize Bias in AI Outputs
Knowing that bias exists is not enough—addressing it requires understanding the specific forms it takes in real system outputs.
Bias rarely takes the form of a clearly wrong answer. In practice, it tends to surface in more subtle ways. These are four common manifestations of bias in AI outputs:
1. Quality differences
Appears as inconsistent accuracy or reliability across demographic groups. For example:
- Higher hallucination rates or lower accuracy for certain groups of users
- Greater uncertainty in answers about member groups that were underrepresented in the training data
2. Tone and framing
Appears as different verbiage, tone, or framing across groups. For example:
- Characterizing identical behavior as “assertive” for one group and “aggressive” for another
- Cold language for one group and warmer, familiar language for another
- Less detailed output for or about certain demographic groups
3. Stereotype-driven gap-filling
Appears as AI filling information gaps with learned assumptions. For example:
- Different assumptions about a member’s likely needs based on demographics
- Assumptions about a provider’s specialty based on demographics
4. Outcome differences
Appears as different recommendations, actions, or end outcomes for different groups. For example, in agentic systems, different thresholds for autonomous action versus requesting human confirmation.
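Quality and outcome differences in particular lend themselves to simple measurement once outputs are logged with a group label. The following is a minimal sketch under assumed names: the `records` structure, the `correct` and `escalated` flags, and the group labels "A" and "B" are hypothetical, and the data is invented purely to show the calculation.

```python
from collections import defaultdict

def rate_by_group(records, group_key, flag_key):
    """Return {group: share of that group's records where flag_key is truthy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        totals[record[group_key]] += 1
        if record[flag_key]:
            hits[record[group_key]] += 1
    return {group: hits[group] / totals[group] for group in totals}

def max_gap(rates):
    """Largest difference in rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Invented example log: one row per AI output, labeled after review.
records = [
    {"group": "A", "correct": True,  "escalated": False},
    {"group": "A", "correct": True,  "escalated": False},
    {"group": "A", "correct": True,  "escalated": True},
    {"group": "A", "correct": False, "escalated": False},
    {"group": "B", "correct": True,  "escalated": True},
    {"group": "B", "correct": False, "escalated": True},
    {"group": "B", "correct": False, "escalated": True},
    {"group": "B", "correct": True,  "escalated": False},
]

accuracy = rate_by_group(records, "group", "correct")      # quality difference
escalation = rate_by_group(records, "group", "escalated")  # outcome difference

print("accuracy:", accuracy, "gap:", max_gap(accuracy))
print("escalation:", escalation, "gap:", max_gap(escalation))
```

A gap on its own does not prove bias, since a difference can be appropriate. It does indicate where a closer review is warranted.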
Distinguishing Bias from Appropriate Differential Treatment
The relationship between differential treatment and bias is not always straightforward, and conflating the two can lead to over-correction or under-detection.
Not every instance of differential treatment represents bias. For example, women experiencing heart attacks frequently present atypically, and an AI that adjusts its diagnostic approach accordingly is addressing a historical gap in clinical practice, not perpetuating existing disparities. A conversational AI that simplifies its language for a user who identifies English as their second language is tailoring its response to serve that user better.
The central question is whether differential treatment reinforces historical patterns of disadvantage.
Sources of Bias
Bias is not introduced at a single point in an AI system’s development. Rather, it can accumulate across every stage, compounding the impact. Sources can include:
- Training data reflects historical human behavior, including historical discrimination.
- Source data is shaped by those who produced it. For example, an AI system handling patient records relies on providers’ recorded interpretation of those patients. If provider notes document pain differently by patient race, or apply more skepticism to patients with mental health conditions, those biases get passed to the AI system.
- Developer prompts can unintentionally embed assumptions. For example, a system prompt to “synthesize medical information” may lead to lower emphasis on mental health conditions if the model has absorbed the historical conflation of “medical” with physical health.
- User phrasing can introduce framing effects. For example, “why is this patient non-compliant?” could produce a materially different result than “what barriers might be affecting this patient’s care?”
- Evaluation data may overrepresent certain populations, causing the model to be optimized primarily for those groups.
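The last source in the list above, skewed evaluation data, can be checked with a short comparison of group shares. This is a rough sketch rather than a prescribed method: the `representation_ratio` helper, the group labels, and the reference shares are all assumptions made for illustration.

```python
from collections import Counter

def representation_ratio(eval_groups, reference_shares):
    """Ratio of each group's share in the evaluation set to its reference share.

    A value near 1.0 means the group is represented in proportion;
    above 1.0 is overrepresented, below 1.0 is underrepresented.
    """
    counts = Counter(eval_groups)
    total = len(eval_groups)
    return {
        group: (counts.get(group, 0) / total) / share
        for group, share in reference_shares.items()
    }

# Invented example: 10 evaluation cases, reference population split evenly.
eval_groups = ["A"] * 8 + ["B"] * 2
reference_shares = {"A": 0.5, "B": 0.5}

ratios = representation_ratio(eval_groups, reference_shares)
print(ratios)
```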
The Limits of Demographic Omission
A common instinct when trying to reduce bias is to remove demographic fields from model inputs. However, this approach misunderstands how AI systems operate in practice. AI systems infer group membership indirectly through proxies. For example, names can signal gender and ethnicity, zip codes correlate with race and income, and historical cost of care can encode prior access disparities. These proxies can influence AI system behavior in the same way raw demographic fields can.
The claim that a system is unbiased because it does not explicitly use race or gender overlooks how demographic information enters the model through these correlated features.
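A quick way to test whether a supposedly neutral field acts as a proxy is to ask how well that field alone predicts group membership. The sketch below compares a per-value majority guess against always guessing the most common group. The `proxy_strength` helper, the field names, and the rows are hypothetical, and the score is computed on the same rows it was derived from, so it overstates what would hold on new data.

```python
from collections import Counter, defaultdict

def proxy_strength(rows, feature, group):
    """Return (baseline, with_feature) accuracy for guessing `group`.

    baseline: always guess the most common group overall.
    with_feature: guess the most common group for each value of `feature`.
    """
    overall = Counter(row[group] for row in rows)
    baseline = overall.most_common(1)[0][1] / len(rows)

    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[group]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return baseline, correct / len(rows)

# Invented rows: the zip code is the only input, yet it tracks group membership.
rows = (
    [{"zip": "11111", "group": "A"}] * 3 + [{"zip": "11111", "group": "B"}]
    + [{"zip": "22222", "group": "B"}] * 3 + [{"zip": "22222", "group": "A"}]
)

baseline, with_zip = proxy_strength(rows, "zip", "group")
print(f"baseline={baseline:.2f}, with zip={with_zip:.2f}")
```

The larger the lift over the baseline, the more the field can stand in for the demographic attribute it was meant to replace.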
Continuing the Series
This post has explored the nature of AI bias, the forms it takes in practice, and the layers of the AI pipeline through which it is introduced. The following post in this series turns to the practical question of what can be done to detect, measure, and mitigate it. View part 3 of the series here, Ethical AI: Bias and Fairness — Practical Steps for Every Role.