This is part 3 of a blog series on Ethical AI. For context on this series and why we’re writing it, see our introduction in part 1, Ethical AI: Privacy and Security.
How Can Payers Address Bias in AI Systems?
The previous article in this series covered what AI bias is, how it surfaces in outputs, and how it can enter AI systems. The key points: fairness is a core component of ethical AI, and the effects of bias can be subtle and difficult to detect. Bias exists along a variety of demographic, medical, and socioeconomic lines, and it can be introduced unwittingly, even with good intentions.
This post addresses the practical question that follows: how can an organization address bias in AI systems?
The answer varies by role. Every person at HealthEdge® engages with AI in some capacity—as a user, as a tool selector, or as a builder—and the interventions available differ accordingly.
Guidelines for AI Users
The most immediate leverage most people have over AI bias is how they interact with AI systems day-to-day—and that leverage is more significant than it might appear.
Before writing a prompt, consider the framing. AI output reflects the inputs provided, and embedded assumptions shape the result. When querying an AI for a healthcare use case, ask whether the input would be described the same way if the person had a different name, different insurance coverage, or a different background. For example, “drug-seeking” and “undertreated pain” can describe the same clinical presentation framed differently, and the AI model will respond to each in meaningfully different ways. In addition, stay aware of the degree and quality of context you provide: if that differs across cases, the model’s output quality may differ as well.
Be aware of common bias patterns when reviewing AI output. These include recommendations that vary by demographic attribute, summaries that are shorter or less detailed for certain groups, tone differences across groups, and assumptions that fill ambiguous information with stereotypes.
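One of these patterns, shorter output for some groups, can be spot-checked with very little tooling. The sketch below is illustrative only: the group labels and summaries are made-up sample data, and `mean_words_by_group` is a hypothetical helper, not part of any HealthEdge tool. Word count is a crude proxy for level of detail, so a gap here is a prompt for human review rather than proof of bias.

```python
from collections import defaultdict
from statistics import mean


def mean_words_by_group(records):
    """Return the average word count of AI output for each group label.

    records: iterable of (group_label, output_text) pairs.
    """
    lengths = defaultdict(list)
    for group, text in records:
        lengths[group].append(len(text.split()))
    return {group: mean(counts) for group, counts in lengths.items()}


# Hypothetical sample data: (group label, AI-generated summary).
sample = [
    ("group_a", "Patient reports chronic back pain; recommend imaging and follow-up in two weeks."),
    ("group_a", "Member has well-managed diabetes; continue current care plan and schedule annual eye exam."),
    ("group_b", "Back pain reported."),
    ("group_b", "Diabetes noted; continue plan."),
]

averages = mean_words_by_group(sample)
for group, avg in sorted(averages.items()):
    print(f"{group}: {avg:.1f} words on average")
```

A large difference between groups for comparable inputs is worth reporting, even when each individual output looks reasonable on its own.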
When an output seems problematic, report the observation. In the interim, adjusting prompts to counteract the observed pattern is a practical short-term response.
Guidelines for AI Buyers
Beyond individual interactions, many people at HealthEdge influence which AI tools and vendors the organization adopts.
For any decision involving the selection of AI tools—for enterprise software, vendor collaborations, or personal day-to-day use—bias considerations should be explicitly included in the assessment.
Evaluation should include questions such as:
- What bias metrics does the system use, and what justifies that choice for this use case?
- What training populations does the model reflect?
- Are disaggregated performance metrics available?
- Are there published model cards or transparency reports that acknowledge known limitations?
- What monitoring occurs after deployment?
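To make "disaggregated performance metrics" and "bias metric" concrete, here is a minimal sketch under stated assumptions: binary decisions, two made-up groups, and toy data. The helpers `disaggregate` and `demographic_parity_gap` are hypothetical names written for this post. The gap shown is the difference in selection rates between groups, which is only one of several possible bias metrics and may not be the right one for a given use case.

```python
from collections import defaultdict


def disaggregate(records):
    """Compute accuracy and selection rate separately for each group.

    records: iterable of (group, y_true, y_pred) with binary 0/1 values.
    """
    by_group = defaultdict(list)
    for group, y_true, y_pred in records:
        by_group[group].append((y_true, y_pred))

    report = {}
    for group, pairs in by_group.items():
        n = len(pairs)
        report[group] = {
            "n": n,
            "accuracy": sum(t == p for t, p in pairs) / n,
            # Share of cases where the model returned a positive decision.
            "selection_rate": sum(p for _, p in pairs) / n,
        }
    return report


def demographic_parity_gap(report):
    """Largest difference in selection rate between any two groups."""
    rates = [metrics["selection_rate"] for metrics in report.values()]
    return max(rates) - min(rates)


# Toy data (assumption): (group, actual outcome, model decision).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 0),
]

report = disaggregate(records)
gap = demographic_parity_gap(report)
print(report)
print("selection-rate gap:", gap)
```

In this toy data both groups have the same accuracy, yet the model returns a positive decision far more often for one group than the other. An aggregate accuracy figure would hide that difference, which is why it helps to ask vendors for per-group results.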
Responses like these should serve as warning signs:
- Dismissing bias concerns with “we don’t use race or gender” and ignoring proxy variables
- Claiming a tool is “objective” or “unbiased” without supporting evidence
- An inability to name a specific bias metric
- No acknowledgment of model limitations
- Testing that is limited to pre-deployment with no ongoing monitoring
Guidelines for AI Creators
For those involved in building AI features, whether as designers, engineers, product managers, or testers, addressing bias is a core responsibility. Building fair AI systems should be part of each step of the software development lifecycle.
- During design: The definition of “fair” should be established before development begins. Subject matter experts (SMEs) with a deep understanding of the use case should be included early in the process to identify the areas where bias is most likely or would cause the most harm.
- During development: Teams should audit source data for representation gaps and prompts for embedded assumptions. For example, what does an instruction to “be concise” or “extract the most relevant information” implicitly prioritize? Teams should also test alternative phrasings and review few-shot examples for demographic diversity.
- During evaluation: Performance should be measured on subgroups rather than relying solely on aggregate metrics. Counterfactual testing should be built into the evaluation pipeline by systematically varying one demographic dimension at a time and measuring output differences. Edge cases and ambiguous inputs, where bias is likely to surface, should be included in the test set. Qualitative review is also necessary, as differences in tone, framing, and agency attribution may not appear in automated accuracy metrics and could require direct human comparison of outputs.
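The counterfactual testing described for the evaluation stage can be sketched as a small harness. Everything here is an assumption made for illustration: `toy_model` is a deliberately biased stand-in for a real model call, the prompt template is invented, and insurance coverage is used as the varied dimension. A real pipeline would swap in the actual model and compare outputs with a metric suited to the task rather than exact string equality.

```python
def counterfactual_outputs(template, base_fields, dimension, values, model):
    """Run the model on prompts that differ only in one field.

    Returns a dict mapping each value of `dimension` to the model output.
    """
    outputs = {}
    for value in values:
        # Copy the base fields and change only the dimension under test.
        fields = {**base_fields, dimension: value}
        outputs[value] = model(template.format(**fields))
    return outputs


def toy_model(prompt):
    """Hypothetical stand-in for a model call; intentionally biased."""
    return "deny" if "Medicaid" in prompt else "approve"


template = (
    "Member with {coverage} coverage, age {age}, "
    "requests prior authorization for an MRI."
)
base_fields = {"coverage": "commercial", "age": 54}

outputs = counterfactual_outputs(
    template,
    base_fields,
    dimension="coverage",
    values=["commercial", "Medicaid", "Medicare"],
    model=toy_model,
)

# Flag the case for review if changing only this field changed the output.
flagged = len(set(outputs.values())) > 1
print(outputs, "flagged:", flagged)
```

Because only one field changes between prompts, any difference in output can be attributed to that field, which is what makes the comparison useful.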
After deployment, monitoring for drift is important, since patterns can emerge or intensify over time. Known disparities should be documented even when they cannot be fully resolved, as transparency about the limitations of AI tools enables informed use and creates a foundation for future improvement.
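One simple way to watch for drift is to compare the between-group gap in a recent window of decisions against a baseline window. The sketch below assumes binary decisions, made-up group labels, and an arbitrary `tolerance` of 0.05 chosen purely for illustration; the function names are hypothetical, and a production monitor would need larger samples and a statistically grounded threshold.

```python
def selection_rate_gap(window):
    """Gap between the highest and lowest group selection rates.

    window: dict mapping group label -> list of binary (0/1) decisions.
    """
    rates = {group: sum(d) / len(d) for group, d in window.items()}
    return max(rates.values()) - min(rates.values())


def drift_alert(baseline, current, tolerance=0.05):
    """Return (alert, change), where alert is True if the gap grew by more than tolerance."""
    change = selection_rate_gap(current) - selection_rate_gap(baseline)
    return change > tolerance, change


# Toy windows (assumption): decisions logged at launch vs. a later period.
baseline = {"group_a": [1, 1, 0, 1], "group_b": [1, 0, 1, 1]}
current = {"group_a": [1, 1, 1, 1], "group_b": [1, 0, 0, 1]}

alert, change = drift_alert(baseline, current)
print("alert:", alert, "change in gap:", change)
```

Logging the gap over time, even when no alert fires, also supports the documentation of known disparities described above.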
Building Trust Through Shared Accountability
The steps outlined above span different roles and different stages of the AI lifecycle, but they share a common thread: fairness in AI is not a task to be delegated or a box to be checked at the end of a project. Instead, it requires deliberate attention at every stage, from everyday use and vendor selection to prompt construction, deployment, and post-release monitoring.
The organizations that sustain that attention will not only avoid causing harm but also build systems that healthcare organizations and the members they serve can genuinely trust.
