Building Trust in AI: A Guide to LLM Evaluations 

Large language models (LLMs) are inherently probabilistic, meaning the same input can produce different outputs. That variability makes traditional unit tests, which verify exact results, ineffective for AI systems. In healthcare, where quality and accuracy are nonnegotiable, this creates a unique challenge: how do you ensure AI performs reliably at scale? At HealthEdge, we address this through a multi-layered evaluation strategy that combines human evaluations, LLM-as-a-Judge, CI/CD automation, and online, real-time monitoring to meet healthcare’s rigorous quality standards.

Why do we need multiple evaluation types?

Each serves a distinct purpose in the AI development lifecycle:

  • Human evaluations establish ground truth. Only domain experts can judge whether an AI summary captures clinically relevant details or if generated test cases are actually executable. Humans define what “good” looks like.
  • LLM-as-a-Judge scales human judgment. We can’t have subject matter experts (SMEs) review every output during rapid development. A judge-LLM applies human-defined criteria consistently across thousands of examples, enabling fast iteration.
  • CI/CD regression evaluations prevent quality backslides. When prompts or models change, automated tests catch regressions before they reach production, which is essential when multiple teams ship AI features weekly.
  • Online (real-time) evaluations catch real-world drift. Production traffic contains edge cases that no test dataset anticipates. Continuous monitoring detects degradation before users complain.

We’ll illustrate each type of evaluation using our QA Test Case Generation Agent, which reads Jira tickets and generates test cases with titles, preconditions, steps, and expected results.

Human Evaluations

Human evaluations are the gold standard. For healthcare AI, human oversight is non-negotiable. AWS Bedrock supports this through human-based evaluation jobs: collect inference examples, upload to S3, create evaluation jobs with custom metrics, and review results through Bedrock’s console.

SMEs are best suited for measuring the performance of highly complex operations. For instance, the QA Test Generation Agent takes a nontrivial input, a Jira ticket, and outputs an entire spreadsheet of test cases, each with multiple steps. Translating from input to output requires many steps, all of which simulate the work of a QA engineer.

LLM-as-a-Judge

LLM-as-a-Judge uses a second LLM to evaluate primary agent outputs, scaling human-like judgment across large datasets without requiring SME time for every evaluation run.

Each evaluation metric is defined by a prompt that instructs the judge LLM what to assess and how to score. For example, a “Relevance” evaluator prompt asks the LLM to compare the generated output to the source input and rate how relevant the response is. These evaluation prompts can be customized for domain-specific criteria, allowing teams to encode their quality standards into reusable, automated checks.
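A "Relevance" evaluator of this kind can be sketched in a few lines. The prompt wording, the `judge_relevance` name, and the `call_llm` client below are illustrative assumptions, not Bedrock's actual API or HealthEdge's production prompt:

```python
# Minimal sketch of an LLM-as-a-Judge "Relevance" evaluator.
# RELEVANCE_PROMPT and call_llm are placeholders: any chat-completion
# client that takes a prompt string and returns text would work.

RELEVANCE_PROMPT = """You are an impartial evaluator.
Compare the generated output to the source input and rate how relevant
the response is on a scale of 1 (irrelevant) to 5 (fully relevant).
Reply with the number only.

Source input:
{source}

Generated output:
{output}
"""

def judge_relevance(source: str, output: str, call_llm) -> int:
    """Score one input/output pair with a judge LLM (1-5)."""
    prompt = RELEVANCE_PROMPT.format(source=source, output=output)
    reply = call_llm(prompt)
    score = int(reply.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score

# Example with a stubbed judge that always answers "4":
score = judge_relevance("Jira ticket text", "Generated test cases",
                        call_llm=lambda prompt: "4")
```

Keeping the prompt as data (rather than burying it in code) is what makes these checks reusable: a team can swap in domain-specific criteria without touching the scoring logic.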

When initially building LLM-as-a-Judge evaluators, it’s helpful to compare their scores against human evaluations on the same dataset. This calibration ensures the LLM evaluators resemble SME judgment as closely as possible. If the judge LLM’s scores differ significantly from human reviewers’, the evaluation prompt needs refinement until alignment improves. Bedrock offers built-in evaluators for correctness, relevance, and hallucination, and also supports custom evaluator prompts.
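One simple way to quantify that calibration is the mean absolute gap between paired judge and SME scores. The function name and the 0.5 tolerance below are assumptions for illustration; teams may prefer a rank correlation instead:

```python
# Illustrative calibration check: compare judge-LLM scores with SME
# scores on the same examples. The 0.5 tolerance is an assumed value.

def calibration_gap(human_scores, judge_scores):
    """Mean absolute difference between paired 1-5 scores."""
    pairs = list(zip(human_scores, judge_scores))
    return sum(abs(h - j) for h, j in pairs) / len(pairs)

human = [5, 4, 2, 5, 3]   # SME scores on five examples
judge = [4, 4, 3, 5, 3]   # judge-LLM scores on the same examples

gap = calibration_gap(human, judge)
# If the gap exceeds the agreed tolerance, refine the judge prompt
# and re-run on the same dataset until alignment improves.
needs_refinement = gap > 0.5
```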

For the QA Test Generation Agent, the same criteria the SMEs apply can be encoded as prompts for the LLM judges, yielding a secondary aggregate set of metrics. Tracked as a baseline, those metrics can flag any dips in the agent’s performance.

CI/CD Regression Evaluations

CI/CD evaluations automate quality gates. When developers merge changes to prompts, models, or agent architecture, automated evaluations catch regressions before they move into production.

AWS AgentCore integrates with GitHub Actions to configure datasets, define LLM-as-a-judge evaluators, and specify task functions. The pipeline triggers on the merge, blocking deployment if thresholds aren’t met.

For example, with the HealthEdge QA agent, we block deployment if test comprehensiveness, as scored by an LLM judge against ground truth data, drops below 80%, or if CSV output format adherence falls below 95%.
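A gate like this reduces to a small threshold check that the CI job runs after the evaluation suite finishes. The metric names mirror the examples above; the gate function itself is a sketch, not AgentCore's actual interface:

```python
# Sketch of a CI quality gate: fail the pipeline if aggregate
# evaluation scores fall below agreed thresholds.

THRESHOLDS = {
    "test_comprehensiveness": 0.80,  # judged against ground truth
    "csv_format_adherence":   0.95,
}

def gate(scores: dict) -> list:
    """Return the list of failed metrics (empty list = safe to deploy)."""
    return [name for name, floor in THRESHOLDS.items()
            if scores.get(name, 0.0) < floor]

failures = gate({"test_comprehensiveness": 0.83,
                 "csv_format_adherence": 0.97})
# An empty failure list lets deployment proceed; otherwise the CI job
# exits non-zero, blocking the merge from reaching production.
```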

Online (Real-Time) Evaluations

Online evaluations monitor production traffic, sampling live requests to detect drift that static datasets miss. These evaluations use the same LLM-as-a-Judge evaluators defined during development, applying them continuously to production data rather than pre-constructed test sets. AgentCore supports configurable sampling (1-5% of traffic), running judge prompts on sampled requests and surfacing score trends through observability dashboards. If quality degrades from unexpected inputs, online evaluations catch it before users report issues.
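The sampling step itself is conceptually simple. AgentCore's actual configuration is not shown here; the sketch below just illustrates judging a small random slice of live requests without touching the normal response path:

```python
# Sketch of configurable traffic sampling for online evaluation.
# Function names and the 2% default rate are illustrative assumptions.

import random

def should_sample(rate: float = 0.02, rng=random.random) -> bool:
    """Decide whether to run judge evaluators on this request."""
    return rng() < rate   # e.g. ~2% of traffic

def handle_request(ticket, generate, evaluate, rate=0.02):
    output = generate(ticket)        # normal production path, always runs
    if should_sample(rate):
        evaluate(ticket, output)     # judge scoring, ideally async
    return output
```

Surfacing the sampled scores as a trend line, rather than alerting on single requests, is what lets drift show up before users notice it.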

The Evaluation Lifecycle

These four evaluation types form a continuous loop: human evaluations establish ground truth; LLM-as-a-Judge enables rapid iteration; CI/CD gates releases; online monitoring feeds edge cases back into development.

AI evaluation requires fundamentally different approaches than traditional testing. By combining human evaluations, LLM-as-a-Judge, CI/CD automation, and real-time monitoring, HealthEdge ensures AI features meet healthcare’s quality standards.

To follow HealthEdge’s AI strategy in greater detail, visit the Resources section of our website, www.healthedge.com.

Cut Administrative Costs Up to 40% With an Integrated Operating Model 

For the second consecutive year, 52% of health plan executives named “managing rising costs” as the top challenge, according to the 2026 HealthEdge® Annual Payer Report.

In the same survey, 85% of executives reported that regulatory compliance requirements directly cut into their margins. Health plans using HealthEdge HealthRules® Payer enter this environment with the advantage of a next-generation core administrative processing system (CAPS) designed specifically for these challenges. The next opportunity is to expand that advantage across your organization’s workflows by adopting an integrated operating model.

From Next-Generation Software to a Modern Operating Model

HealthRules Payer delivers capabilities that legacy systems cannot match, including rapid configuration, high auto-adjudication rates, and the flexibility to quickly adapt to regulatory changes.

Across the industry, health plans that have modernized their CAPS still operate within traditional organizational structures and practices. Internal teams, external vendors, and contracts are organized by function, each with its own disparate accountability and performance tracking. That model leaves significant operational value on the table because of persistent inefficiencies like:

  • Low visibility & fragmented accountability. Separate teams and vendors manage claims, enrollment, billing, and member services, each under distinct contracts and using unique performance tracking. This reduced transparency prevents cross-functional collaboration and efficiency.
  • Downstream lag on configuration changes. The platform adjusts rapidly, but the operation around it may take weeks to catch up. Retraining staff, updating enrollment procedures, and revising compliance documentation all follow separate timelines.
  • High administrative costs. Without operational transparency and integration, staffing overhead and managing multiple vendors can dilute the efficiency gains HealthRules Payer delivers.
  • Compliance execution that remains reactive. Each regulatory change triggers a cross-team operational scramble managed as a special project rather than an adaptable, repeatable workflow.

These are operating model challenges and, left unaddressed, they compound. Technical debt grows, capital stays locked in maintenance, and the ability to expand into new markets or new lines of business narrows. Overcoming these challenges requires integrated platforms that can handle rapid modernization to deliver outcomes.

An Integrated Operating Model with Guaranteed Financial Outcomes

Moving to a modernized operating model is a structural change. Software ownership gives way to outcome accountability. Siloed systems and multiple vendor relationships give way to an end-to-end operating ecosystem managed by a single partner.

The HealthEdge integrated operating model consolidates technology, operational processes, and a dedicated global service delivery team under a single governance structure and set of service-level agreements (SLAs). The partner who builds the platform is also the one who operates it and is accountable for the results, with contractually guaranteed cost reductions tied to defined performance measures.

This is a different approach than patching point solutions onto an existing environment or layering new tools on top of an outdated operating structure. Those practices may address some symptoms, but they can add costs and do not change the underlying model. An integrated ecosystem replaces the model itself.

Across current deployments, health plans operating within this model have achieved:

  • 30–40% reduction in administrative cost per member per month (PMPM), sustained year-over-year.
  • Return on investment (ROI) within 12–15 months, compared to 24–36 months under multi-vendor approaches.
  • Budget variance below 1%, against an industry benchmark of 10–15%.

With HealthEdge, these are contractual commitments—not projections.

Turning a Technology Advantage into an Operational One

For health plans already operating on HealthRules Payer, the integrated operating model means having a dedicated team with deep expertise in the platform, acting as an extension of your organization to ensure that investment works harder across every function.

One Accountable Partner, End to End

HealthEdge owns the platform, the global operations delivery team, and the outcomes. Vendor sprawl and split accountability are replaced by a single point of responsibility for performance, backed by shared risk. Ongoing platform upgrades and regulatory updates are included at no additional cost.

Configuration Changes That Execute at Platform Speed

When a benefit rule, pricing change, or regulatory update is configured in HealthRules Payer, the downstream operational response is managed by one team at a rapid pace. New lines of business that take months under traditional models can launch in days.

Administrative Costs Decrease and Stay Down

Standardized work processes, automated task processing, and economies of scale across a global delivery team of more than 7,000 drive 30–40% reductions in health plan administrative costs, sustained year over year. Savings are tied to process enhancement rather than headcount reduction. You pay for measurable results, not employee hours or resource inputs.

Compliance Built into Daily Operations

Centers for Medicare and Medicaid Services (CMS) mandates like the Transparency in Coverage final rules, state requirements, and audit readiness tools are embedded into standard workflows within HealthRules Payer. Plus, regulatory changes are operationalized as they arrive.

Real-Time Visibility Across the Full Operation

A centralized data hub in the platform surfaces performance data through unified executive dashboards. Backlogs, SLA risks, and compliance exposure become visible before they become problems.

Leadership Reclaims Time for Strategy

When a single partner manages daily execution and owns the outcomes, health plan leadership moves capacity toward growth and member experience.

The HealthEdge advisory team works alongside health plan leadership from day one to ensure operational continuity.

The New Operating Model in Practice

One regional health plan moved to an integrated ecosystem model, unifying core administration, claims, enrollment, billing, correspondence, and member services under a single HealthEdge operating structure in fewer than 12 months.

The payer’s administrative spend dropped from $11.50 to $6.90 per member per month (PMPM), a reduction of approximately 40%. Claims accuracy reached best-in-class levels, and the plan reported EBITDA uplift across every function in the model.

Operational improvements like these are still the exception among health plans, though they are available to any organization. Adopting a fully integrated technology-and-operations model can help put your health plan at the forefront of the adoption curve, giving your organization a competitive advantage.

Learn more about the power of an integrated operating model in action. Download our case study to see how one health plan successfully migrated its entire ecosystem under pressure and achieved a 99.8% auto-adjudication rate.

A Practical Way to Get Started Now

HealthEdge offers a cost-driver assessment that builds a baseline of current administrative spend, identifying where costs originate and where an integrated model would generate the greatest return. Whether a health plan moves forward with HealthEdge or not, the assessment delivers an independent, practical view of the current operation.

Contact us to learn more.

Preparing for Software Testing: 8 Best Practices for Health Plans 

When a health plan undertakes a major technology change, whether implementing a new platform, modernizing a legacy system, or rolling out new functionality, the promise is compelling: streamlined workflows, greater automation, and more time for teams to focus on strategic priorities.

Before those benefits can be realized, however, there is a critical step that determines whether the transition succeeds or struggles: User Acceptance Testing (UAT).

For many health plans, UAT is unfamiliar territory. Others may not have gone through a large-scale testing effort in years. In either case, preparation is key. This article draws on the experiences of the HealthEdge® Global Professional Services testing team to deliver eight best practices that help payers prepare for a successful testing engagement and a smoother go-live.

What Is User Acceptance Testing (UAT)?

UAT is the final phase of the software testing process. It’s where business users (not technical teams) validate that the system meets business requirements and is ready for day-to-day operations in a production environment.

It is not about proving the software works technically. It’s about confirming that the solution supports real-world workflows, produces accurate and compliant outcomes, and enables users to do their jobs effectively.

Why UAT Matters for Health Plans

UAT plays a critical role in reducing risk during major technology changes, enabling health plans to:

  • Confirm that business requirements are met
  • Validate end-to-end workflows across teams and systems
  • Identify gaps missed in earlier testing phases
  • Reduce the likelihood of costly post–go-live issues
  • Support compliance and audit readiness
  • Build user confidence and encourage adoption

Most importantly, UAT provides health plans with one final opportunity to ensure readiness before the new system goes live.

The Health Plan’s Role in the Testing Process

While every software implementation is unique, users play a central role in ensuring the solution works as intended in real-world use. In most testing engagements, the health plan’s responsibilities include:

  • User Acceptance Testing: Leading business validation to confirm the system supports operational needs
  • Providing Test Data and Access: Supplying realistic data and user credentials for testing
  • Business Requirements Validation: Confirming that configured workflows align with business expectations
  • Final Sign-Off: Approving the solution for production following successful UAT

Although users are most active during UAT, effective testing starts much earlier. Early involvement in requirements definition, design, test planning, and data preparation significantly improves UAT outcomes.

8 Leadership Decisions That Set the Stage for a Successful Technology Change

Major technology implementations are rarely derailed by software issues alone. More often, challenges arise when organizations underestimate the preparation required to validate new ways of working before go-live.

Successful testing is not accidental. It is the result of deliberate leadership decisions made well before UAT begins. Leaders who approach testing as a strategic business exercise, rather than a technical checkpoint, put their organizations in a far stronger position to realize value from their investment.

The following eight practices represent the most important actions leaders can take to ensure testing supports a smooth transition, confident users, and long-term success.

1. Understand the Purpose of UAT

Testing is not about finding every possible defect. The goal of UAT is to ensure the system will support your business operations once real users depend on it.

During UAT, business leaders and users should be asking:

  • Can users do their jobs effectively in the new system?
  • Do core processes work from start to finish?
  • Are outcomes accurate, compliant, and usable?
  • Is the system intuitive for different user roles?

Keeping this purpose in focus helps teams prioritize what truly matters.

2. Involve the Right People Early

The people validating the system should be the people who will use it, not just technical resources or project team members.

Health plans should consider involving:

  • Frontline users who understand day-to-day work
  • Subject matter experts familiar with exceptions and edge cases
  • Supervisors or leads who understand downstream impacts
  • Compliance, audit, or quality representatives
  • Data owners
  • Business UAT Lead

These stakeholders should be engaged early, during requirements definition, test planning, and data preparation, not only during UAT execution.

3. Protect Time for User Acceptance Testing

One of the most common challenges in UAT is underestimating the time it takes. When testing is treated as an “extra” task layered onto daily responsibilities, quality suffers.

Best practices include:

  • Allocating dedicated time for UAT participants
  • Reducing or temporarily backfilling day-to-day responsibilities
  • Setting realistic timelines for testing and retesting
  • Treating UAT as a priority business activity

Strong UAT requires an upfront time investment—but that investment pays off through smoother go-lives and fewer post-production fixes.

4. Prepare Realistic Scenarios

Effective testing goes beyond validating individual system functions. UAT should test scenarios inspired by users’ daily workflows. For example, rather than only validating a single calculation or rule, an end-to-end scenario might include logging in, accessing a member, completing an assessment, creating a care plan, and triggering follow-up tasks.

Prioritize scenarios that are:

  • High-volume or frequently used
  • High-risk from a compliance or financial perspective
  • Critical to member or provider satisfaction

These scenarios provide the most meaningful validation of system readiness.

5. Ensure Data and Configuration Are Ready

UAT is only as effective as the data supporting it. Health plans should ensure test data is realistic, complete, and accurately configured before testing begins.

This typically includes:

  • Member demographics and eligibility
  • Provider and program information
  • Role permissions and workflow configurations
  • Negative and edge-case data (such as members with no eligibility or incomplete documentation)

Poor or incomplete data can delay timelines, mask defects, and undermine confidence in testing results.

6. Train Users for UAT—But Don’t Turn It Into Full-Scale Training

Users don’t need to be system experts to test effectively, but they do need enough familiarity to execute workflows and recognize whether outcomes are correct.

Before UAT begins, ensure users can:

  • Understand the business processes they are testing
  • Navigate the system for their role
  • Enter, edit, and validate data
  • Follow test scripts and document results

Many organizations find that walking through test scenarios provides valuable hands-on learning without turning UAT into full-scale training.

7. Set Clear Expectations for Issue Management

Clear guidelines for logging, prioritizing, and resolving issues are essential to keeping testing on track.

Teams should align on:

  • What constitutes a critical issue versus a minor one
  • How and where issues are logged
  • Who determines whether an issue must be resolved before go-live
  • Communication and escalation paths

Without clear issue management processes, testing can stall, defects may be missed, and go-live decisions become more difficult.

8. Don’t Rush—or Skip—User Acceptance Testing

UAT is the only phase where real business users validate that the system supports their workflows, rules, and daily operations.

When UAT is rushed or skipped, organizations face significant risks, including:

  • Untested critical workflows
  • Higher likelihood of production defects
  • Increased project costs
  • Compliance and operational disruptions

Taking the time to complete UAT thoroughly helps protect both the organization and the users who rely on the system.

Reducing Risk and Realizing Value Faster

UAT is one of the most important milestones in any major technology change. While it’s easy to get caught up in individual defects or system nuances, the real purpose of UAT is far more strategic: to confirm that the organization is ready to operate with confidence in the new environment.

When UAT is done well, health plans gain assurance that core business processes function as intended, users can perform their roles effectively, and financial and compliance outcomes are accurate. Most importantly, it provides leadership with the confidence that the organization is prepared—not just to go live, but to succeed once the system is in production.

HealthEdge: Your Partner in Testing Success

Health plans don’t have to navigate this process alone. Experienced software and services partners like HealthEdge bring proven frameworks and expertise to guide health plans through all phases of testing, from data preparation and scenario design to execution, automation, and issue management.

Engaging the right partner early in the process helps reduce risk, accelerate readiness, and ensure that testing supports a smooth transition and long-term value from the technology investment.

See how our Global Professional Services team partners with health plans to plan, execute, and optimize testing engagements, helping teams go live with confidence and realize value faster. Read the first article in our Software Testing series, “Software Testing Essentials: Why It Matters for Health Plans and How HealthEdge® Makes It Easier.”

Building Trust in LLM Solutions: A Practical Guide to Evaluation Planning 

Artificial intelligence (AI) is fundamentally changing how healthcare software is built. From automated test case generation to intelligent documentation and decision support, large language models are becoming embedded within the software development lifecycle itself.

As AI becomes part of how solutions are designed and validated, the question is no longer just whether it adds efficiency. It’s whether organizations can systematically evaluate and trust the outputs it produces.

At HealthEdge®, we’re deploying the Wellframe QA team’s test case generation agent. The agent takes Jira tickets for new front-end functionality, including acceptance criteria, and generates test cases as CSV files for a downstream test management tool. This collaboration has demonstrated that successful LLM deployment requires building trust through rigorous evaluation.

What Are LLM Evaluations?

Traditional software applications are straightforward to assess with clearly established patterns: unit testing, integration testing, UAT, etc. LLM applications are different: infinite output possibilities, context-dependent responses, and subtle failure modes.

LLM evaluations systematically measure whether your LLM application solves the problem you built it to solve. They provide concrete evidence of what works and reveal specific areas that need improvement.

Evaluations serve different audiences with different needs.

  • For stakeholders, they provide transparency and set realistic expectations about what the system can and cannot do.
  • For developers, they highlight specific shortcomings that need attention and help prioritize improvement efforts.
  • For users, they build confidence that the system has been rigorously tested.

The ultimate goal is trust. Users need to trust that your LLM solution will perform reliably. Evaluations are how you earn and maintain that trust.

The Four Components of a Robust Evaluation Plan

Our QA test generation agent presents a complex evaluation challenge. Given a Jira ticket, it generates test cases with sections, titles, preconditions, steps, expected results, and metadata. There’s no single correct output, and quality is multidimensional.

Consequently, we devised a complete evaluation plan with four components: criteria, methods, dataset, and execution strategy.

Component 1 – Evaluation Criteria: Criteria should stem directly from the problem the model is solving. For our QA test generation agent, we identified multiple critical criteria based on what makes test cases valuable to our QA team:

  • Required Test Recall measures comprehensiveness. Are we generating all the necessary test cases that a human QA engineer would write? We calculate this as the number of “required” test cases covered by the agent divided by the total number of required test cases a human would write. We set a realistic target recall based on task complexity and risk.
  • Acceptance Criteria Coverage measures thoroughness. Does the generated test suite adequately test all the acceptance criteria mentioned in the Jira ticket? We target 90%+ coverage to ensure nothing slips through the cracks.
  • Test Comprehensiveness relies on human evaluators, who score the test suite on a 1-5 scale based on their holistic judgment of its quality.
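The two automated criteria above reduce to simple set arithmetic once each test case and acceptance criterion is mapped to a comparable key. In practice that matching step may be fuzzy; the sketch below assumes exact keys for clarity:

```python
# Illustrative computation of Required Test Recall and Acceptance
# Criteria Coverage. Keys like "login-ok" are hypothetical identifiers.

def required_test_recall(required: set, generated: set) -> float:
    """Share of required test cases the agent actually produced."""
    return len(required & generated) / len(required)

def acceptance_criteria_coverage(criteria: set, covered: set) -> float:
    """Share of the ticket's acceptance criteria exercised by tests."""
    return len(criteria & covered) / len(criteria)

recall = required_test_recall(
    required={"login-ok", "login-bad-pw", "login-locked"},
    generated={"login-ok", "login-bad-pw", "logout"})
# Two of the three required cases were generated, so recall is 2/3;
# the extra "logout" case does not hurt recall, only comprehensiveness.
```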

Each criterion targets a specific aspect of quality that matters to our end users (the QA team). We’re measuring concrete traits that determine whether the agent provides real value.

The key is to cover all bases and edge cases from different angles. A test suite could score high on recall (finding all the important scenarios) but low on coverage (missing acceptance criteria details). Both matter, so we measure both.

Component 2 – Evaluation Methods: The HealthEdge team pursued three approaches:

  1. Automated computable metrics (exact match, fuzzy match) work when success is mathematically defined.
  2. Human evaluation handles judgment requiring domain expertise.
  3. LLM-as-a-judge uses another LLM to evaluate based on a rubric.

For this project, we used automated checks for format and human subject matter experts (SMEs) for quality assessment.
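The automated format check can be as small as validating the CSV header and required fields. The column names below are assumptions inferred from the test-case fields described earlier (titles, preconditions, steps, expected results), not the agent's actual schema:

```python
# Sketch of an automated format check for the agent's CSV output.
# EXPECTED_COLUMNS is an assumed schema for illustration.

import csv
import io

EXPECTED_COLUMNS = ["Title", "Preconditions", "Steps", "Expected Results"]

def csv_format_ok(csv_text: str) -> bool:
    """True if the header matches and required fields are non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return False
    rows = list(reader)
    # Every row must at least carry a title and an expected result.
    return all(r["Title"] and r["Expected Results"] for r in rows)

sample = ("Title,Preconditions,Steps,Expected Results\n"
          "Login,User exists,Enter credentials,Dashboard shown\n")
ok = csv_format_ok(sample)
```

Checks like this are deterministic and cheap, which is why they belong in the automated tier rather than with the SMEs.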

Component 3 – The Evaluation Dataset: This is the most critical component. If the dataset doesn’t match production, the process will miss problems. For example, a resume-screening tool designed to evaluate software engineer resumes only might fail on designer or marketer resumes in production. Evaluation datasets must follow three rules:

  1. Representative means it reflects the actual distribution of cases you’ll see in production. If 60% of production tickets describe UI features, 30% describe API changes, and 10% describe infrastructure work, your evaluation dataset should match those proportions. If edge cases happen 5% of the time in production, they should appear roughly 5% of the time in your dataset.
  2. Diverse means covering the full range of scenarios, including edge cases and failure modes. For our QA agent, we need Jira tickets that vary in complexity (simple bug fixes vs. major features), clarity (well-written vs. vague requirements), and completeness (detailed acceptance criteria vs. minimal descriptions). Each variation might affect output quality differently.
  3. Consistent means the ground truth labels or expected outputs are reliable and reproducible. If three QA engineers evaluate the same test cases, they should largely agree on what’s required and what’s comprehensive. Inconsistent ground truth means you’re measuring noise instead of signal.
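The representativeness rule can be checked mechanically by comparing the dataset's category mix against observed production proportions. The categories, target mix, and 5-point tolerance below are illustrative assumptions taken from the example above:

```python
# Sketch of a representativeness check for an evaluation dataset.
# PRODUCTION_MIX mirrors the hypothetical 60/30/10 split in the text.

from collections import Counter

PRODUCTION_MIX = {"ui": 0.60, "api": 0.30, "infra": 0.10}

def mix_drift(dataset_labels, target=PRODUCTION_MIX):
    """Largest absolute gap between dataset and production shares."""
    counts = Counter(dataset_labels)
    total = len(dataset_labels)
    return max(abs(counts[c] / total - p) for c, p in target.items())

labels = ["ui"] * 6 + ["api"] * 3 + ["infra"] * 1
drift = mix_drift(labels)
# Flag the dataset for rebalancing if any category share is off by
# more than an agreed tolerance (here, 5 percentage points).
representative = drift <= 0.05
```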

For this project, the Wellframe QA team curated a substantial dataset of real Jira tickets spanning different feature types and created the “required” test cases for each. This gave the team reliable ground truth to measure against, as it was built by the very subject-matter experts who will be using the agent in production.

Component 4 – The Execution Plan: Evaluations can be offline (using your dataset), which is comprehensive and controlled, or online (monitoring production), which catches unexpected inputs but often lacks ground truth.

For our QA agent, we chose offline evaluation because our criteria require human subject matter experts. The strategy centered on periodic manual reviews conducted every few weeks during development. Before releases, a comprehensive evaluation served as a quality gate. In the post-deployment phase, the team shifted to continuous monitoring.

Putting It All Together 

To recap our successful process, the critical steps we followed included defining concrete criteria, choosing appropriate methods, investing in a high-quality dataset, and designing an execution plan. For our QA agent, we accepted that evaluation requires human SMEs. We prioritized offline evaluation and invested in a diverse dataset with ground truth.

The result: A confident deployment with evidence of strengths and visibility into limitations.

Contact HealthEdge to learn how our AI solutions are reinventing the way our software solutions are being designed and tested. 

6 Executive Strategies for Optimizing Care Coordination and Delivery 

At a recent roundtable, the HealthEdge® Chief Medical Officer led executives from three leading health plans in a discussion centered around optimizing care delivery and efficacy while improving cost control and payer performance.

Panelists included:

  • Chief Medical Officer at a member-owned health insurance company based in Illinois
  • President at a Washington D.C.-based health plan focused on children and young adults receiving Supplemental Security Income (SSI)
  • Department Vice President and Medical Director of Population Health at a not-for-profit health plan based in Kansas

1. Putting Members at the Center of Care Delivery

Department Vice President and Medical Director of Population Health: We see a unique opportunity to reposition ourselves and rearticulate the value of enabling and coordinating care to serve members well. First, we concentrate on clinical improvements. Second, we prioritize the member experience.

The healthcare ecosystem is inherently complex, and our role is to guide members through their journeys. We dedicate significant resources to high-cost claimants, as 1-2% of members can account for 30-50% of a plan’s total spend. We lead high-cost claimant rounds to review claims experiences, where a simple question like “How is the member doing?” shifts the focus from data points back to the individual.

This proactive outreach often provides the first indication of an upcoming issue that claims data would not show for another six months. It allows for more effective care coordination and fosters an empathy-driven mindset, ensuring we never lose sight of the people we serve.

In terms of outcomes, our care management programs achieve satisfaction scores in the mid-to-high 90s. As an organization, we prioritize the ease of doing business with our health plan, and we are outperforming market benchmarks. Most importantly, we translate these insights into actionable opportunities for our provider partners through value-based agreements and other relevant structures.

2. Driving Value-Based Reimbursement

Health Plan President: The journey to value-based reimbursement is unique in the pediatric space. Pediatric providers typically don’t take Medicare, so they have been mostly insulated from payment innovations. Our first obstacle was incentivizing them to even discuss alternative payment models.

We learned that value-based reimbursement starts with an engaged workforce within our health plan. Our first step was to define what we wanted to accomplish as an organization and how we would partner with providers to achieve it.

We used three strategies to achieve our goal:

  1. Breaking Down Internal Data Silos: We used the HealthEdge GuidingCare® platform to bring Utilization Management, Care Management, and Appeals and Grievances into one integrated system. This provides our care managers with a 360-degree view of their members, including complaints, legal settlements, and care gaps. We then expanded access to our marketing, outreach, and customer service teams.
  2. Embedding Care Managers: As our care management staff gained a complete view of the member, we embedded them in provider offices. This creates an interdependent relationship where providers and care managers can align their goals.
  3. Leveraging Shared Data: We established a shared population health platform with our largest national provider. We don’t need to question each other’s data because we all see the same information. This allows us to focus on our mutual goals, which are set jointly through a shared governance model and reviewed monthly to ensure they remain accurate.

3. Strengthening Payer and Provider Collaboration

Chief Medical Officer: We also leverage GuidingCare as a unified platform for medical management and population health. One key function is allowing providers with a treating relationship to view a member’s care plan. This facilitates co-management and presents a coordinated care approach to the member.

Our collaborative efforts focus on two areas: data transparency and value-based contracting.

  • Data transparency is essential for building strong and effective collaboration. We use Admission, Discharge, Transfer (ADT) feeds, health data exchanges, and other platforms to ensure transparent data flow between the payer and provider.
  • Value-based contracting is a tool to align cost and quality metrics between providers and payers. We incentivize providers to support work that ultimately serves the person, whether we call them a member or a patient. Through Joint Operating Committees, we review leading indicators monthly to identify and address unfavorable trends early on.

We’ve learned two crucial lessons:

  1. We must agree on what success looks like through a conversation with the provider, ensuring measures are relevant, reliable, and impactable.
  2. We must structure contracts based on provider type, setting, population served, and their comfort level with accepting risk. Not everyone is ready for a full-risk contract. We guide them along the alternative payment model spectrum, from foundational steps to shared savings and losses, and eventually to full-risk contracts.

By applying these lessons and interventions, we’ve seen increased interest in higher-risk models, with providers more willing to take on these contracts because they feel equipped with the right tools and resources to succeed.

4. Utilizing Digital Tools for Care Management

Department Vice President and Medical Director of Population Health: Our historical data showed that telephonic care management engagement rates were dwindling, so we invested in HealthEdge Wellframe™ and GuidingCare as our digital care management front door. It turns out that many members would rather text than talk.

In 2025, 32% of our meaningfully engaged members have engaged digitally—a significant opportunity we would have missed had we not adopted a digital tool. When comparing engagement in care management programs this year to the same period last year, we are up 23%. This shows that if we are truly member-centric, we must meet members where they are and support multiple engagement preferences.

To achieve this, we are reimagining member engagement by integrating digital tools with our community health worker program. This approach takes care directly to the community, enabling us to drive better population health outcomes.

5. Personalizing Care Plans for High-Risk Members

Health Plan President: Delivering personalized care hinges on strong care management relationships, which can be challenging with healthcare workforce turnover of 20-25%. To address this, we created care management pods, assigning a team (with nurses, social workers, and community support workers) to each enrollee’s medical home, ensuring continuity despite staff changes.

This relationship also impacts technology adoption. When we first implemented Wellframe, member adoption was low. This stemmed from care managers not embracing the tool due to productivity concerns.

For complex populations, a primary care physician isn’t always the medical home—much of their care comes from specialty practices. Our model focuses on collaboration between the specialty medical home and the primary care medical home, with the care manager acting as the “glue” that directs traffic and brings everyone together.

True integrated care means ensuring smooth care transitions for members across different settings.

6. Closing Gaps in Care with Real-Time Data & Analytics

Department Vice President and Medical Director of Population Health: We have a lot of data, but the key is filtering through it to find actionable opportunities. One project we worked on was a multi-modality gap-in-care program using Healthcare Effectiveness Data and Information Set (HEDIS) methodology. We would trigger communications to members when they had an open care gap.

One leader at our organization went a step further and weighed the different gaps, so if a member had multiple gaps, we knew which one to prioritize in our communication. We pushed these notifications out through multiple channels: our mobile app, our care management app, and our customer service reps, who had scripts ready.

We also discovered that while members don’t open a lot of their mail, they almost always open their Explanation of Benefits (EOB). We started putting care gap notifications directly on the EOB, along with a QR code for our Wellframe app, and it’s been amazing to see how many people have used it to take a more active role in their care.

Over the past year, this pilot closed thousands of care gaps, with a success rate of over 50% for directly engaged members. Most importantly, these insights are helping drive population health outcomes collaboratively with providers.

Innovative Solutions for the Future of Care Delivery

Addressing challenges like care coordination, cost control, and provider collaboration requires innovative solutions that prioritize transparency and seamless workflows.

By focusing on enhanced member engagement and proactive care delivery, payers can help create a system that delivers better outcomes for members, reduces costs, and improves satisfaction.

Want to learn more about how health plans are leveraging digital solutions to improve data accuracy, transparency, and efficiency? Access insights from a payer executive roundtable in our recent article, “Unlocking the Future of Healthcare Technology: Interoperability, Transparency, and AI.”

7 Payment Integrity Trends Health Plans Can’t Afford to Ignore in 2026 

Payment integrity has always played a critical role in payer operations, but in 2026, it has become a strategic imperative.

According to the HealthEdge® 2026 Annual Payer Survey, health plan leaders are navigating unprecedented pressure to control costs, manage regulatory complexity, modernize legacy systems, and improve collaboration with providers—all at the same time. Managing costs remains the number-one challenge for payers, while investments in automation, AI, and real-time data continue to accelerate.

Against this backdrop, payment integrity is evolving. No longer confined to post-pay recovery, it is becoming an enterprise-wide discipline focused on prevention, transparency, and measurable outcomes across the payment lifecycle.

Here are seven payment integrity trends shaping how health plans are preparing for 2026 and beyond.

1. Forecasting Capabilities Accelerate Confident Action

Speed matters, but confidence matters just as much.

Health plans increasingly want the ability to test changes before enforcing them. The Modeling feature within HealthEdge Source™ allows teams to model the financial and operational impact of new edits, policy updates, or regulatory changes before those changes go live.

This capability supports faster decision-making, reduces unintended consequences, and empowers payment integrity teams to respond quickly as business needs evolve.

HealthEdge Source in Action:

In a recent HealthEdge Source case study, a regional health plan used Modeling to preemptively gauge the impact of new payment edits before enforcement—allowing the organization to move quickly while avoiding downstream disruption to providers and operations.

2. Transparency Becomes the Foundation of Effective Payment Integrity

Health plans are moving away from opaque, single-direction approaches to payment integrity. Instead, they are prioritizing transparency, both internally and externally, as a way to reduce friction, improve accuracy, and build trust.

Modern payment integrity programs increasingly rely on:

  • Software that explains how informational edits impact a claim
  • Early alerts that surface potential issues before payment
  • Provider-facing education tools that reduce disputes and rework

When payment decisions are clear and explainable, health plans can enforce accuracy while maintaining productive provider relationships—turning payment integrity into a collaborative process rather than a reactive one.

HealthEdge Source in Action:

In a HealthEdge Source case study, SummaCare used informational edits and early alerts to proactively communicate payment policy changes to providers. The result was a measurable reduction in provider inquiries and disputes, demonstrating how transparency can improve payment accuracy and provider relationships.

3. Cost Avoidance Takes Priority Over Post-Pay Recovery

In 2026, prevention is the new performance benchmark.

Health plans are shifting away from “pay-and-chase” models toward prospective payment accuracy, where errors are identified and addressed before dollars go out the door. This approach improves financial outcomes, reduces administrative burden, and accelerates claims throughput.

By embedding payment integrity earlier in the claims lifecycle, plans can:

  • Minimize downstream rework
  • Enhance transparency
  • Improve overall administrative loss ratio (ALR)

HealthEdge Source in Action:

A large Southeast health plan featured in a Source case study shifted its payment integrity strategy upstream by enforcing prospective edits before claims were paid. By reducing reliance on post-pay recovery, the plan lowered administrative overhead and improved overall cost avoidance, supporting better ALR performance while accelerating claims processing.

4. Real-Time Data Enables Faster, Smarter Enforcement

Payment integrity is only as effective as the data behind it.

Health plans continue to struggle with fragmented systems and delayed insights, which limit their ability to act quickly. This is especially relevant for payers facing an increase in claims rework. AI-powered tools within payment integrity platforms can help payers keep pace with real-time claims volumes and prevent their teams from becoming overburdened.

Modern payment integrity platforms must operate within a connected ecosystem, bringing together claims, contracts, eligibility, and provider data to enable faster enforcement and better outcomes at scale.

5. AI Moves from Detection to Decision Support

Artificial intelligence has been used in the background of payment integrity processes for years—and now it’s taking the spotlight as a foundational tool.

The HealthEdge report, “Elevating Payment Integrity: The Role of AI in Enhancing Payment Accuracy,” outlines how AI is transforming payment integrity from manual, rules-heavy processes into adaptive, intelligence-driven workflows. AI is now being used to:

  • Identify complex patterns that traditional rules miss
  • Prioritize high-risk claims and edits
  • Continuously learn from outcomes to improve accuracy over time

Rather than replacing human expertise, AI augments it, helping payment integrity teams focus on the highest-impact decisions while improving consistency, speed, and precision.

HealthEdge Source in Action:

HealthEdge Source case studies show how combining machine learning with configurable edits helps health plans prioritize high-risk claims, reduce manual reviews, and continuously improve payment accuracy as business conditions change.

6. BPaaS Emerges as a Scalable Operating Model

As cost pressures intensify, health plans are rethinking not just what they do in payment integrity, but how the work gets done.

Business Process as a Service (BPaaS) is gaining traction as a way to combine technology, automation, and expertise into a single, outcomes-focused model. In payment integrity, this approach helps plans scale programs, respond faster to regulatory and policy changes, and reduce administrative burden without adding headcount.

What’s different in 2026 is the level of integration. Health plans have the opportunity to embed BPaaS directly into core claims and payment workflows. Deeper integration between claims administration and payment integrity platforms allows plans to decrease overlapping work between claims and payment integrity teams. This type of integration enables unified configuration, streamlined claims review, and faster enforcement that helps reduce duplication, training complexity, and total cost of ownership.

HealthEdge Source in Action:

Through deep integration with solutions like HealthRules® Payer and HealthEdge Source, BPaaS helps unify payment integrity workflows from configuration through claims review, allowing teams to act faster, reduce manual effort, and scale payment integrity operations without disrupting core claims processes.

7. Payment Integrity Becomes an Enterprise Trust Builder

Ultimately, payment integrity is no longer just about dollars. It’s about trust.

Accurate, transparent, and timely payments reduce friction with providers, support regulatory compliance, and reinforce confidence across the organization. When payment integrity programs are aligned with enterprise goals, they become a driver of operational excellence rather than a source of disruption.

In 2026, leading health plans are focused, now more than ever, on network adequacy and building strong provider relationships—and payment integrity can help.

Looking Ahead: Turning Payment Integrity into Competitive Advantage

As health plans navigate rising costs and increasing complexity, payment integrity is central to the solution. The most successful organizations will be those that move beyond siloed tools and embrace a connected, intelligence-driven approach, prioritizing transparency, prevention, and measurable outcomes.

HealthEdge Source is designed to support this evolution, helping health plans improve payment accuracy, reduce administrative burden, and act with confidence as conditions change.

Want to learn more about the opportunities for AI within payment integrity? Download our whitepaper: Elevating Payment Integrity – The Role of AI in Enhancing Payment Accuracy.