Executive Discussion: Preparing for the Future of Care Management with an Integrated Operating Model 

Part 1 of a 2-part series, where HealthEdge® Vice President of Product Development Bobby Sherwood discusses how a new operating model can transform care operations for health plans.

In this installment, we cover:

  • What is an ecosystem operating model?
  • Addressing workforce shortages with AI
  • Redesigning care delivery to improve outcomes

In part 2, Sherwood shares more about measuring success with AI-powered tools, steps to achieving operational transformation, and seamless data sharing.

Nobody goes to nursing school to push paper.

What if care managers could reclaim time currently lost to authorization reviews and compliance chases? Imagine redirecting those hours from administration and back to what really matters: engaging members, improving outcomes, and transforming care through meaningful relationships.

That’s the fundamental shift happening in healthcare now. Health plans are under unprecedented pressure from rising medical costs, regulatory demands, and workforce shortages. And the traditional approach—like layering on additional software and hiring more staff—isn’t sustainable. Something has to change.

At HealthEdge®, we’re delivering a different approach: an ecosystem operating model. We combine next-generation, AI-enabled technology with services designed to lower total costs—and we build contractual accountability into the model.

We sat down with Bobby Sherwood, Vice President of Product Development, to explore how this operating model goes beyond traditional software, and why measurable outcomes are raising the bar.

Defining the New Standard: What is an Ecosystem Operating Model?

How do you define an ecosystem operating model? And how is it different from traditional business process outsourcing?

Traditional business process outsourcing is essentially labor cost arbitrage—taking existing processes and doing them cheaper, often through offshoring or team scaling. You’re doing the same work, just with different people.

Our ecosystem operating model is fundamentally different. We’re reimagining the entire process through technology to deliver transformational outcomes, not just cost savings. It’s about operational transformation, not just labor efficiency.

Here’s what makes our approach unique: we combine HealthEdge’s integrated solution suite with clinical expertise and AI-driven automation. But most importantly, we take accountability for actual care outcomes. We’re not just providing technology. We’re building toward taking responsibility for moving the needle on medical cost trends, member satisfaction, and clinical quality measures.

That accountability changes everything. We succeed only if you succeed. The partnership dynamic is completely different from a traditional vendor relationship.

Supporting a Strategic Shift for Health Plans

HealthEdge is already delivering and optimizing an ecosystem operating model for health plan customers. What does that shift really mean for health plans, and why is now the right moment?

This evolution gives health plans unprecedented flexibility and choice within a full-stack, integrated ecosystem. Health plans can choose delivery options aligned with the business metrics that matter most to their specific strategies and do more with existing capabilities and resources.

Now is the right moment because the traditional vendor model has created fundamentally misaligned incentives. Historically, health plans license software through contracts that aren’t directly tied to member outcomes or total cost performance. While these tools can enable improvement, the commercial model itself isn’t structured around shared accountability for results.

Our ecosystem operating model replaces that legacy arrangement: HealthEdge moves beyond vendor status to become a performance-aligned partner. This means our success is directly tied to delivering the specific clinical and financial outcomes that matter to each health plan. We’re accountable for results, not just software uptime. And frankly, the market is ready for this. Health plans are facing a perfect storm of pressures that’s making them more open to different arrangements.

Addressing Workforce Shortages with AI

Speaking of those pressures—rising medical costs, regulatory demands, workforce shortages—which one is forcing the biggest operational change right now?

If I had to pick one, it’s workforce shortages. Health plans can’t hire clinical staff fast enough to keep pace with member growth, stringent regulatory requirements, and the increasing complexity of care management.

This shortage is driving urgent interest in our approach. When you can’t solve the operational bottleneck by adding more people, you have to fundamentally reimagine how work gets done.

This is where AI technology within HealthEdge GuidingCare® becomes exceptionally powerful. Our system empowers health plans to address staffing shortages by automating routine tasks and enhancing clinical efficiency. Features like Automated Clinical Summaries and Intelligent Document Processing reduce administrative burdens, while Intelligent Care Guidance and Ambient Intelligence streamline decision-making and ensure documentation completeness. This allows health plans to redeploy clinical resources to high-impact activities like complex case management and member relationship building, improving care outcomes and operational efficiency.

Think about authorizations. Most nurses did not go into the profession to review paperwork. But authorizations take up so much time and budget because of the compliance burden. If we can demonstrate at scale that we’ll handle all of it, keep health plans compliant, guarantee the savings, and deliver the outcomes health plans need, nurses will gladly hand off that work. Then health plans can shift those nurses to activities that truly impact the cost curve.

Redesigning Delivery for Measurable Impact

Historically, care management technology has focused on tools and workflows. How is HealthEdge’s Care Solutions approach different in terms of the outcomes it’s designed to deliver?

Traditional care management technology asks, “How can we make existing processes more efficient?” We ask, “What outcomes does the health plan need to achieve, and how do we redesign the entire care delivery model to get there?”

In traditional software models, vendors provide powerful tools that enable improvement, but responsibility for realizing the full value often rests primarily with the health plan. While renewals can reflect satisfaction over time, the commercial structure itself is typically based on access to technology rather than shared accountability for measurable outcomes.

When financial accountability is tied directly to outcomes, the dynamic changes entirely. You’re not just another vendor in their stack of dozens or hundreds of vendors. You become a true strategic partner, fully invested in your clients’ success, with mutually aligned incentives and shared accountability for real, measurable outcomes. This fundamentally transforms the engagement from a transactional relationship into a collaborative alliance built on trust, transparency, and joint achievement.

Our integrated platform doesn’t just digitize existing workflows. It reimagines them entirely. Instead of managing care through disconnected systems and manual processes, our platform enables seamless orchestration of member engagement, clinical interventions, and administrative processes. The result is measurable improvements in clinical quality, member experience, and cost management that we contractually guarantee.

Ready to learn more about reaching and engaging members, measuring the ROI of an ecosystem operating model, and how to get started?

Read part 2 of the blog here: Adopting an Ecosystem Operating Model and Measuring ROI.

About Bobby Sherwood

Bobby Sherwood is VP of Product Development at HealthEdge, where he leads strategic direction for the company’s cloud-based care management solutions and Business Process as a Service offerings. With deep expertise in healthcare technology and payer operations, Bobby works with health plans to transform care delivery models and drive measurable outcomes.

 

The Transformation Tipping Point: Why Health Plans Are Rethinking Operations 

Health plans are heading into 2026 under a level of pressure that feels fundamentally different from just a few years ago.

Findings from the HealthEdge® 2026 Healthcare Payer Survey Report, “The Great Rebalancing: Inside the New Realities Shaping Health Plan Performance,” paint a picture of an industry navigating intensifying regulatory demands, rising cost pressure, accelerating AI adoption, and growing gaps between strategy and execution.

Taken together, these signals point to something bigger than incremental change. The healthcare operating model itself is shifting, and many organizations are being forced to reconsider how their business is structured and how work actually gets done.

From Optimization to Reinvention

For years, many health plans focused on improving workflows, optimizing processes, tightening controls, and driving incremental efficiency.

That approach is starting to give way to something more fundamental.

The survey findings suggest that organizations are not just adjusting priorities. They are rethinking how their operations are designed altogether. Investments are being redirected, infrastructure is being modernized, and long-standing assumptions about how the business runs are being challenged in response to rising costs, regulatory complexity, and increasing expectations from both members and providers.

Cost management continues to dominate executive attention, while compliance requirements grow more complex and more visible. At the same time, initiatives tied to AI, automation, and digital engagement are gaining traction across the enterprise.

Strategies for Reducing Costs

 

Source: 2026 Healthcare Payer Survey Report, The Great Rebalancing: Inside the New Realities Shaping Health Plan Performance

Individually, these shifts are significant. Together, they point to a clear reality: incremental improvement is no longer enough. Health plans are being pushed toward structural reinvention.

Transformation Is the Strategy. Execution Is the Challenge.

While the ambition to transform is nearly universal, the ability to execute remains uneven.

HealthEdge’s survey highlights that:

  • 94% of payers are live with or adopting AI, yet only 31% report fully defined governance models
  • Executive leaders express more confidence in transformation progress than operational and regulatory teams
  • A perception gap persists — only 51% of members view their plan as a “partner in care,” compared to 76% of payers who believe they are perceived that way

What emerges is a pattern: transformation is moving quickly, but alignment across the organization is not keeping pace.

The Pace of AI Adoption at Health Plans

 

AI Governance Maturity at Health Plans

 

That misalignment shows up in different ways, including unclear governance, uneven confidence, and gaps between intent and experience. The question is no longer whether modernization is happening. It’s whether organizations are equipped to make it work at scale.

The Technology Partnership Pivot

As complexity increases, health plans are also rethinking how capabilities are built and sustained.

The survey points to a growing recognition that core functions, such as claims processing, payment integrity, and care management, cannot operate effectively in isolation. When these workflows are connected, organizations gain better visibility, reduce duplication, and create more consistent outcomes.

At the same time, automation is becoming more deeply embedded in day-to-day operations. When applied directly within claims and engagement processes, it has a measurable impact on administrative efficiency and administrative loss ratio (ALR) performance. Governance is evolving as well, with leading organizations like HealthEdge designing compliance and auditability into their systems from the start rather than addressing them after the fact.

All of this reflects a broader shift. The traditional model, which was built on fragmented systems and siloed processes, is giving way to something more coordinated, where data, workflows, and decisions are more tightly connected.

Why the Health Plan Pressure Is Increasing

Several forces are converging to accelerate this shift.

  • Regulatory Velocity – Compliance requirements continue to expand, with greater scrutiny on auditability, prior authorization, payment accuracy, and AI governance. New mandates are not only increasing oversight but also requiring faster, more transparent data exchange—forcing organizations to rethink both systems and processes.
  • Cost and ALR Pressure – Administrative cost containment remains a constant focus. As margins tighten, leaders are looking more closely at automation, payment integrity, and operational efficiency—not as incremental improvements, but as essential levers for sustaining performance.
  • Member Expectations – At the same time, expectations from members continue to evolve. The gap between perception and experience is notable: while most plans believe they are seen as partners in care, only about half of members agree. Closing that gap requires health plans to deliver experiences that feel connected, transparent, and trustworthy.

These pressures are not temporary disruptions. They reflect a longer-term shift in what it takes to operate effectively as a health plan.

How HealthEdge Is Helping Plans Navigate Industry Shifts

As organizations move beyond optimization, a consistent theme emerges: transformation only works when it is integrated, governed, and measurable.

HealthEdge is helping health plans make that transition by embedding AI and automation directly into core workflows, from claims adjudication to payment integrity and member engagement. At the same time, configurable governance controls and audit-ready frameworks support greater transparency and compliance.

By connecting pricing, editing, and review workflows, plans can reduce administrative burden while improving financial performance. And with unified data models and enterprise-wide visibility, leaders are better equipped to align strategy with execution.

The goal is not to layer new tools onto legacy systems, but to support a more durable shift in how operations are structured and managed.

A Trusted Partner to Embrace Ongoing Change

The findings from HealthEdge’s 2026 Healthcare Payer Survey point to a clear inflection point. Incremental improvement is no longer sufficient. Health plans are entering a period where structural change is required across technology, operations, and governance.

Those that succeed will be the ones that bring these elements together into a more connected, adaptable operating model.

Want to learn more about the ways health plan leaders are already integrating AI-powered tools into their workflows? Read our recent article, “Unlocking the Future of Healthcare Technology: Interoperability, Transparency, and AI”.

Building Trust in AI: A Guide to LLM Evaluations 

Large language models (LLMs) are inherently probabilistic, meaning the same input can produce different outputs. That variability makes traditional unit tests, which verify exact results, ineffective for AI systems. In healthcare, where quality and accuracy are nonnegotiable, this creates a unique challenge: how do you ensure AI performs reliably at scale? At HealthEdge, we address this through a multi-layered evaluation strategy that combines human evaluations, LLM-as-a-Judge, CI/CD automation, and online, real-time monitoring to meet healthcare’s rigorous quality standards.

Why do we need multiple evaluation types?

Each serves a distinct purpose in the AI development lifecycle:

  • Human evaluations establish ground truth. Only domain experts can judge whether an AI summary captures clinically relevant details or if generated test cases are actually executable. Humans define what “good” looks like.
  • LLM-as-a-Judge scales human judgment. We can’t have subject matter experts (SMEs) review every output during rapid development. A judge-LLM applies human-defined criteria consistently across thousands of examples, enabling fast iteration.
  • CI/CD regression evaluations prevent quality backslides. When prompts or models change, automated tests catch regressions before they reach production, which is essential when multiple teams ship AI features weekly.
  • Online (real-time) evaluations catch real-world drift. Production traffic contains edge cases that no test dataset anticipates. Continuous monitoring detects degradation before users complain.

We’ll illustrate each type of evaluation using our QA Test Case Generation Agent, which reads Jira tickets and generates test cases with titles, preconditions, steps, and expected results.

Human Evaluations

Human evaluations are the gold standard. For healthcare AI, human oversight is non-negotiable. AWS Bedrock supports this through human-based evaluation jobs: collect inference examples, upload to S3, create evaluation jobs with custom metrics, and review results through Bedrock’s console.

SMEs are best suited for measuring the performance of highly complex operations. For instance, the QA Test Generation Agent takes in a nontrivial input, a Jira ticket, and outputs an entire spreadsheet of test cases with multiple test steps. Translating input to output takes many steps, each simulating part of a QA engineer’s role.

LLM-as-a-Judge

LLM-as-a-Judge uses a second LLM to evaluate primary agent outputs, scaling human-like judgment across large datasets without requiring SME time for every evaluation run.

Each evaluation metric is defined by a prompt that instructs the judge LLM what to assess and how to score. For example, a “Relevance” evaluator prompt asks the LLM to compare the generated output to the source input and rate how relevant the response is. These evaluation prompts can be customized for domain-specific criteria, allowing teams to encode their quality standards into reusable, automated checks.
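The evaluator-prompt idea above can be sketched in a few lines of Python. This is an illustrative sketch, not Bedrock’s actual evaluator API; `call_llm` is a hypothetical stand-in for whatever inference client a team uses (for example, a Bedrock runtime invocation).

```python
# Minimal sketch of an LLM-as-a-Judge "Relevance" evaluator.
# call_llm is a hypothetical stand-in for your inference client.
import re

RELEVANCE_PROMPT = """You are an evaluation judge.
Compare the generated output to the source input and rate how
relevant the response is on a scale of 1 (irrelevant) to 5
(fully relevant). Reply with the score only.

Source input:
{source}

Generated output:
{output}
"""

def judge_relevance(source: str, output: str, call_llm) -> int:
    """Ask the judge LLM for a 1-5 relevance score and parse it."""
    reply = call_llm(RELEVANCE_PROMPT.format(source=source, output=output))
    match = re.search(r"[1-5]", reply)
    if not match:
        raise ValueError(f"Unparseable judge reply: {reply!r}")
    return int(match.group())
```

The same pattern generalizes to any domain-specific criterion: swap the prompt text, keep the score-parsing contract.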

When initially building LLM-as-a-Judge evaluators, it’s helpful to compare their scores against human evaluations on the same dataset. This calibration ensures the LLM evaluators resemble SME judgment as closely as possible. If the judge LLM scores differ significantly from human reviewers, the evaluation prompt needs refinement until alignment improves. Bedrock offers built-in evaluators for correctness, relevance, and hallucination, as well as custom prompts.
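Calibration itself can be as simple as comparing the judge’s scores to SME scores on the same examples. A minimal sketch, with an illustrative 0.5-point tolerance (the acceptable gap is a team decision, not a standard):

```python
# Sketch: calibrating judge-LLM scores against SME scores on the
# same dataset. The 0.5 tolerance below is illustrative only.
def calibration_gap(judge_scores, human_scores):
    """Mean absolute difference between judge and human scores."""
    assert len(judge_scores) == len(human_scores)
    diffs = [abs(j - h) for j, h in zip(judge_scores, human_scores)]
    return sum(diffs) / len(diffs)

gap = calibration_gap([4, 5, 3, 4], [5, 5, 3, 2])
# If the gap is too large, refine the evaluation prompt and re-run.
needs_refinement = gap > 0.5
```

More sophisticated agreement measures (correlation, Cohen’s kappa) work the same way; the point is that the judge is validated against humans before it is trusted at scale.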

For the QA Test Generation Agent, the same criteria evaluated by SMEs can be provided as prompts for the LLM judges, producing a second, aggregated set of metrics. Used as a baseline, these scores can flag any dip in the agent’s performance.

CI/CD Regression Evaluations

CI/CD evaluations automate quality gates. When developers merge changes to prompts, models, or agent architecture, automated evaluations catch regressions before they move into production.

AWS AgentCore integrates with GitHub Actions to configure datasets, define LLM-as-a-Judge evaluators, and specify task functions. The pipeline triggers on merge, blocking deployment if thresholds aren’t met.

For example, with the HealthEdge QA agent, we block deployment if test comprehensiveness, as scored by an LLM judge against ground-truth data, drops below 80%, or if CSV output format adherence falls below 95%.
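A gate like this can be sketched as a small script the pipeline runs after the evaluators finish. The metric names and thresholds below mirror the example in the text; how the scores are produced (the judge-LLM runs) is out of scope here, and the wiring into CI is an assumption, not AgentCore’s actual integration.

```python
# Sketch of a CI quality gate: fail the pipeline (nonzero exit)
# if aggregate evaluation scores fall below agreed thresholds.
import sys

THRESHOLDS = {
    "test_comprehensiveness": 0.80,  # vs. ground-truth test cases
    "csv_format_adherence": 0.95,    # output structure check
}

def gate(scores: dict) -> list:
    """Return the list of metrics that fail their threshold."""
    return [m for m, t in THRESHOLDS.items() if scores.get(m, 0.0) < t]

if __name__ == "__main__":
    # Illustrative scores; in CI these come from the evaluation run.
    scores = {"test_comprehensiveness": 0.84, "csv_format_adherence": 0.97}
    failures = gate(scores)
    if failures:
        print(f"Blocking deployment; failing metrics: {failures}")
        sys.exit(1)
    print("All evaluation thresholds met.")
```

A nonzero exit code is what actually blocks the merge in most CI systems, which keeps the gate tool-agnostic.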

Online (Real-Time) Evaluations

Online evaluations monitor production traffic, sampling live requests to detect drift that static datasets miss. These evaluations use the same LLM-as-a-Judge evaluators defined during development, applying them continuously to production data rather than pre-constructed test sets. AgentCore supports configurable sampling (1–5% of traffic), running judge prompts on sampled requests and surfacing score trends through observability dashboards. If quality degrades from unexpected inputs, online evaluations catch it before users report issues.
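Sampling-based online evaluation can be sketched as follows. The random-draw approach and the 2% rate are illustrative choices, not AgentCore’s actual mechanism; `judge` is any evaluator with the signature shown.

```python
# Sketch: sampling a small fraction of production requests for
# online judge evaluation. Illustrative only.
import random

SAMPLE_RATE = 0.02  # 2% of traffic, within the 1-5% range noted above

def maybe_evaluate(request, response, judge, rng=random.random):
    """Run the judge on a sampled subset of live traffic.

    Returns the judge's score for sampled requests, None otherwise.
    """
    if rng() < SAMPLE_RATE:
        score = judge(request, response)
        # In practice the score would be emitted to an observability
        # dashboard so score trends can be monitored over time.
        return score
    return None
```

Injecting `rng` keeps the sampling decision deterministic in tests while defaulting to true randomness in production.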

The Evaluation Lifecycle

These four evaluation types form a continuous loop: human evaluations establish ground truth; LLM-as-a-Judge enables rapid iteration; CI/CD gates releases; online monitoring feeds edge cases back into development.

AI evaluation requires fundamentally different approaches than traditional testing. By combining human evaluations, LLM-as-a-Judge, CI/CD automation, and real-time monitoring, HealthEdge ensures AI features meet healthcare’s quality standards.

To follow HealthEdge’s AI strategy in greater detail, visit the Resources section of our website, www.healthedge.com.

Cut Administrative Costs Up to 40% With an Integrated Operating Model 

For the second consecutive year, 52% of health plan executives named “managing rising costs” as the top challenge, according to the 2026 HealthEdge® Annual Payer Report.

In the same survey, 85% of executives reported that regulatory compliance requirements directly cut into their margins. Health plans using HealthEdge HealthRules® Payer enter this environment with the advantage of a next-generation core administrative processing system (CAPS) designed specifically for these challenges. The next opportunity is to expand that advantage across your organization’s workflows by adopting an integrated operating model.

From Next-Generation Software to a Modern Operating Model

HealthRules Payer delivers capabilities that legacy systems cannot match, including rapid configuration, high auto-adjudication rates, and the flexibility to quickly adapt to regulatory changes.

Across the industry, health plans that have modernized their CAPS still operate within traditional organizational structures and practices. Internal teams, external vendors, and contracts are organized by function, each with separate accountability and performance tracking. That model leaves significant operational value on the table because of persistent inefficiencies like:

  • Low visibility and fragmented accountability. Separate teams and vendors manage claims, enrollment, billing, and member services, each under distinct contracts and using unique performance tracking. This reduced transparency prevents cross-functional collaboration and efficiency.
  • Downstream lag on configuration changes. The platform adjusts rapidly, but the operation around it may take weeks to catch up. Retraining staff, updating enrollment procedures, and revising compliance documentation all follow separate timelines.
  • High administrative costs. Without operational transparency and integration, staffing overhead and managing multiple vendors can dilute the efficiency gains HealthRules Payer delivers.
  • Compliance execution that remains reactive. Each regulatory change triggers a cross-team operational scramble managed as a special project rather than an adaptable, repeatable workflow.

These are operating model challenges and, left unaddressed, they compound. Technical debt grows, capital stays locked in maintenance, and the ability to expand into new markets or new lines of business narrows. Overcoming these challenges requires integrated platforms that can handle rapid modernization to deliver outcomes.

An Integrated Operating Model with Guaranteed Financial Outcomes

Moving to a modernized operating model is a structural change. Software ownership gives way to outcome accountability. Siloed systems and multiple vendor relationships give way to an end-to-end operating ecosystem managed by a single partner.

The HealthEdge integrated operating model consolidates technology, operational processes, and a dedicated global service delivery team under a single governance structure and set of service-level agreements (SLAs). The partner who builds the platform is also the one who operates it and is accountable for the results, with contractually guaranteed cost reductions tied to defined performance measures.

This is a different approach than patching point solutions onto an existing environment or layering new tools on top of an outdated operating structure. Those practices may address some symptoms, but they can add costs and do not change the underlying model. An integrated ecosystem replaces the model itself.

Across current deployments, health plans operating within this model have achieved:

  • 30–40% reduction in administrative cost per member per month (PMPM), sustained year-over-year.
  • Return on investment (ROI) within 12–15 months, compared to 24–36 months under multi-vendor approaches.
  • Budget variance below 1%, against an industry benchmark of 10–15%.

With HealthEdge, these are contractual commitments—not projections.

Turning a Technology Advantage into an Operational One

For health plans already operating on HealthRules Payer, the integrated operating model means having a dedicated team with deep expertise in the platform, acting as an extension of your organization to ensure that investment works harder across every function.

One Accountable Partner, End to End

HealthEdge owns the platform, the global operations delivery team, and the outcomes. Vendor sprawl and split accountability are replaced by a single point of responsibility for performance, backed by shared risk. Ongoing platform upgrades and regulatory updates are included at no additional cost.

Configuration Changes That Execute at Platform Speed

When a benefit rule, pricing change, or regulatory update is configured in HealthRules Payer, the downstream operational response is managed by one team at a rapid pace. New lines of business that take months under traditional models can launch in days.

Administrative Costs Decrease and Stay Down

Standardized work processes, automated task processing, and economies of scale across a global delivery team of more than 7,000 drive 30–40% reductions in health plan administrative costs, sustained year over year. Savings are tied to process enhancement rather than headcount reduction. You pay for measurable results, not employee hours or resource inputs.

Compliance Built into Daily Operations

Centers for Medicare and Medicaid Services (CMS) mandates like the Transparency in Coverage final rules, state requirements, and audit readiness tools are embedded into standard workflows within HealthRules Payer. Plus, regulatory changes are operationalized as they arrive.

Real-Time Visibility Across the Full Operation

A centralized data hub in the platform surfaces performance data through unified executive dashboards. Backlogs, SLA risks, and compliance exposure become visible before they become problems.

Leadership Reclaims Time for Strategy

When a single partner manages daily execution and owns the outcomes, health plan leadership moves capacity toward growth and member experience.

The HealthEdge advisory team works alongside health plan leadership from day one to ensure operational continuity.

The New Operating Model in Practice

One regional health plan moved to an integrated ecosystem model, unifying core administration, claims, enrollment, billing, correspondence, and member services under a single HealthEdge operating structure in fewer than 12 months.

The payer’s administrative spend dropped from $11.50 to $6.90 per member per month (PMPM), a reduction of approximately 40%. Claims accuracy reached best-in-class levels, and the plan reported EBITDA uplift across every function in the model.

Operational improvements like these are still the exception among health plans, though they are available to any organization. Adopting a fully integrated technology-and-operations model can help put your health plan at the forefront of the adoption curve, giving your organization a competitive advantage.

Learn more about the power of an integrated operating model in action. Download our case study to see how one health plan successfully migrated its entire ecosystem under pressure and achieved a 99.8% auto-adjudication rate.

A Practical Way to Get Started Now

HealthEdge offers a cost-driver assessment that builds a baseline of current administrative spend, identifying where costs originate and where an integrated model would generate the greatest return. Whether a health plan moves forward with HealthEdge or not, the assessment delivers an independent, practical view of the current operation.

Contact us to learn more.

Preparing for Software Testing: 8 Best Practices for Health Plans 

When a health plan undertakes a major technology change, whether implementing a new platform, modernizing a legacy system, or rolling out new functionality, the promise is compelling: streamlined workflows, greater automation, and more time for teams to focus on strategic priorities.

Before those benefits can be realized, however, there is a critical step that determines whether the transition succeeds or struggles: User Acceptance Testing (UAT).

For many health plans, UAT is unfamiliar territory. Others may not have gone through a large-scale testing effort in years. In either case, preparation is key. This article draws on the experiences of the HealthEdge® Global Professional Services testing team to deliver eight best practices that help payers prepare for a successful testing engagement and a smoother go-live.

What Is User Acceptance Testing (UAT)?

UAT is the final phase of the software testing process. It’s where business users (not technical teams) validate that the system meets business requirements and is ready for day-to-day operations in a production environment.

It is not about proving the software works technically. It’s about confirming that the solution supports real-world workflows, produces accurate and compliant outcomes, and enables users to do their jobs effectively.

Why UAT Matters for Health Plans

UAT plays a critical role in reducing risk during major technology changes, enabling health plans to:

  • Confirm that business requirements are met
  • Validate end-to-end workflows across teams and systems
  • Identify gaps missed in earlier testing phases
  • Reduce the likelihood of costly post–go-live issues
  • Support compliance and audit readiness
  • Build user confidence and encourage adoption

Most importantly, UAT provides health plans with one final opportunity to ensure readiness before the new system goes live.

The Health Plan’s Role in the Testing Process

While every software implementation is unique, users play a central role in ensuring the solution works as intended in real-world use. In most testing engagements, the health plan’s responsibilities include:

  • User Acceptance Testing: Leading business validation to confirm the system supports operational needs
  • Providing Test Data and Access: Supplying realistic data and user credentials for testing
  • Business Requirements Validation: Confirming that configured workflows align with business expectations
  • Final Sign-Off: Approving the solution for production following successful UAT

Although users are most active during UAT, effective testing starts much earlier. Early involvement during requirements definition, design, test planning, and data preparation significantly improves UAT outcomes.

8 Leadership Decisions That Set the Stage for a Successful Technology Change

Major technology implementations are rarely derailed by software issues alone. More often, challenges arise when organizations underestimate the preparation required to validate new ways of working before go-live.

Successful testing is not accidental. It is the result of deliberate leadership decisions made well before UAT begins. Leaders who approach testing as a strategic business exercise, rather than a technical checkpoint, put their organizations in a far stronger position to realize value from their investment.

The following eight practices represent the most important actions leaders can take to ensure testing supports a smooth transition, confident users, and long-term success.

1. Understand the Purpose of UAT

Testing is not about finding every possible defect. The goal of UAT is to ensure the system will support your business operations once real users depend on it.

During UAT, business leaders and users should be asking:

  • Can users do their jobs effectively in the new system?
  • Do core processes work from start to finish?
  • Are outcomes accurate, compliant, and usable?
  • Is the system intuitive for different user roles?

Keeping this purpose in focus helps teams prioritize what truly matters.

2. Involve the Right People Early

The people validating the system should be the people who will use it, not just technical resources or project team members.

Health plans should consider involving:

  • Frontline users who understand day-to-day work
  • Subject matter experts familiar with exceptions and edge cases
  • Supervisors or leads who understand downstream impacts
  • Compliance, audit, or quality representatives
  • Data owners
  • A business UAT lead to coordinate testing activities

These stakeholders should be engaged early, during requirements definition, test planning, and data preparation, not only during UAT execution.

3. Protect Time for User Acceptance Testing

One of the most common challenges in UAT is underestimating the time it takes. When testing is treated as an “extra” task layered onto daily responsibilities, quality suffers.

Best practices include:

  • Allocating dedicated time for UAT participants
  • Reducing or temporarily backfilling day-to-day responsibilities
  • Setting realistic timelines for testing and retesting
  • Treating UAT as a priority business activity

Strong UAT requires an upfront time investment—but that investment pays off through smoother go-lives and fewer post-production fixes.

4. Prepare Realistic Scenarios

Effective testing goes beyond validating individual system functions. UAT should test scenarios inspired by users’ daily workflows. For example, rather than only validating a single calculation or rule, an end-to-end scenario might include logging in, accessing a member, completing an assessment, creating a care plan, and triggering follow-up tasks.

Prioritize scenarios that are:

  • High-volume or frequently used
  • High-risk from a compliance or financial perspective
  • Critical to member or provider satisfaction

These scenarios provide the most meaningful validation of system readiness.
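To make this concrete, an end-to-end scenario like the one described above can be captured as simple structured data before test scripts are written. The sketch below is purely illustrative; the step names, roles, and expected results are hypothetical and not drawn from any specific care management system:

```python
# Illustrative sketch: one end-to-end UAT scenario captured as structured data.
# All step names, roles, and expected results are hypothetical examples.
from dataclasses import dataclass, field


@dataclass
class UatScenario:
    name: str
    role: str       # who performs this scenario
    priority: str   # e.g. "high-volume", "high-risk", "member-facing"
    steps: list = field(default_factory=list)  # (action, expected_result) pairs


scenario = UatScenario(
    name="Complete assessment and create care plan",
    role="Care Manager",
    priority="high-volume",
    steps=[
        ("Log in with care manager credentials", "Dashboard loads for the role"),
        ("Search for and open a member record", "Member details display"),
        ("Complete the health risk assessment", "Assessment saves without errors"),
        ("Create a care plan from assessment results", "Care plan is generated"),
        ("Verify follow-up tasks were triggered", "Tasks appear in the work queue"),
    ],
)

# A simple readiness check: every step needs an expected result to validate against.
assert all(expected for _, expected in scenario.steps)
print(f"{scenario.name}: {len(scenario.steps)} steps, priority={scenario.priority}")
```

Writing scenarios down in a structured form like this makes it easier to confirm that every step has a verifiable expected result before testers sit down to execute.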

5. Ensure Data and Configuration Are Ready

UAT is only as effective as the data supporting it. Health plans should ensure test data is realistic, complete, and accurately configured before testing begins.

This typically includes:

  • Member demographics and eligibility
  • Provider and program information
  • Role permissions and workflow configurations
  • Negative and edge-case data (such as members with no eligibility or incomplete documentation)

Poor or incomplete data can delay timelines, mask defects, and undermine confidence in testing results.

6. Train Users for UAT—But Don’t Turn It Into Full-Scale Training

Users don’t need to be system experts to test effectively, but they do need enough familiarity to execute workflows and recognize whether outcomes are correct.

Before UAT begins, ensure users can:

  • Understand the business processes they are testing
  • Navigate the system for their role
  • Enter, edit, and validate data
  • Follow test scripts and document results

Many organizations find that walking through test scenarios provides valuable hands-on learning without turning UAT into full-scale training.

7. Set Clear Expectations for Issue Management

Clear guidelines for logging, prioritizing, and resolving issues are essential to keeping testing on track.

Teams should align on:

  • What constitutes a critical issue versus a minor one
  • How and where issues are logged
  • Who determines whether an issue must be resolved before go-live
  • Communication and escalation paths

Without clear issue management processes, testing can stall, defects may be missed, and go-live decisions become more difficult.

8. Don’t Rush—or Skip—User Acceptance Testing

UAT is the only phase where real business users validate that the system supports their workflows, rules, and daily operations.

When UAT is rushed or skipped, organizations face significant risks, including:

  • Untested critical workflows
  • Higher likelihood of production defects
  • Increased project costs
  • Compliance and operational disruptions

Taking the time to complete UAT thoroughly helps protect both the organization and the users who rely on the system.

Reducing Risk and Realizing Value Faster

UAT is one of the most important milestones in any major technology change. While it’s easy to get caught up in individual defects or system nuances, the real purpose of UAT is far more strategic: to confirm that the organization is ready to operate with confidence in the new environment.

When UAT is done well, health plans gain assurance that core business processes function as intended, users can perform their roles effectively, and financial and compliance outcomes are accurate. Most importantly, it provides leadership with the confidence that the organization is prepared—not just to go live, but to succeed once the system is in production.

HealthEdge: Your Partner in Testing Success

Health plans don’t have to navigate this process alone. Experienced software and services partners like HealthEdge bring proven frameworks and expertise to guide health plans through all phases of testing, from data preparation and scenario design to execution, automation, and issue management.

Engaging the right partner early in the process helps reduce risk, accelerate readiness, and ensure that testing supports a smooth transition and long-term value from the technology investment.

See how our Global Professional Services team partners with health plans to plan, execute, and optimize testing engagements, helping teams go live with confidence and realize value faster. Read the first article in our Software Testing series, “Software Testing Essentials: Why It Matters for Health Plans and How HealthEdge® Makes It Easier.”

Building Trust in LLM Solutions: A Practical Guide to Evaluation Planning 

Artificial intelligence (AI) is fundamentally changing how healthcare software is built. From automated test case generation to intelligent documentation and decision support, large language models are becoming embedded within the software development lifecycle itself.

As AI becomes part of how solutions are designed and validated, the question is no longer just whether it adds efficiency. It’s whether organizations can systematically evaluate and trust the outputs it produces.

At HealthEdge®, we’re deploying the Wellframe QA team’s test case generation agent. The agent takes Jira tickets for new front-end functionality, including acceptance criteria, and generates test cases as CSV files for a downstream test management tool. This collaboration has demonstrated that successful LLM deployment requires building trust through rigorous evaluation.

What Are LLM Evaluations?

Traditional software applications are straightforward to assess using well-established patterns: unit testing, integration testing, UAT, and so on. LLM applications are different: they produce an effectively infinite range of outputs, their responses are context-dependent, and their failure modes can be subtle.

LLM evaluations systematically measure whether your LLM application solves the problem you built it to solve. They provide concrete evidence of what works and reveal specific areas that need improvement.

Evaluations serve different audiences with different needs.

  • For stakeholders, they provide transparency and set realistic expectations about what the system can and cannot do.
  • For developers, they highlight specific shortcomings that need attention and help prioritize improvement efforts.
  • For users, they build confidence that the system has been rigorously tested.

The ultimate goal is trust. Users need to trust that your LLM solution will perform reliably. Evaluations are how you earn and maintain that trust.

The Four Components of a Robust Evaluation Plan

Our QA test generation agent presents a complex evaluation challenge. Given a Jira ticket, it generates test cases with sections, titles, preconditions, steps, expected results, and metadata. There’s no single correct output, and quality is multidimensional.

Consequently, we devised a complete evaluation plan with four components: criteria, methods, dataset, and execution strategy.

Component 1 – Evaluation Criteria: Criteria should stem directly from the problem the model is solving. For our QA test generation agent, we identified multiple critical criteria based on what makes test cases valuable to our QA team:

  • Required Test Recall measures comprehensiveness. Are we generating all the necessary test cases that a human QA engineer would write? We calculate this as the number of “required” test cases covered by the agent divided by the total number of required test cases a human would write. We set a realistic target recall based on task complexity and risk.
  • Acceptance Criteria Coverage measures thoroughness. Does the generated test suite adequately test all the acceptance criteria mentioned in the Jira ticket? We target 90%+ coverage to ensure nothing slips through the cracks.
  • Test Comprehensiveness relies on human evaluators, who score the test suite on a 1-5 scale based on their holistic judgment of its quality.

Each criterion targets a specific aspect of quality that matters to our end users (the QA team). We’re measuring concrete traits that determine whether the agent provides real value.

The key is to measure quality from multiple angles so edge cases don’t slip through. A test suite could score high on recall (finding all the important scenarios) but low on coverage (missing acceptance criteria details). Both matter, so we measure both.
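The two automated criteria above reduce to simple ratios. Here is a minimal sketch of how they might be computed, assuming exact-string matching for simplicity (a real pipeline would rely on fuzzy matching or SME judgment to decide what counts as “covered”):

```python
# Minimal sketch of the recall and coverage calculations described above.
# Exact-string matching is a simplification for illustration only.

def required_test_recall(generated: set, required: set) -> float:
    """Fraction of required test cases that the agent actually generated."""
    if not required:
        return 1.0
    return len(required & generated) / len(required)


def acceptance_criteria_coverage(covered_acs: set, all_acs: set) -> float:
    """Fraction of the ticket's acceptance criteria exercised by the test suite."""
    if not all_acs:
        return 1.0
    return len(covered_acs & all_acs) / len(all_acs)


required = {"login succeeds", "invalid password rejected", "session timeout"}
generated = {"login succeeds", "invalid password rejected", "password reset"}

recall = required_test_recall(generated, required)
print(f"Required test recall: {recall:.0%}")  # 2 of 3 required cases covered
```

A suite that hits the recall target but misses an acceptance criterion would pass the first check and fail the second, which is exactly why both metrics are tracked.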

Component 2 – Evaluation Methods: The HealthEdge team pursued three approaches:

  1. Automated computable metrics (exact match, fuzzy match) work when success is mathematically defined.
  2. Human evaluation handles judgment requiring domain expertise.
  3. LLM-as-a-judge uses another LLM to evaluate based on a rubric.

For this project, we used automated checks for format and human subject matter experts (SMEs) for quality assessment.
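As an illustration of the first method, fuzzy matching can be as simple as a string-similarity check. The sketch below uses Python’s standard library; the 0.8 threshold is an assumption for demonstration, not a value from our pipeline:

```python
# Illustrative fuzzy-match check using Python's standard library.
# The 0.8 similarity threshold is an assumption chosen for demonstration.
from difflib import SequenceMatcher


def fuzzy_match(generated_title: str, reference_title: str,
                threshold: float = 0.8) -> bool:
    """True when two test case titles are similar enough to count as a match."""
    ratio = SequenceMatcher(None, generated_title.lower(),
                            reference_title.lower()).ratio()
    return ratio >= threshold


# Near-identical titles match despite minor wording differences...
print(fuzzy_match("Verify user login succeeds", "verify user log-in succeeds"))
# ...while unrelated titles do not.
print(fuzzy_match("Verify user login succeeds", "Export report to CSV"))
```

Automated checks like this catch formatting and matching questions cheaply, leaving the expensive human SME review for the judgment calls that genuinely need it.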

Component 3 – The Evaluation Dataset: This is the most critical component. If the dataset doesn’t match production, the evaluation will miss problems. For example, a resume-screening tool evaluated only against software engineer resumes may look reliable in testing yet fail on designer or marketer resumes in production. Evaluation datasets must follow three rules:

  1. Representative means it reflects the actual distribution of cases you’ll see in production. If 60% of production tickets describe UI features, 30% describe API changes, and 10% describe infrastructure work, your evaluation dataset should match those proportions. If edge cases happen 5% of the time in production, they should appear roughly 5% of the time in your dataset.
  2. Diverse means covering the full range of scenarios, including edge cases and failure modes. For our QA agent, we need Jira tickets that vary in complexity (simple bug fixes vs. major features), clarity (well-written vs. vague requirements), and completeness (detailed acceptance criteria vs. minimal descriptions). Each variation might affect output quality differently.
  3. Consistent means the ground truth labels or expected outputs are reliable and reproducible. If three QA engineers evaluate the same test cases, they should largely agree on what’s required and what’s comprehensive. Inconsistent ground truth means you’re measuring noise instead of signal.
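The “representative” rule can be checked mechanically by comparing the evaluation dataset’s category mix against production proportions. The sketch below is illustrative; the category names, counts, and tolerance are assumptions:

```python
# Sketch: check whether an evaluation dataset roughly matches the production
# distribution of ticket types. Category names, counts, and the 5-point
# tolerance are illustrative assumptions.
from collections import Counter

# Production mix from the example above: 60% UI, 30% API, 10% infrastructure.
production_mix = {"ui_feature": 0.60, "api_change": 0.30, "infrastructure": 0.10}

# A hypothetical 100-ticket evaluation dataset.
eval_tickets = (["ui_feature"] * 58 + ["api_change"] * 31
                + ["infrastructure"] * 11)


def is_representative(tickets, target_mix, tolerance=0.05):
    """True when every category is within `tolerance` of its production share."""
    counts = Counter(tickets)
    total = len(tickets)
    return all(
        abs(counts.get(cat, 0) / total - share) <= tolerance
        for cat, share in target_mix.items()
    )


print(is_representative(eval_tickets, production_mix))
```

A check like this won’t guarantee diversity or consistent ground truth, but it makes drift between the evaluation set and production easy to spot.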

For this project, the Wellframe QA team curated a substantial dataset of real Jira tickets spanning different feature types and created the “required” test cases for each. This gave the team reliable ground truth to measure against, as it was built by the very subject-matter experts who will be using the agent in production.

Component 4 – The Execution Plan: Evaluations can be offline (using your dataset), which is comprehensive and controlled, or online (monitoring production), which catches unexpected inputs but often lacks ground truth.

For our QA agent, we chose offline evaluation because our criteria require human subject matter experts. The strategy centered on periodic manual reviews every few weeks during development. Before each release, a comprehensive evaluation served as a quality gate, and after deployment the team shifted to continuous monitoring.

Putting It All Together 

To recap, the critical steps in our process were defining concrete criteria, choosing appropriate methods, investing in a high-quality dataset, and designing an execution plan. For our QA agent, we accepted that evaluation requires human SMEs, prioritized offline evaluation, and invested in a diverse dataset with reliable ground truth.

The result: A confident deployment with evidence of strengths and visibility into limitations.

Contact HealthEdge to learn how our AI solutions are reinventing the way our software is designed and tested.