Building a Scalable OCR Pipeline: Technical Architecture Behind HealthEdge’s Document Processing Platform

In our first blog, we explored how HealthEdge’s AI-powered optical character recognition (OCR) platform is transforming prior authorization and other document-heavy workflows. Now, we’re taking you behind the scenes to show how we built it.

Creating an enterprise-grade OCR platform for healthcare requires more than just text extraction. It demands a sophisticated architecture that can handle diverse document types, maintain compliance standards, and scale to process thousands of documents daily. At HealthEdge®, we built our AI Platform’s OCR solution around a modular, three-stage pipeline that balances flexibility with reliability across multiple healthcare workflows.

The first product built on this platform is a solution for processing Prior Authorization forms; you can read more about it in Transforming Healthcare Document Processing: How HealthEdge’s AI Platform Revolutionized Prior Authorization with Intelligent OCR. While that article detailed the use case, this one focuses on the technical architecture.

Multi-Stage Processing Architecture

Our OCR platform implements a three-stage approach: classification, extraction, and resolution. This modular design allows us to optimize each stage independently while maintaining flexibility for different document categories and use cases.
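
To make that modularity concrete, here is a minimal sketch of the three-stage flow, with each stage injected as an interchangeable component. The function names and signatures are illustrative assumptions, not the platform’s actual code:

```python
# Minimal sketch of the three-stage flow; the stage implementations here are
# hypothetical placeholders, not HealthEdge's actual code.
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    category: str      # output of the classification stage
    raw_fields: dict   # output of the extraction stage
    resolved: dict     # output of the resolution stage


def run_pipeline(
    document: bytes,
    classify: Callable[[bytes], str],
    extract: Callable[[bytes, str], dict],
    resolve: Callable[[dict, str], dict],
) -> PipelineResult:
    """Each stage is injected as a callable, so a strategy can be swapped or
    tuned independently of the other two stages."""
    category = classify(document)
    raw_fields = extract(document, category)
    resolved = resolve(raw_fields, category)
    return PipelineResult(category, raw_fields, resolved)
```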

In this section, we take a closer look at each of these three stages.

Robust Classification

The heart of our system lies in configurable document categories that serve as processing blueprints. Each category defines its own extraction strategy and runs dedicated models, which yields more accurate, fine-tuned results than a single generalized model. Classification also allows each document type to use a different Resolve stage, so the output data format can vary between types and fields can be added or omitted depending on the source document. Fallback mechanisms handle edge cases when documents can’t be classified with sufficient confidence, and most of this functionality can be reconfigured for new document categories without code changes.

In our configuration for prior authorization forms, the classification layer uses Azure Document Intelligence Custom Classifier Models to intelligently route documents to appropriate processing workflows. The classifier is trained on a small handful of example documents to determine which standard Prior Authorization Form was provided.
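As a simplified illustration, classifier-driven routing with a confidence fallback could look roughly like the sketch below. The category names, thresholds, and configuration shape are assumptions made for illustration; the classification call follows the `begin_classify_document` method in the `azure-ai-formrecognizer` SDK:

```python
# Sketch only: category names, thresholds, and the config shape are illustrative,
# not the platform's actual configuration.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Each document category acts as a processing blueprint: it selects an
# extraction strategy and an output schema for the later stages.
CATEGORIES = {
    "payer_a_prior_auth": {"strategy": "custom_extraction", "schema": "prior_auth_v1"},
    "payer_b_prior_auth": {"strategy": "key_value_llm", "schema": "prior_auth_v1"},
}
CONFIDENCE_THRESHOLD = 0.75  # below this, fall back instead of auto-routing


def route_document(endpoint: str, key: str, classifier_id: str, pdf_bytes: bytes) -> dict:
    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    result = client.begin_classify_document(classifier_id, pdf_bytes).result()
    doc = result.documents[0]
    if doc.confidence < CONFIDENCE_THRESHOLD or doc.doc_type not in CATEGORIES:
        # Fallback path for documents that cannot be classified confidently.
        return {"category": "unclassified", "route": "human_review"}
    return {"category": doc.doc_type, "route": "auto", **CATEGORIES[doc.doc_type]}
```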

Flexible Extraction Strategies

We support multiple extraction approaches to handle the varied nature of healthcare documents. Our General Key-Value Extraction uses Azure’s prebuilt layout model with keyValuePairs functionality, where an LLM processes the raw output according to user-defined schemas. This approach requires no training: the strategy can pull out data like name, phone number, and member ID, but because the layout model returns every potentially interesting key-value pair in the document, relevant or not, the LLM is prompted for zero-shot filtering of that rough set of pairs down to a clean, user-defined schema, as sketched below.
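
A simplified sketch of that filtering step, assuming a toy target schema and a generic `llm_complete` placeholder for whichever chat-completion client is configured:

```python
import json

# Hypothetical target schema; real schemas are configured per document category.
TARGET_SCHEMA = {"member_name": "string", "member_id": "string", "phone": "string"}


def filter_key_value_pairs(raw_pairs: list[dict], llm_complete) -> dict:
    """Ask an LLM (zero-shot) to map the noisy key/value pairs returned by the
    layout model onto the clean user-defined schema, dropping anything else."""
    prompt = (
        "Map the extracted key/value pairs below onto this JSON schema, "
        "omitting pairs that do not belong. Return JSON only.\n"
        f"Schema: {json.dumps(TARGET_SCHEMA)}\n"
        f"Pairs: {json.dumps(raw_pairs)}"
    )
    return json.loads(llm_complete(prompt))
```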

For more precise results, our Custom Extraction strategy leverages Azure’s Custom Extraction Model trained on user-labeled documents, where the user manually labels sample documents with the expected extraction results. While this requires a minimum of five labeled training documents and training times that can vary from minutes to hours, it provides high accuracy for the relevant fields along with comprehensive confidence and location metadata.

For simpler use cases, we offer Content Understanding through Azure’s service, with custom analyzers trained via schema definitions. This service uses multiple LLMs tasked with understanding the document and picking out the user-requested data, and it cross-validates results across those LLMs to improve confidence and accuracy. It is easy to configure but provides limited location and confidence data for complex fields like tables.

Our Markdown Extraction approach converts documents to markdown text and uses LLMs for field extraction. While cost-efficient and flexible, it provides no location or confidence metadata, though we can enhance it with two-stage processing for better accuracy.
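
A bare-bones sketch of the markdown route, with `document_to_markdown` and `llm_complete` standing in for the actual conversion service and LLM client (both are placeholders, not specific APIs):

```python
import json

FIELDS = ["patient_name", "diagnosis_code", "requested_service"]  # illustrative only


def markdown_extract(pdf_bytes: bytes, document_to_markdown, llm_complete) -> dict:
    """Two steps: render the document as markdown, then ask an LLM for the
    requested fields. Note that this path yields no confidence or location data."""
    markdown = document_to_markdown(pdf_bytes)
    prompt = (
        f"Extract the fields {FIELDS} from the document below. Return a JSON "
        "object with exactly those keys; use null for anything not present.\n\n"
        + markdown
    )
    return json.loads(llm_complete(prompt))
```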

Deterministic Resolution

The final stage, Resolution, maps extracted fields into the output format required by downstream systems. Rather than relying on an LLM at runtime, the platform generates deterministic mapping code for this step. The configuration process involves providing training data with document examples and their expected output; once generated, this code produces consistent, repeatable results, eliminating the variability inherent in LLM-based approaches. For organizations requiring maximum predictability in their document processing workflows, this deterministic approach offers significant advantages over typical AI-based resolution methods.
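
The generated mapping code itself is plain, reviewable logic. A toy example of what such generated resolve-stage code might look like (field names and formats are illustrative, not taken from the platform):

```python
from datetime import datetime

# Illustrative example of generated resolve-stage code: deterministic
# transformations from extracted fields to the downstream output schema.


def resolve_prior_auth(extracted: dict) -> dict:
    return {
        "member_id": extracted.get("Member ID", "").strip().upper(),
        "member_name": extracted.get("Patient Name", "").strip(),
        "service_code": extracted.get("Requested Service Code", "").strip(),
        # Dates are normalized to ISO 8601 so downstream systems see one format.
        "date_of_birth": _to_iso_date(extracted.get("DOB", "")),
    }


def _to_iso_date(raw: str) -> str | None:
    for fmt in ("%m/%d/%Y", "%m-%d-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable dates for human review
```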

Production-Ready Integration Architecture

Our platform adheres to an API-first design philosophy, exposing REST endpoints for each processing stage, including document classification, field extraction, result resolution, and code generation for deterministic mapping. Production deployments typically use automated file watchers that detect new documents in configured source locations, trigger the processing pipeline with proper tenant identification, handle background processing through all three stages, use queue-based messaging for completion notifications, and deliver results to designated output locations.
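
As an illustration of that flow, a watcher callback might walk a new document through the stage endpoints roughly as follows. The endpoint paths, header names, and base URL below are hypothetical, not the platform’s published API:

```python
import requests  # third-party HTTP client, used here for a hypothetical integration sketch

API_BASE = "https://ocr-platform.example.invalid"  # placeholder base URL
TENANT_HEADER = {"X-Tenant-Id": "tenant-123"}      # tenant identification on every call


def process_new_document(pdf_bytes: bytes) -> dict:
    """Walk one detected document through the three stage endpoints in order."""
    category = requests.post(f"{API_BASE}/classify", data=pdf_bytes,
                             headers=TENANT_HEADER, timeout=60).json()
    fields = requests.post(f"{API_BASE}/extract", headers=TENANT_HEADER, timeout=60,
                           json={"document_id": category["document_id"],
                                 "category": category["category"]}).json()
    resolved = requests.post(f"{API_BASE}/resolve", headers=TENANT_HEADER, timeout=60,
                             json={"category": category["category"],
                                   "fields": fields}).json()
    # In production, completion would be announced on a message queue and the
    # resolved output delivered to the tenant's designated location.
    return resolved
```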

The platform handles multi-tenancy through tenant isolation in data processing and storage, configuration inheritance with customer-specific overrides, comprehensive audit logging with tenant attribution, and role-based access control. This architecture enables us to serve multiple healthcare organizations from a single platform instance while maintaining strict data isolation and security boundaries.
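
Configuration inheritance can be pictured as a deep merge of platform defaults with a tenant-specific override; the keys and values below are purely illustrative:

```python
def merge_config(base: dict, override: dict) -> dict:
    """Recursively overlay tenant-specific settings on top of platform defaults."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merged[key] = merge_config(base[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values only.
platform_defaults = {"confidence_threshold": 0.75, "resolve": {"date_format": "ISO-8601"}}
tenant_override = {"confidence_threshold": 0.90}
effective_config = merge_config(platform_defaults, tenant_override)
```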

Performance and Reliability Characteristics

Our background processing architecture enables horizontal scaling without impacting user-facing performance. The platform can process thousands of documents simultaneously while maintaining consistent response times for interactive operations. Each extraction includes confidence scores that enable intelligent fallback strategies, including threshold-based routing for low-confidence extractions, human review queues for validation requirements, automated reprocessing with alternative strategies, and comprehensive logging for debugging and optimization.
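
The routing decision itself can be as simple as comparing per-field confidence against configured thresholds; the values and structure below are an illustrative sketch, not the platform’s actual rules:

```python
REVIEW_THRESHOLD = 0.80  # below this, queue the document for human review
RETRY_THRESHOLD = 0.50   # below this, reprocess with an alternative strategy


def route_extraction(fields: dict[str, tuple[str, float]]) -> str:
    """`fields` maps field name -> (value, confidence); returns a routing decision."""
    lowest = min(confidence for _, confidence in fields.values())
    if lowest < RETRY_THRESHOLD:
        return "reprocess_with_alternative_strategy"
    if lowest < REVIEW_THRESHOLD:
        return "human_review_queue"
    return "auto_accept"
```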

Security and compliance are built into the technical architecture. We maintain HIPAA-compliant data handling throughout the processing pipeline, generate comprehensive audit trails for every processing step, ensure no autonomous actions occur without human validation, and implement encrypted data transmission with secure storage protocols. This technical foundation ensures that healthcare organizations can trust the platform with sensitive patient information while meeting regulatory requirements.

Real-World Use Cases and Applications

The platform’s versatility is demonstrated through a diverse range of healthcare applications, both currently in production and planned for development. Our primary use case involves prior authorization processing for GuidingCare®, handling fax forms to extract patient information, medication requests, service codes, and diagnosis details from various payer-specific forms. We’re also expanding into provider demographics management on the same infrastructure, processing provider update forms with demographic changes and credential modifications.

Beyond these current deployments, the platform’s modular architecture supports appeals processing with complex narrative sections, care management documentation including treatment summaries and discharge planning forms, and claims processing workflows handling both standard forms like CMS-1500 and payer-specific formats. The system’s technical versatility extends to multi-language healthcare forms, handwritten clinical notes, and mixed-format documents that combine structured fields with narrative sections.

The platform excels in scenarios with seasonal volume fluctuations, such as open enrollment periods and regulatory reporting deadlines, and it enables rapid onboarding of new customers through configurable document types. This flexibility allows healthcare organizations to process everything from utilization management workflows to quality assurance documentation and member enrollment forms using the same underlying technical infrastructure.

This architectural approach demonstrates how thoughtful platform design enables both flexibility and reliability in healthcare document processing. By building modular, configurable systems with multiple processing strategies and robust security measures, we’ve created a foundation that can scale across diverse use cases while maintaining the accuracy and compliance standards essential for patient care. The result is a platform that doesn’t just solve today’s document processing challenges but provides the technical foundation for tomorrow’s healthcare automation needs.

To learn more about HealthEdge’s AI-first strategy, visit the AI blog series on our website.

About the Author

Ethan Zhu is a senior machine learning engineer at HealthEdge. He completed his master’s degree at the University of Toronto in Canada, where he focused on representation learning. He previously did research with the Hospital for Sick Children on the analysis of EEG and imaging data from neonatal patients. He is inspired to use his skills to help those who are most vulnerable. He loves tinkering with everything from dry-aging beef to gardening and CNC fabrication.