Data Science and Data Lakes in the Payer Space

September 8, 2023

Matt Kuntz

Historically, HealthEdge has focused on optimizing the transactional side of the payer business. As a core administrative solution provider, we touch all parts of our customers’ workflow, which requires us to store and host large volumes of data. By better understanding that data, we can use it to drive value for customers.

With a data lake, any kind of data, irrespective of structure and source, can quickly provide valuable insights that improve our customers’ business outcomes and operations.

With a traditional data warehouse, users must transform data into a well-defined schema before storing it. Any insight drawn from the data is then limited to what that particular schema design can express. These traditional schemas also face design challenges when new sources of data become available for ingestion.

With a data lake, that barrier is gone. The data does not need to be clean and perfect or come from a single source; it can come from anywhere. A data lake stores data from core admin systems, pharma, EHRs, or other proprietary sources in its original format until it’s required for analysis. A data lake is also scalable and can easily support large volumes of data, loaded all at once or incrementally, enabling analysis that would not be possible under the fixed, pre-defined hardware constraints of traditional systems. With the data lake’s distributed systems, a user can ask extremely complex questions as well as create computationally intensive predictive models.
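
To make the "original format until it’s required" idea concrete, here is a minimal sketch of reading raw records and applying structure only at query time. The record layouts, field names (`member_id`, `memberId`, `patient.id`), and source labels are all hypothetical, invented for illustration; a real data lake would run this kind of logic on a distributed engine rather than in a single Python process.

```python
from __future__ import annotations

import json

# Hypothetical raw records, kept exactly as each source produced them.
RAW_RECORDS = [
    '{"source": "core_admin", "member_id": "M1", "claim_amount": "125.50"}',
    '{"source": "pharmacy", "memberId": "M1", "drug": "atorvastatin", "cost": 12.0}',
    '{"source": "ehr", "patient": {"id": "M2"}, "note": "annual physical"}',
]


def member_id(record: dict) -> str | None:
    """Find the member identifier wherever a given source happened to put it."""
    if "member_id" in record:
        return record["member_id"]
    if "memberId" in record:
        return record["memberId"]
    return record.get("patient", {}).get("id")


def records_per_member(raw_lines: list[str]) -> dict[str, int]:
    """Parse the raw lines at read time and count records per member."""
    counts: dict[str, int] = {}
    for line in raw_lines:
        record = json.loads(line)
        mid = member_id(record)
        if mid is not None:
            counts[mid] = counts.get(mid, 0) + 1
    return counts


counts = records_per_member(RAW_RECORDS)
print(counts)
```

The point of the sketch is that no source had to be reshaped before storage; the mapping to a common member identifier lives in the query, so a new source only needs a new branch in `member_id`.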

For example, a model could be built with machine learning techniques to determine how to process claims more efficiently and improve auto-adjudication rates. With a data lake, a user can perform complex data transformations on millions of claims, including claims history, adjustments, reason-code processing, and more, and do it in a fraction of the time a traditional SQL-based data warehouse would take.
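
Before any model is trained, the starting point is usually a simple aggregation: how many claims were auto-adjudicated, and which reason codes sent the rest to manual review. The sketch below shows that step only, not the machine learning itself. The claim fields (`auto_adjudicated`, `reason_code`) and the reason code values are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical claim records; field names and reason codes are illustrative only.
CLAIMS = [
    {"claim_id": "C1", "auto_adjudicated": True, "reason_code": None},
    {"claim_id": "C2", "auto_adjudicated": False, "reason_code": "MISSING_AUTH"},
    {"claim_id": "C3", "auto_adjudicated": True, "reason_code": None},
    {"claim_id": "C4", "auto_adjudicated": False, "reason_code": "DUPLICATE"},
    {"claim_id": "C5", "auto_adjudicated": False, "reason_code": "MISSING_AUTH"},
]


def auto_adjudication_rate(claims):
    """Share of claims that were processed without manual intervention."""
    if not claims:
        return 0.0
    auto = sum(1 for claim in claims if claim["auto_adjudicated"])
    return auto / len(claims)


def manual_review_reasons(claims):
    """Count the reason codes on claims that fell out of auto-adjudication."""
    return Counter(
        claim["reason_code"] for claim in claims if not claim["auto_adjudicated"]
    )


rate = auto_adjudication_rate(CLAIMS)
reasons = manual_review_reasons(CLAIMS)
print(f"auto-adjudication rate: {rate:.0%}")
print(reasons.most_common())
```

Counts like these could serve as features or as a baseline for a model; at data lake scale the same grouping would run across the full claims history rather than an in-memory list.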

A second example of leveraging data lake technologies for health plans is predicting membership churn. Retaining members is a significant issue for health plans, but they can only compare the return rate with the percentage of people leaving. With a data lake, there may be enough historical data to model the characteristics and behavior of past members in the period before they left the health plan, and to use this knowledge to predict whether current members will leave in the future. With that information, health plans can adjust their offerings accordingly to improve retention rates.
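
As a rough illustration of the churn idea, the sketch below fits a small logistic regression on past members and scores current ones. Logistic regression is one possible technique, chosen here for simplicity and not named in this post. The two features (denied claims in the last year, years enrolled) and all of the data are hypothetical; a real model would use far more history and a proper ML library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def churn_probability(weights, bias, x):
    """Estimated probability that a member with features x leaves the plan."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical history: (denied claims last year, years enrolled) -> left the plan?
HISTORY = [(0, 5), (1, 4), (0, 3), (1, 6), (4, 1), (3, 1), (5, 2), (4, 0)]
LEFT = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train(HISTORY, LEFT)
p_at_risk = churn_probability(weights, bias, (5, 1))
p_stable = churn_probability(weights, bias, (0, 6))
print(f"at-risk member: {p_at_risk:.2f}, stable member: {p_stable:.2f}")
```

On this toy data the member with many denied claims and a short tenure scores as more likely to leave than the long-tenured member with none, which is the kind of signal a health plan could act on before renewal.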

This year, HealthEdge built a data science team that is currently pursuing these and other hypotheses. Through close collaboration with our customers and a series of near-term proofs-of-concept, we anticipate unlocking new types of value for the health insurance market that were not possible five years ago.