Healthcare graph

Linked, enriched and analytics-ready, Compile’s smart data layer transforms messy and disparate datasets into an intuitive graph of healthcare providers and all their activities.

Limitations with healthcare datasets today

A scalable platform to ingest, cleanse and link healthcare datasets; eliminate silos across data assets


Every large dataset, such as patient-level data (PLD), sales data and pharmacy claims, has gaps due to partial or erroneous data fields. These need to be identified and cleansed before the data can be used.


Users of these datasets face a steep learning curve to understand the data structure and build queries. Additionally, multiple tables and a proliferation of fields require complex queries.


The presence of treating or associated providers (HCP/HCO) is critical when using claims-based data for any analytics. Yet these important fields are left unfilled in many instances, limiting provider and patient counts.


Joining multiple data assets isn’t straightforward. In the absence of a consistent key across all data, users have to build custom matching rules. This is time-consuming and error-prone.
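To make the cost of custom matching rules concrete, here is a minimal sketch of what a user might write to link two datasets that share no common key. The field names (`provider_name`, `provider_zip`, `name`, `zip`) and the name-plus-ZIP rule are illustrative assumptions, not Compile’s actual schema or linking method.

```python
import re


def match_key(name: str, zip_code: str) -> tuple[str, str]:
    """Build a normalized key from a provider name and a ZIP code.

    Hypothetical rule: lowercase the name, drop punctuation, collapse
    whitespace, and keep only the 5-digit ZIP prefix.
    """
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    cleaned = " ".join(cleaned.split())
    return cleaned, zip_code.strip()[:5]


def link_records(claims: list[dict], sales: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each claim with every sales row that shares its match key."""
    # Index the sales rows by their normalized key.
    index: dict[tuple[str, str], list[dict]] = {}
    for row in sales:
        index.setdefault(match_key(row["name"], row["zip"]), []).append(row)

    # Look up each claim's key in the index.
    linked = []
    for claim in claims:
        key = match_key(claim["provider_name"], claim["provider_zip"])
        for sale in index.get(key, []):
            linked.append((claim, sale))
    return linked
```

Even this simple rule has to be tuned and re-validated for every pair of sources, and it silently misses records whose names differ by more than punctuation or spacing.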

Compile’s Smart Data layer

Unified, linked and analytics-ready data that gives a complete and accurate view of providers


Unified layout

Simplified claims data structure into one single comprehensive table
Easy to understand and navigate
Stitched valuable information like affiliations and provider metadata

Cleansed & enriched data

Cleaned, standardized and normalized data fields
Robust process to remove any duplicate or invalid claims
Advanced algorithms to standardize “free-form” fields like payer names

Enhanced data

Missing provider details and payer channel are backfilled based on historical claims patterns
Pre-computed data fields for faster analytics and simpler queries
Custom-built ML algorithms to identify precise brand usage
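One simple way to picture backfilling from historical claims patterns is to fill a missing provider with the provider that appears most often in the same patient’s other claims. The sketch below does exactly that. The field names (`patient_id`, `provider_id`, `provider_backfilled`) and the most-frequent-provider heuristic are assumptions for illustration, and the approach Compile actually uses is not described here.

```python
from collections import Counter, defaultdict


def backfill_provider(claims: list[dict]) -> list[dict]:
    """Fill missing provider_id values from each patient's claim history."""
    # Count how often each provider appears per patient.
    history: dict[str, Counter] = defaultdict(Counter)
    for claim in claims:
        if claim.get("provider_id"):
            history[claim["patient_id"]][claim["provider_id"]] += 1

    filled = []
    for claim in claims:
        row = dict(claim)  # copy so the input is not mutated
        counts = history[row["patient_id"]]
        if not row.get("provider_id") and counts:
            # Use the patient's most frequent historical provider.
            row["provider_id"] = counts.most_common(1)[0][0]
            row["provider_backfilled"] = True
        else:
            row["provider_backfilled"] = False
        filled.append(row)
    return filled
```

Flagging backfilled rows keeps inferred values distinguishable from reported ones, which matters when provider counts are audited later.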

Analytics ready

Build, integrate and automate your analysis/dashboards/applications
Pre-computed fields and features for faster insights
Reduced complexity in building and structuring queries/code
60%+ lower query time


Only the important fields that are used for 95% of the use cases are retained
One table each for medical and pharmacy claims
Parent affiliations and metadata for providers and prescribers are included

Entities profiled

30B+ Medical & pharmacy claims
300M+ Patients
>60% Medical claims capture
7 yr Longitudinal claims
6M+ Affiliations
100+ Data sources
75K+ Clinical trials
1.1M+ Publications
$80B+ Company payments
50K+ Twitter profiles
600K+ Facilities
1,000+ IDNs