
Healthcare Data Integration - Making data accessible for scientific discovery

The Center offers an environment and infrastructure that facilitate collaboration among researchers with diverse expertise. Its aims are to:


  1. Build a data infrastructure that links a variety of long-term outcome data sources to electronic health records (EHR) from Mass General Brigham and make them available to Mass General Brigham researchers for scientific discovery.

  2. Build an administrative infrastructure that ensures cost-effective use of the linked long-term outcome data sources in compliance with the data use agreements for the third party-owned data.

  3. Ensure data quality, integrity, and security of the linked long-term outcome data.

  4. Support and facilitate high-validity analytics and research implementation in line with state-of-the-art methodologies.

  5. Facilitate collaborations of researchers of diverse expertise via the Center’s environment and infrastructure.

Use cases

The linked data provide rich information for determining clinical phenotypes and enable efficient recruitment of eligible subjects. The database also provides comprehensive data for ascertaining long-term outcomes, even when those outcomes occur outside the Mass General Brigham EHR system.

To compare the effectiveness and safety of new diabetes medications, claims data and EHR data can complement each other. Claims data cover large populations and fully capture both medication exposure and health outcomes, including those that occur outside of the Mass General Brigham EHR system. EHR data provide information on BMI, hemoglobin A1c results, duration of diabetes, and renal function estimated from laboratory test results. Together, they allow large-scale populations to be studied with this detailed clinical information.

Much drug safety and effectiveness research is conducted using claims data. Many claims-based algorithms for outcome ascertainment or patient phenotyping need to be validated against a gold standard established from the EHR through chart review and/or confirmation by testing or laboratory results. For example, we have used the linked EHR-claims data to develop and validate an algorithm to identify patients with reduced vs. preserved ejection fraction (EF) in claims data. The gold standard was established based on echocardiogram or other cardiac imaging available in Mass General Brigham EHR data.
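As an illustration of what such a validation involves, the sketch below compares a claims-based classification with an EHR-based gold standard and reports the usual accuracy measures. It is a minimal sketch, not the Center's actual method: the function name `validate_algorithm`, the `ValidationResult` type, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ValidationResult:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a measure is undefined.
    return numerator / denominator if denominator else float("nan")


def validate_algorithm(
    claims_flags: Sequence[bool], gold_standard: Sequence[bool]
) -> ValidationResult:
    """Compare claims-based classifications with a gold standard.

    Both sequences are hypothetical inputs, one entry per patient:
    True means "reduced EF" according to the claims algorithm
    (claims_flags) or according to cardiac imaging in the EHR
    (gold_standard).
    """
    if len(claims_flags) != len(gold_standard):
        raise ValueError("inputs must have one entry per patient")

    pairs = list(zip(claims_flags, gold_standard))
    tp = sum(1 for c, g in pairs if c and g)
    fp = sum(1 for c, g in pairs if c and not g)
    fn = sum(1 for c, g in pairs if not c and g)
    tn = sum(1 for c, g in pairs if not c and not g)

    return ValidationResult(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Toy data for six patients (invented for illustration only).
claims = [True, True, False, False, True, False]
gold = [True, False, False, False, True, True]
result = validate_algorithm(claims, gold)
print(result)
```

Positive predictive value is often the measure of most interest here, because it describes how far a cohort selected by the claims algorithm can be trusted when no EHR confirmation is available.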

Highly stratified treatment effect evaluation

To precisely characterize individual patients, we need the EHR system, which contains information on relevant genetic testing and biomarker profiles. To study how these factors influence effectiveness of medical interventions, we need claims data to assess longitudinal medication use patterns and clinical outcomes that frequently happen outside the Mass General Brigham EHR system.

EHR systems record what providers prescribe. However, it is known that 20-60% of prescriptions are never filled, and for many medications adherence is often as low as 50% within 6 months of initiation. Claims data, which record the actual filling of prescriptions, provide a more complete picture of utilization patterns, switching, and discontinuation. This can be critical when separating the effect of the drug itself from the effect of optimal use, and when evaluating the need for and effectiveness of adherence improvement strategies.
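One common way to quantify adherence from pharmacy claims is the proportion of days covered (PDC): the share of days in an observation window on which the patient had medication on hand according to filled prescriptions. The sketch below is a simplified illustration under stated assumptions: the function name and the fill records are hypothetical, the 6-month window is approximated as 180 days, and overlapping fills are counted once rather than shifted forward as some PDC definitions do.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def proportion_of_days_covered(
    fills: Iterable[Tuple[date, int]],
    start: date,
    window_days: int = 180,
) -> float:
    """Share of days in [start, start + window_days) covered by a fill.

    fills holds hypothetical (fill_date, days_supply) pairs taken from
    pharmacy claims. A day covered by several fills is counted once.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    window_end = start + timedelta(days=window_days)
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day < window_end:
                covered.add(day)
    return len(covered) / window_days


# Three 30-day fills with gaps in between (invented for illustration).
example_fills = [
    (date(2023, 1, 1), 30),
    (date(2023, 2, 15), 30),
    (date(2023, 4, 1), 30),
]
pdc = proportion_of_days_covered(example_fills, start=date(2023, 1, 1))
print(f"PDC over 180 days: {pdc:.2f}")
```

Prescribing records in the EHR cannot support this calculation on their own, because they show that a drug was ordered but not whether or when it was dispensed.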

EHR data enable better characterization of the natural history of diseases defined by biomarkers or genetic polymorphisms, while claims data provide long-term follow-up to characterize how the disease state of these patients changes and how care patterns adjust.

Since claims data record all professional health services that resulted in insurance payments, they are well suited to describing longitudinal utilization patterns and the associated costs. This allows us to assess the resource use and economic burden of specific diseases.

Data overview

The Research Patient Data Registry (RPDR) is a centralized clinical data registry that contains EHR data from 2 tertiary medical centers (Brigham and Women’s Hospital and Massachusetts General Hospital), 4 community hospitals (Faulkner Hospital, Newton Wellesley Hospital, North Shore Medical Center, and Wentworth-Douglass Hospital), 3 specialty hospitals (Dana-Farber Cancer Institute, McLean Hospital, and Massachusetts Eye and Ear), a rehabilitation center (Spaulding Rehabilitation Hospital), and >35 primary care centers within the Mass General Brigham system. The EHR databases contain information on patient demographics, medical problems, medications, vital signs, smoking status, body mass index (BMI), immunizations, laboratory data, and various clinical notes, documents, and reports.

These contain information on demographics (age, gender, race/ethnicity, etc.), enrollment start and end dates, dispensed medications, performed procedures, medical diagnoses, cognitive function assessment, behavioral disturbance symptoms, functional status impairment, and patient-reported symptoms.

The National Death Index (NDI) is a centralized database established by the National Center for Health Statistics (NCHS) based on death record information on file in state vital statistics offices. It contains information on vital status, including date and cause of death.