Complications and Deaths — Hospital — Methodology

Provenance

Dataset ID: hosp-complications
Entity Type: hospital
Role: enrichment
Source: CMS
Vintage: FY2026
Entity Count: 5,399
Last ETL Run: 2026-04-13

Overview

The Complications and Deaths — Hospital dataset is published by the Centers for Medicare & Medicaid Services (CMS) as part of the Hospital Compare program, now integrated into the Care Compare initiative on data.cms.gov (Provider Data API identifier ynj2-r877). It contains hospital-level risk-adjusted rates for 30-day mortality (MORT-30 measures for conditions including acute myocardial infarction, heart failure, pneumonia, COPD, stroke, and coronary artery bypass graft surgery) and complication rates (including PSI-90 Patient Safety and Adverse Events Composite). Each row represents one measure for one hospital, with fields for the risk-adjusted rate, national comparison category, number of patients, denominator estimate, and measure start/end dates. The current file covers FY2026, using a 3-year measurement window of approximately July 2019 through June 2022.

This dataset answers questions such as: which hospitals have higher- or lower-than-expected mortality rates for specific conditions, how a hospital's complication rates compare to the national average, and whether a hospital is categorized as "Better than the National Rate," "No Different than the National Rate," or "Worse than the National Rate" for each measure. The comparison categories are determined by whether the 95% confidence interval for the hospital's risk-adjusted rate overlaps the national rate. The data covers Medicare fee-for-service discharges from IPPS-participating acute care hospitals across the United States and its territories.
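As a rough illustration of the categorization rule described above, the sketch below assigns a comparison category from an interval estimate and the national rate. The function name and arguments are hypothetical, and the sketch assumes lower rates are better, as for mortality and complication measures. It is not CMS's implementation.

```python
def comparison_category(ci_lower: float, ci_upper: float, national_rate: float) -> str:
    """Classify a hospital from its interval estimate and the national rate.

    Assumes lower rates are better (mortality, complications).
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_upper < national_rate:
        # Entire interval sits below the national rate.
        return "Better than the National Rate"
    if ci_lower > national_rate:
        # Entire interval sits above the national rate.
        return "Worse than the National Rate"
    # The interval contains the national rate.
    return "No Different than the National Rate"
```

A hospital whose interval straddles the national rate is always reported as "No Different than the National Rate", however far its point estimate sits from that rate.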

Join Strategy

This dataset joins to hospital entity pages on CareGraph using the Facility ID field, which contains the CMS Certification Number (CCN) as a 6-digit zero-padded string (e.g., 010001). During ETL, the normalize_ccn() function strips whitespace and zero-pads values shorter than 6 characters to ensure consistent matching. The generic _load_measures_by_ccn() loader reads the source CSV, identifies the CCN column using a candidate-list strategy (checking "Facility ID", "Hospital CCN", "Provider Number", and variants), and groups all measure rows by normalized CCN. Each hospital's measure-level records are attached to its JSON manifest under the complications_deaths key as an array of per-measure objects. Non-numeric values such as "Not Available" are filtered out during loading; numeric fields are parsed via _try_float(). The join is a left join from the hospital manifest — hospitals without Complications and Deaths records retain their existing data and display missing indicators for this dataset.
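The steps above can be sketched roughly as follows. The function names mirror those mentioned in the text, but the bodies are assumptions rather than the actual ETL code: the candidate list is abbreviated, header matching is assumed to be case-insensitive, and attach() is a hypothetical helper showing the left-join behavior.

```python
import csv
from collections import defaultdict

# Abbreviated, assumed candidate list; the real ETL checks more variants.
CCN_CANDIDATES = ("Facility ID", "Hospital CCN", "Provider Number")


def normalize_ccn(value):
    """Strip whitespace and left-pad with zeros to 6 characters."""
    return str(value).strip().zfill(6)


def find_column(fieldnames, candidates):
    """Return the first header that matches a candidate, ignoring case."""
    lookup = {name.strip().lower(): name for name in fieldnames}
    for candidate in candidates:
        match = lookup.get(candidate.lower())
        if match is not None:
            return match
    return None


def load_measures_by_ccn(lines):
    """Group CSV rows by normalized CCN. `lines` is any iterable of CSV text lines."""
    reader = csv.DictReader(lines)
    ccn_col = find_column(reader.fieldnames or [], CCN_CANDIDATES)
    if ccn_col is None:
        raise KeyError("no CCN column found among candidates")
    grouped = defaultdict(list)
    for row in reader:
        raw = (row.get(ccn_col) or "").strip()
        if not raw:
            continue  # skip rows without an identifier
        grouped[normalize_ccn(raw)].append(row)
    return dict(grouped)


def attach(manifests, grouped):
    """Left join: every manifest is kept; only matched hospitals gain the key."""
    for ccn, manifest in manifests.items():
        if ccn in grouped:
            manifest["complications_deaths"] = grouped[ccn]
    return manifests
```

For example, calling load_measures_by_ccn(open(path, newline="")) on a source file would return a dict keyed by 6-character CCN, which attach() then merges into the hospital manifests without dropping unmatched hospitals.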

Known Limitations

Small-volume suppression. Hospitals with fewer than 25 cases in the denominator for a given measure have their rates suppressed, displayed as "Too Few to Report." This disproportionately affects small and rural hospitals, and suppressed entries should not be interpreted as zero mortality or zero complications.
No socioeconomic adjustment. Risk-adjusted rates use CMS hierarchical logistic regression models that adjust for patient age, sex, and clinical comorbidities from claims data. The models do not adjust for socioeconomic status, race, or hospital structural characteristics. Safety-net hospitals serving high-poverty populations may show systematically higher rates that reflect social determinants rather than clinical quality.
Medicare fee-for-service only. The dataset covers only Original Medicare (fee-for-service) discharges. Medicare Advantage enrollees, Medicaid-only patients, commercially insured patients, and the uninsured are excluded. In markets with high Medicare Advantage penetration, reported rates are calculated on a smaller, potentially non-representative subset of the hospital's total patient volume.
COVID-19 pandemic overlap. The 3-year measurement window (approximately July 2019 through June 2022 for FY2026) overlaps substantially with the COVID-19 public health emergency. CMS excluded certain discharge periods, but the remaining data may still reflect pandemic-era operational disruptions, staffing shortages, and changes in patient acuity and case mix.
Confidence interval width masks small-volume performance. The "Better/No Different/Worse than the National Rate" categories depend on whether a hospital's 95% confidence interval overlaps the national rate. Hospitals with small volumes produce wide confidence intervals and will almost always fall into "No Different" regardless of their actual rate, making the comparison category uninformative for low-volume facilities.
PSI-90 composite masking. The PSI-90 (Patient Safety and Adverse Events Composite) is an aggregate of multiple patient safety indicators. The composite score can mask deterioration in individual component measures — a hospital may show an acceptable composite while performing poorly on specific safety indicators.
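The effect of case volume on interval width, noted in the confidence interval limitation above, can be illustrated with a simple normal-approximation interval for a proportion. This is a deliberately simplified stand-in, not the CMS hierarchical model, and the rates and case counts below are made-up illustrative values.

```python
import math


def approx_ci(rate: float, n: int, z: float = 1.96):
    """Normal-approximation 95% interval for a proportion (illustration only)."""
    half_width = z * math.sqrt(rate * (1.0 - rate) / n)
    return max(0.0, rate - half_width), min(1.0, rate + half_width)


national = 0.12  # made-up national rate
observed = 0.16  # made-up hospital rate, worse than the national rate

for n in (30, 300, 3000):
    low, high = approx_ci(observed, n)
    contains_national = low <= national <= high
    print(f"n={n:>4}: [{low:.3f}, {high:.3f}] contains national rate: {contains_national}")
```

Under these assumptions, the same observed rate yields an interval that contains the national rate at the smallest volume and excludes it at the largest, which is why the comparison category carries little information for low-volume facilities.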

Data Quality Notes

Rate and score fields stored as strings. The source CSV encodes risk-adjusted rates, denominators, and score fields as string values. Suppressed rows contain "Too Few to Report" or "Not Available" instead of numeric values. The ETL parses these with _try_float(), converting non-numeric entries to null in the JSON manifest. Rows where the cleaned value equals "Not Available" or "Not Applicable" are excluded entirely during loading.
Claims-derived complication measures. Complication and mortality measures are derived from administrative claims data, not clinical chart review. Variation in hospital coding practices — particularly in how thoroughly secondary diagnoses and present-on-admission indicators are documented — can affect reported rates independently of actual complication frequency. Hospitals with more thorough coding may paradoxically appear to have higher complication rates.
Column name variation across vintages. CMS changes column header casing and naming between file releases (e.g., "Facility ID" vs. "Facility Id", "Measure ID" vs. "Measure Name"). The ETL uses a candidate-list column matching strategy via _find_column() to handle these variations without manual updates.
Date format and missing-value encoding. Measure start and end date fields use MM/DD/YYYY string format in the source data. Missing value encoding is inconsistent across fields: the source uses "Not Available", "N/A", and empty strings interchangeably. The ETL normalizes all non-numeric/non-meaningful entries uniformly during the _clean() and _try_float() parsing steps.
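A minimal sketch of the parsing steps described in these notes is shown below. The function names echo _clean() and _try_float() from the text, but the bodies, the placeholder set, and the parse_mdy() date helper are assumptions for illustration and not the actual ETL code.

```python
from datetime import datetime

# Assumed placeholder set, built from the strings named in the notes above.
MISSING_TOKENS = {"", "n/a", "not available", "not applicable", "too few to report"}


def clean(value):
    """Trim whitespace and map known missing-value placeholders to None."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in MISSING_TOKENS else text


def try_float(value):
    """Return a float, or None when the value is missing or non-numeric."""
    text = clean(value)
    if text is None:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def parse_mdy(value):
    """Parse an MM/DD/YYYY string into a date, or None if missing or malformed."""
    text = clean(value)
    if text is None:
        return None
    try:
        return datetime.strptime(text, "%m/%d/%Y").date()
    except ValueError:
        return None
```

Returning None for every placeholder means suppressed and missing values serialize as null in JSON rather than being confused with a zero rate.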
