MSSP ACO Performance PY2024
Dataset ID: mssp-performance
Provenance
- Dataset ID: mssp-performance
- Entity Type: aco
- Role: base
- Source: CMS
- Vintage: PY2024
- Entity Count: 476
- Last ETL Run: 2026-04-13
Overview
The MSSP ACO Performance PY2024 dataset is published by the Centers for Medicare & Medicaid Services (CMS) as a public-use file under the Medicare Shared Savings Program (MSSP). It contains one row per participating ACO for Performance Year 2024, with approximately 80 fields covering financial performance (generated and earned savings/losses, per-capita expenditures, benchmarks), quality scores, and beneficiary demographics. The 476 rows represent all ACOs that participated in MSSP during PY2024, including those that terminated mid-year. The source file is hosted on data.cms.gov and is downloaded during the CareGraph ETL pipeline.
This dataset answers questions such as: how much did an ACO save or lose relative to its CMS benchmark, what is the ACO's quality score, how many Medicare beneficiaries are assigned to it, and how does its per-capita spending compare to its benchmark. It is the sole data source for ACO entity pages on CareGraph.
Join Strategy
Each row joins to an ACO entity page on the ACO_ID field, a character string following the pattern A#### (e.g., A0001). During ETL, the normalize_aco_id() function in etl/normalize/keys.py strips whitespace, uppercases the value, and removes non-alphanumeric characters. The function checks that the result matches the expected letter-plus-digits pattern but does not reject non-conforming IDs. Rows with blank or unparseable ACO IDs are skipped and counted in the build log. The join is one-to-one: each ACO ID maps to exactly one row in the source data and one JSON manifest written to site_data/aco/{ACO_ID}.json.
Column names are detected at runtime using _find_column(), which tries an exact match, then a case-insensitive match, then a substring match against a candidate list. This handles variation in column headers across CMS file releases (e.g., ACO_ID vs. ACO_Num vs. ACO ID).
Known Limitations
- Benchmark comparability across tracks. Savings and losses are calculated against a risk-adjusted, regionally-weighted benchmark that incorporates the ACO's own historical spending. ACOs that entered the program with high baseline spending have mechanically more favorable benchmarks, a known "rebasing" problem. Comparing the Sav_rate field across ACOs on different tracks (BASIC vs. ENHANCED) or at different points in their agreement periods is not apples-to-apples, because benchmark methodology differs by track and is revised at rebasing.
- Beneficiary count reflects assignment, not service. The N_AB field counts beneficiaries assigned to the ACO, not all Medicare beneficiaries the ACO's providers treated. Assignment methodology (prospective vs. retrospective, voluntary alignment) varies by track and agreement period. Year-over-year changes in N_AB may reflect CMS assignment rule changes rather than actual shifts in the ACO's patient population.
- Quality score comparability. Quality scoring has transitioned from pay-for-reporting to pay-for-performance for many measures. ACOs in their first performance year may receive quality scores calculated differently from established ACOs. The QualScore field is not directly comparable across ACOs at different stages of program participation.
- Cell-size suppression. ACOs with fewer than 11 beneficiaries in any demographic or clinical sub-category have those values suppressed per CMS cell-size disclosure rules. Suppressed fields appear as missing values in the source data.
- Partial-year and terminated ACOs. ACOs that terminated mid-year or entered mid-year have financial metrics (GenSaveLoss, EarnSaveLoss, TotalExpnd, per-capita fields) that are annualized. These annualized figures may not be comparable to full-year ACOs, particularly for expenditure and savings calculations.
- Medicare fee-for-service only. MSSP covers only Original Medicare (fee-for-service) beneficiaries. Medicare Advantage enrollees are excluded from all metrics. In markets with high MA penetration, MSSP data reflects a smaller, potentially non-representative share of an ACO's total patient panel.
Data Quality Notes
- Numeric fields encoded as strings. Financial fields such as GenSaveLoss, EarnSaveLoss, UpdatedBnchmk, ABtotben, and TotalExpnd are encoded in the source CSV with commas, dollar signs, and percentage symbols. The ETL's _try_float() function strips these characters before parsing; values that contain non-numeric markers (N/A, Not Available, ., *, -, or empty strings) are set to null in the JSON manifest.
- Inconsistent missing-value encoding. The source data uses multiple sentinel values for missing data: empty strings, N/A, Not Available, ., *, and -. All are normalized to null during ETL. There is no distinction in the output between "suppressed due to cell-size rules" and "not applicable"; both appear as null.
- Column name variation across vintages. CMS has used different column header names and casing between file releases (e.g., ACO_ID vs. ACO_Num, ACO_Name vs. ACO_NAME). The ETL's _find_column() function handles this via candidate-list matching, but new header variations in future releases may require adding candidates.
- File encoding. The source CSV is read with encoding="utf-8", errors="replace", meaning any non-UTF-8 bytes in ACO names or other text fields are replaced with the Unicode replacement character (U+FFFD) rather than causing a parse failure.