Medicare Geographic Variation by County — Methodology

Provenance

Dataset ID: geo-var-county
Entity Type: county
Role: base
Source: CMS
Vintage: 2014–2023
Entity Count: 3,198
Last ETL Run: 2026-04-13

Overview

The Medicare Geographic Variation by County dataset is published by the Centers for Medicare & Medicaid Services (CMS) through the Medicare Geographic Variation Public Use File program. It contains county-level spending, utilization, and demographic measures for Medicare fee-for-service (FFS) beneficiaries, with approximately 100 fields across 3,198 county-level records. The source file spans calendar years 2014 through 2023; CareGraph displays the most recent year available (currently 2023). The dataset is released annually, typically with a 12- to 18-month reporting lag from the end of the measurement year.

This dataset answers questions such as: How does per-capita Medicare spending in one county compare to the national average? Which counties have the highest hospitalization or emergency department visit rates? How do beneficiary demographics (age, sex, race, dual-eligible status) and illness burden (HCC risk score) vary across counties? CareGraph uses the standardized per-capita spending figures (fields ending in _STDZD_PYMT_PC) to enable fair cross-county comparisons; standardization removes geographic payment adjustments such as the wage index, cost-of-living adjustments, and teaching hospital add-on payments.

Join Strategy

Each row in the source CSV carries a BENE_GEO_CD field containing the county FIPS code. During ETL, only rows where BENE_GEO_LVL equals County and BENE_AGE_LVL equals All are retained, yielding one row per county for the selected year. The BENE_GEO_CD value is normalized to a 5-digit zero-padded string (2-digit state FIPS + 3-digit county FIPS) by stripping non-digit characters and left-padding with zeros via normalize_fips(). This normalized FIPS code serves as the join key to the county entity page at /county/{FIPS}.
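
The filter and normalization steps described above can be sketched roughly as follows. This is an illustrative sketch, not the actual ETL code: only the name normalize_fips() and the CSV field names come from this page. The helper filter_county_rows(), the dict-per-row input shape, and the rule that a value with one to five digits counts as valid after padding are assumptions made for the example.

```python
import re


def normalize_fips(raw):
    """Return a 5-digit zero-padded FIPS string, or None if invalid.

    Assumption: a value is treated as valid when it has 1 to 5 digits
    after non-digit characters are stripped.
    """
    digits = re.sub(r"[^0-9]", "", str(raw or ""))
    if not digits or len(digits) > 5:
        return None
    return digits.zfill(5)


def filter_county_rows(rows, year):
    """Keep county-level, all-ages rows for one year, keyed by FIPS.

    `rows` is assumed to be an iterable of dicts, e.g. from csv.DictReader.
    """
    by_fips = {}
    for row in rows:
        if row.get("BENE_GEO_LVL") != "County":
            continue
        if row.get("BENE_AGE_LVL") != "All":
            continue
        if str(row.get("YEAR")) != str(year):
            continue
        fips = normalize_fips(row.get("BENE_GEO_CD"))
        if fips is None:
            # The page states that such rows are logged and skipped;
            # logging is omitted here for brevity.
            continue
        by_fips[fips] = row
    return by_fips
```

Keying the result by the normalized FIPS string makes the later lookup from a county page a simple dictionary access.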

The join is a left join from the county entity manifest to the dataset: county pages without a matching geographic variation record display missing-data indicators rather than being omitted. Rows whose BENE_GEO_CD does not yield a valid 5-digit FIPS after normalization are logged and skipped. The county display name and state abbreviation are parsed from the BENE_GEO_DESC field, which uses the format ST-County Name.
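
A minimal sketch of the left join and the BENE_GEO_DESC parsing, under stated assumptions: the function names parse_geo_desc() and left_join() are hypothetical, the county manifest is represented as a plain list of FIPS strings, and a missing match is represented as None (the page only says that a missing-data indicator is displayed).

```python
def parse_geo_desc(desc):
    """Split 'ST-County Name' into (state_abbr, county_name).

    Splits on the first hyphen only, so a hyphenated county name
    stays intact. Returns (None, name) when no hyphen is present.
    """
    text = (desc or "").strip()
    state, sep, name = text.partition("-")
    if not sep:
        return None, text or None
    return state.strip(), name.strip()


def left_join(county_fips_list, geo_var_by_fips):
    """Left join from the county manifest to the dataset.

    Every county in the manifest appears in the result; counties with
    no matching record map to None instead of being dropped.
    """
    return {fips: geo_var_by_fips.get(fips) for fips in county_fips_list}
```

Splitting on the first hyphen is a design choice for the example; it assumes the state abbreviation itself never contains a hyphen.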

Known Limitations

FFS-only population. The dataset covers Medicare fee-for-service beneficiaries exclusively. Counties with high Medicare Advantage penetration (e.g., parts of Florida, Southern California, Puerto Rico) have a smaller and potentially non-representative FFS population, making per-capita figures less generalizable to the full Medicare population. The MA_PRTCPTN_RATE field reports each county's Medicare Advantage share for context.
Cell-size suppression. Counties with fewer than 11 beneficiaries in a given spending or utilization category have values suppressed by CMS to protect beneficiary privacy. Suppressed values appear as * in the source CSV. This disproportionately affects small rural counties and service categories with inherently low utilization (e.g., hospice, SNF). Approximately 5% of county-metric combinations are suppressed.
Residence-based attribution. County-level aggregation uses the beneficiary's county of residence, not the county where services were delivered. Border counties and counties adjacent to major medical centers may show spending and utilization patterns that reflect cross-county care-seeking rather than local service delivery.
Risk-score confounding. The BENE_AVG_RISK_SCRE field (HCC risk score) reflects the average illness burden of beneficiaries in the county. Counties with higher risk scores are expected to have higher spending. Comparing raw spending across counties without accounting for risk scores can be misleading; CareGraph displays the risk score alongside spending metrics to support informed comparison.
Reporting lag. The most recent data release (published March 2025) contains measurement-year 2023 data. There is typically a 12- to 18-month lag between the end of a measurement year and the data release date.

Data Quality Notes

Suppressed values converted to null. The source CSV encodes suppressed values as *, ., empty strings, N/A, or Not Available. The ETL _try_float() function converts all of these to null in the JSON manifest, so downstream consumers see a uniform null rather than mixed sentinel strings.
Numeric fields stored as strings in source. Several numeric fields in the source CSV contain thousands separators (e.g., 1,234.56) or are formatted as plain strings. The ETL strips commas and parses each value to a float; values that fail parsing are set to null, and no error is surfaced on the entity page.
Single-year slice from multi-year file. The source CSV contains data for calendar years 2014 through 2023. The ETL selects only the latest year (YEAR field) for the county manifest. Historical year data is present in the raw download but is not currently exposed on entity pages.
Field names preserved as-is. Metric field names in the JSON manifest retain their original CMS uppercase naming convention (e.g., TOT_MDCR_STDZD_PYMT_PC, BENE_AVG_RISK_SCRE). The raw section of each manifest stores all original CSV columns with their original names and string values for full-fidelity access.
