Medicare Part D Spending by Drug — Methodology

Provenance

Dataset ID: partd-drug-spending
Entity Type: drug
Role: base
Source: CMS
Vintage: CY2023
Entity Count: 1,938
Last ETL Run: 2026-04-13

Overview

The Medicare Part D Spending by Drug dataset is published by CMS on data.cms.gov as part of the Medicare Drug Spending dashboard. It reports total spending, claim counts, beneficiary counts, cost-per-dosage-unit, and year-over-year spending changes for drugs covered under Medicare Part D — the outpatient and retail prescription drug benefit. Each row represents one drug, identified by brand name and generic name. The current file contains 1,938 drug records. The dataset does not include drugs administered by physicians and billed under Medicare Part B (e.g., infused chemotherapy, injectable biologics).

This dataset answers questions such as: which drugs account for the largest share of Medicare Part D spending, how average cost per dosage unit compares across drugs that share a dosage form, how many beneficiaries use a given drug, and how spending has changed year over year. Total spending reflects gross amounts before manufacturer rebates, which are confidential; net spending after rebates is substantially lower for many brand-name drugs.

Join Strategy

This dataset joins to drug entity pages on CareGraph using the generic name field (Generic Name), which is uppercased and trimmed during ETL to serve as the canonical drug identifier. All brand-name formulations, manufacturers, and dosage forms that share the same generic name are aggregated under a single drug entity page. For example, all records for atorvastatin, whether branded (Lipitor) or generic and regardless of strength (10 mg, 20 mg, 40 mg tablets), are merged into the /drug/atorvastatin page. The join key is a plain string match after uppercasing and whitespace normalization; no code-based identifier (NDC or RxNorm) is used. Drug entity pages display the aggregated spending, claim count, beneficiary count, and cost-per-unit fields from this dataset.
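The generic-name join and aggregation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not CareGraph's ETL: the `SpendingRow` and `DrugTotals` records, their field names, and the `join_key` and `aggregate_by_generic` helpers are hypothetical. Two choices in the sketch are assumptions rather than documented behavior: cost per unit is recomputed from summed spending and summed dosage units instead of averaging per-row averages, and the summed beneficiary count is treated as an upper bound, because a beneficiary who filled both a brand and a generic version is counted in both rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SpendingRow:
    """One source row (hypothetical field names, not the CMS column headers)."""
    brand_name: str
    generic_name: str
    total_spending: float
    total_claims: int
    total_beneficiaries: int
    total_dosage_units: float


@dataclass
class DrugTotals:
    """Running totals for one generic-name entity."""
    spending: float = 0.0
    claims: int = 0
    # Summed across brand/generic rows, so a beneficiary who filled both
    # versions is counted twice: treat this as an upper bound.
    beneficiaries_upper_bound: int = 0
    dosage_units: float = 0.0

    @property
    def cost_per_unit(self) -> Optional[float]:
        # Recomputed from totals instead of averaging per-row averages,
        # which would give a low-volume row the same weight as a high-volume one.
        if self.dosage_units == 0:
            return None
        return self.spending / self.dosage_units


def join_key(generic_name: str) -> str:
    """Uppercase, trim, and collapse internal whitespace to single spaces."""
    return " ".join(generic_name.upper().split())


def aggregate_by_generic(rows: Iterable[SpendingRow]) -> Dict[str, DrugTotals]:
    """Merge every row that shares a normalized generic name into one entity."""
    totals: Dict[str, DrugTotals] = defaultdict(DrugTotals)
    for row in rows:
        entity = totals[join_key(row.generic_name)]
        entity.spending += row.total_spending
        entity.claims += row.total_claims
        entity.beneficiaries_upper_bound += row.total_beneficiaries
        entity.dosage_units += row.total_dosage_units
    return dict(totals)


if __name__ == "__main__":
    # Toy numbers for illustration only.
    rows = [
        SpendingRow("Lipitor", "Atorvastatin", 300.0, 20, 12, 100.0),
        SpendingRow("Atorvastatin", "  atorvastatin ", 200.0, 60, 30, 400.0),
    ]
    for key, t in aggregate_by_generic(rows).items():
        print(key, t.cost_per_unit, t.beneficiaries_upper_bound)
```

With these toy rows, both records collapse to the key `ATORVASTATIN` and the recomputed cost per unit is 1.0 (500 / 500), whereas a simple mean of the two per-row figures (3.0 and 0.5) would report 1.75.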

Known Limitations

Small-cell suppression. Drugs with fewer than 11 claims or fewer than 11 beneficiaries in the reporting year are excluded entirely from the file to protect patient privacy. This disproportionately removes rare disease drugs, newly launched drugs, and drugs nearing market exit — categories where patients and researchers may have the greatest unmet information need.
Pre-rebate spending only. The Total Spending field includes plan-paid amounts, beneficiary cost-sharing, and low-income subsidy payments, but reflects gross spending before confidential manufacturer rebates. For rebate-heavy brand-name drugs, reported spending can overstate true net cost by 30–60% or more. Cross-drug spending comparisons are unreliable without rebate adjustment.
Generic-name aggregation obscures price variation. CareGraph merges all brand and generic formulations under a single generic name. This masks potentially large price differences between branded and generic versions of the same molecule, between different dosage forms (tablet vs. injectable), and between different manufacturers.
Part D only; excludes Part B. Physician-administered drugs billed under Part B (e.g., infused oncology drugs, certain injectables) are not included. Drugs dispensed through Medicare Advantage prescription drug (MA-PD) plans are included, but any supplemental drug benefits unique to MA plans are not separately identified.
Spending changes are not decomposed. Year-over-year spending changes can reflect unit price increases, utilization shifts, generic entry, formulary changes, or benefit structure changes (e.g., Medicare drug price negotiation under the Inflation Reduction Act, whose negotiated prices take effect in 2026 and so are not reflected in this CY2023 vintage). The dataset does not attribute spending changes to any single driver.
Low-income subsidy beneficiary patterns are not visible. Beneficiaries receiving the Part D low-income subsidy have substantially different cost-sharing structures. Their spending patterns are folded into the aggregate and cannot be separated, which may skew average cost-per-unit figures.
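The pre-rebate limitation can be made concrete with a small calculation. Because actual rebates are confidential and absent from the dataset, the sketch below uses hypothetical drugs and assumed rebate rates; `net_spending` and `rank_by_spending` are illustrative helpers, not part of any CMS or CareGraph tool. It shows how a ranking by gross Total Spending can reverse once different rebate rates are netted out.

```python
from typing import Dict, List


def net_spending(gross: float, rebate_rate: float) -> float:
    """Net spending = gross spending x (1 - rebate_rate).

    rebate_rate is an assumed fraction in [0, 1); actual rebates are
    confidential and are not in the dataset.
    """
    if not 0.0 <= rebate_rate < 1.0:
        raise ValueError("rebate_rate must be in [0, 1)")
    return gross * (1.0 - rebate_rate)


def rank_by_spending(spending: Dict[str, float]) -> List[str]:
    """Drug names ordered from highest to lowest spending."""
    return sorted(spending, key=lambda name: spending[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical drugs, gross amounts, and rebate rates.
    gross = {"DRUG A": 1000.0, "DRUG B": 800.0}
    assumed_rebates = {"DRUG A": 0.5, "DRUG B": 0.1}
    net = {name: net_spending(amount, assumed_rebates[name]) for name, amount in gross.items()}
    print(rank_by_spending(gross))  # ['DRUG A', 'DRUG B']
    print(rank_by_spending(net))    # ['DRUG B', 'DRUG A']
```

With a 50% assumed rebate on Drug A and 10% on Drug B, Drug A's lead in gross spending (1,000 vs. 800) becomes a deficit on a net basis (500 vs. 720).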

Data Quality Notes

Generic-name string matching is fragile. The join key is an uppercased generic name string, not a coded identifier like NDC or RxCUI. Minor spelling variations, salt-form differences (e.g., "METFORMIN HCL" vs. "METFORMIN HYDROCHLORIDE"), or combination drug naming conventions can cause the same molecule to appear as separate entities or fail to match across datasets.
Cost-per-unit denominators vary by dosage form. The average cost per dosage unit field divides total spending by total dosage units, but "dosage unit" is not standardized across drug forms — it may represent one tablet, one milliliter, one patch, or one vial. Comparing cost-per-unit across drugs with different dosage forms is not meaningful without normalization to a common unit (e.g., defined daily dose).
Beneficiary counts are not unique across drugs. A single beneficiary using multiple drugs is counted once per drug. Summing beneficiary counts across drugs overstates the number of distinct Part D enrollees. Within a single drug record, the beneficiary count reflects distinct individuals.
Numeric fields may encode suppressed values as strings. Some spending and count fields in the source CSV contain text markers (e.g., empty strings or suppression indicators) instead of numeric values where individual cells are suppressed. The ETL coerces these to null during processing.
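Two of the data quality notes above lend themselves to small defensive helpers, sketched here in Python. These are hypothetical illustrations, not CareGraph's ETL: the documented join key is a plain uppercased string, so `normalize_generic` shows a stricter, analyst-side normalization (expanding a hypothetical `SALT_ABBREVIATIONS` table and sorting "/"-separated combination components, assuming "/" is the separator) that would merge the metformin example. `to_number` mirrors the coercion-to-null behavior; because the exact suppression markers are not specified, it maps any blank or non-numeric cell to `None`.

```python
import math
from typing import Optional

# Hypothetical abbreviation table for illustration; a production mapping would
# need curation, or a coded identifier such as RxCUI instead of string matching.
SALT_ABBREVIATIONS = {
    "HCL": "HYDROCHLORIDE",
    "HBR": "HYDROBROMIDE",
}


def normalize_component(name: str) -> str:
    """Uppercase, collapse whitespace, and expand known salt abbreviations."""
    tokens = name.upper().split()
    return " ".join(SALT_ABBREVIATIONS.get(token, token) for token in tokens)


def normalize_generic(name: str) -> str:
    """Stricter matching key: normalize each '/'-separated component and sort them.

    Sorting assumes component order carries no meaning for matching purposes.
    """
    parts = [normalize_component(part) for part in name.split("/") if part.strip()]
    return "/".join(sorted(parts))


def to_number(raw: Optional[str]) -> Optional[float]:
    """Parse a numeric CSV cell, returning None for blanks or non-numeric markers."""
    if raw is None:
        return None
    text = raw.strip().replace(",", "")  # assumes commas are thousands separators
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None  # a suppression marker or other non-numeric text
    return value if math.isfinite(value) else None


if __name__ == "__main__":
    print(normalize_generic("Metformin HCl"))           # METFORMIN HYDROCHLORIDE
    print(normalize_generic("sacubitril / valsartan"))  # SACUBITRIL/VALSARTAN
    print(to_number(""), to_number("1,234.5"))          # None 1234.5
```

Sorting combination components trades readability for matching: the key is only for joining, and a display name would still come from the source row.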
