Nursing Home Provider Info
Dataset ID: nh-provider-info
Provenance
- Dataset ID: nh-provider-info
- Entity Type: snf
- Role: base
- Source: CMS
- Vintage: Mar 2026
- Entity Count: 14,703
- Last ETL Run: 2026-04-13
Overview
The Nursing Home Provider Info dataset is published by the Centers for Medicare & Medicaid Services (CMS) through the Care Compare program (formerly Nursing Home Compare) and is available as a public-use file on data.cms.gov. It contains one row per Medicare- and Medicaid-certified skilled nursing facility (SNF) in the United States and its territories (14,703 facilities in the Mar 2026 vintage), with roughly 90 fields covering facility characteristics, certification status, the Five-Star Quality Rating System scores, penalty history, staffing metrics, and special designations such as Special Focus Facility (SFF) status and abuse icon flags. The dataset reflects the most recent quarterly update cycle from CMS.
This dataset answers questions such as: What is a facility's overall star rating and how do its component scores (health inspections, staffing, quality measures) compare? How many certified beds does a facility have versus its average daily census? Has a facility been cited for abuse or designated as an SFF? What is the facility's ownership type and participation in Medicare and Medicaid? It is the primary source for the facility-level profile information displayed on CareGraph SNF entity pages.
Join Strategy
Each record in this dataset is joined to a CareGraph SNF entity page using the Federal Provider Number field, which is the facility's CMS Certification Number (CCN). The CCN is a 6-character string, zero-padded on the left (e.g., 015001). During ETL, the join key is normalized by stripping leading and trailing whitespace and enforcing zero-padding to six digits. The join is a left join from the SNF entity manifest to this dataset: SNF pages without a matching record display missing-data indicators rather than being omitted. Source rows that do not match any entity page are logged as unmatched and excluded from the site build. The CCN format is validated during the ETL build step, and malformed keys are reported in the build log.
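The normalization and left join described above can be sketched as follows. This is a minimal illustration, not the actual ETL code: the function names, the dictionary-based row shape, and the six-digit validation pattern are assumptions made for the example.

```python
import re

# Assumption: a well-formed key is exactly six ASCII digits after normalization.
CCN_PATTERN = re.compile(r"[0-9]{6}")


def normalize_ccn(raw):
    """Strip surrounding whitespace and left-pad with zeros to six characters."""
    return str(raw).strip().zfill(6)


def is_valid_ccn(ccn):
    """Return True when the normalized key matches the assumed CCN pattern."""
    return CCN_PATTERN.fullmatch(ccn) is not None


def left_join(manifest_ccns, source_rows, key="Federal Provider Number"):
    """Left join from the entity manifest to the source rows.

    Returns a tuple (joined, unmatched, malformed):
      joined    - dict of manifest CCN -> source row, or None when no row matches
      unmatched - sorted source CCNs with no entity page (excluded from the build)
      malformed - raw key values that failed validation (for the build log)
    """
    by_ccn = {}
    malformed = []
    for row in source_rows:
        ccn = normalize_ccn(row[key])
        if not is_valid_ccn(ccn):
            malformed.append(row[key])
            continue
        by_ccn[ccn] = row

    manifest = [normalize_ccn(c) for c in manifest_ccns]
    # Every manifest entity is kept; a missing match becomes None.
    joined = {ccn: by_ccn.get(ccn) for ccn in manifest}
    unmatched = sorted(set(by_ccn) - set(manifest))
    return joined, unmatched, malformed
```

A page whose entry in joined is None would render missing-data indicators, while the unmatched and malformed lists correspond to what the build log reports.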
Known Limitations
- Star rating suppression. The overall, health inspection, staffing, and quality measure star ratings display "Not Available" for facilities with insufficient inspection history (fewer than three standard health surveys in the lookback window) or during the grace period following a change of ownership (CHOW). Users cannot distinguish between these two causes from the data alone.
- CHOW inspection history resets. When a facility undergoes a change of ownership, CMS regional offices decide whether to carry forward, partially carry forward, or reset the facility's health inspection history. Facilities acquired within the past three years may have star ratings based on an incomplete inspection record, reducing comparability with established facilities.
- SFF program scope. The Special Focus Facility designation flags the most persistently poor-performing nursing homes, but CMS caps the active SFF list at approximately 80 facilities nationally. Absence of an SFF flag does not indicate acceptable quality — it indicates only that the facility is not among the small number currently under heightened CMS scrutiny.
- Ownership opacity. The ownership_type and legal business name fields reflect the legal entity on the CMS certification, not necessarily the day-to-day operator. Many facilities operate under management agreements, and chain or private-equity affiliations are not identifiable from this dataset alone.
- Bed count vs. census divergence. The number_of_certified_beds and average_number_of_residents_per_day fields can diverge substantially. A facility with 120 certified beds but an average daily census of 60 may indicate financial distress, a specialized short-stay population with high turnover, or recent downsizing; the dataset does not distinguish among these causes.
- Reporting lag. CMS updates this dataset quarterly, but the underlying health inspection, staffing, and quality measure data have their own reporting cycles. Health inspection scores reflect the most recent three years of standard and complaint surveys; staffing data derive from Payroll-Based Journal (PBJ) submissions for the most recent quarter; quality measures are derived from MDS assessments with their own reporting period. The dataset's publication date does not indicate a single uniform measurement period.
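To make the bed count vs. census comparison concrete, the ratio of average daily census to certified beds can be computed as below. The occupancy_ratio helper is hypothetical and is not a field in the CMS file; a low value is only a prompt for further investigation, since the dataset does not explain the cause.

```python
def occupancy_ratio(number_of_certified_beds, average_number_of_residents_per_day):
    """Return census / certified beds, or None when either input is unusable."""
    if number_of_certified_beds is None or average_number_of_residents_per_day is None:
        return None
    if number_of_certified_beds <= 0:
        return None
    return average_number_of_residents_per_day / number_of_certified_beds


# The example from the text: 120 certified beds, average daily census of 60.
ratio = occupancy_ratio(120, 60)  # 0.5, i.e. half of certified capacity in use
```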
Data Quality Notes
- Binary flag encoding. The abuse icon and SFF status fields are binary indicators that use blank/empty values rather than an explicit "N" or "0" to indicate the absence of the flag. The ETL normalizes blanks to false and non-blank values (e.g., "Y", "1") to true in the JSON manifest; consumers should not treat null as "unknown" for these fields.
- "Not Available" as a sentinel. Star rating fields use the string "Not Available" rather than a null or numeric value when a rating cannot be computed. The ETL maps "Not Available" to null in the JSON manifest. Other missing-value representations in the source CSV include empty strings and "N/A"; all are normalized to null during processing.
- Numeric fields encoded as strings. Several numeric fields in the source CSV (e.g., fine amounts, resident counts) contain commas, dollar signs, or other formatting characters. These are parsed to numeric types during ETL; values that fail parsing are set to null and logged.
- Date format inconsistencies. Date fields in the source data use mixed formats (MM/DD/YYYY and YYYY-MM-DD). The ETL standardizes all dates to ISO 8601 (YYYY-MM-DD) in the output manifests.
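The four normalizations above can be sketched in a few small helpers. This is an illustrative sketch under stated assumptions, not the production ETL: the function names are hypothetical, the set of formatting characters stripped from numbers is limited to commas and dollar signs, and returning null for an unparseable date is an assumption the source does not state.

```python
import logging
from datetime import datetime

logger = logging.getLogger("etl_sketch")  # hypothetical logger name

# Missing-value representations named in the notes above.
MISSING = {"", "N/A", "Not Available"}


def to_null(value):
    """Map empty strings, "N/A", and "Not Available" to None; else return trimmed text."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text in MISSING else text


def parse_flag(value):
    """Blank means the flag is absent (False); any non-blank value is True."""
    return value is not None and str(value).strip() != ""


def parse_number(value):
    """Strip commas and dollar signs, then parse; failures become None and are logged."""
    text = to_null(value)
    if text is None:
        return None
    cleaned = text.replace(",", "").replace("$", "")
    try:
        return float(cleaned)
    except ValueError:
        logger.warning("could not parse numeric value: %r", value)
        return None


def parse_date(value):
    """Accept MM/DD/YYYY or YYYY-MM-DD and return an ISO 8601 date string."""
    text = to_null(value)
    if text is None:
        return None
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Assumption: unrecognized date formats are treated like other parse failures.
    logger.warning("could not parse date value: %r", value)
    return None
```

Note that parse_flag never returns None, which matches the guidance that a blank flag means absence rather than an unknown value.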