Patient Survey (HCAHPS) — Hospital
Dataset ID: hosp-hcahps
Provenance
- Dataset ID: hosp-hcahps
- Entity Type: hospital
- Role: enrichment
- Source: CMS
- Vintage: FY2026
- Entity Count: 5,399
- Last ETL Run: 2026-04-13
Overview
The Patient Survey (HCAHPS) dataset is published by the Centers for Medicare & Medicaid Services (CMS) as part of the Hospital Compare program, now integrated into the Care Compare initiative. HCAHPS — the Hospital Consumer Assessment of Healthcare Providers and Systems — is a standardized survey instrument developed by CMS and the Agency for Healthcare Research and Quality (AHRQ) to measure patients' perspectives on their hospital experience. The survey is administered to a random sample of adult inpatients between 48 hours and 6 weeks after discharge from Medicare-certified acute care hospitals across the United States and its territories. The current file covers FY2026, with survey collection periods typically spanning 12 months and a reporting lag of approximately 9–12 months.
The dataset contains approximately 60 fields spanning multiple survey dimensions: communication with nurses, communication with doctors, responsiveness of hospital staff, communication about medicines, cleanliness and quietness of the hospital environment, discharge information, care transition, and overall hospital rating. Each dimension reports top-box, middle-box, and bottom-box percentage scores, along with HCAHPS star ratings and the number of completed surveys. The top-box methodology counts only the most favorable responses — "Always" on 4-point frequency scales and "9" or "10" on the 0–10 overall rating scale. Each row represents one survey measure for one hospital. This dataset answers questions about how patients perceive their hospital stay, which dimensions of experience a hospital excels or lags in, and how a hospital's patient experience compares to the national distribution. It measures patient perception, not clinical quality — the two can diverge significantly.
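The box scores described above can be sketched as follows. This is a minimal illustration only: the published file contains aggregated percentages rather than individual responses, the function names and sample answers are hypothetical, and the middle-box ("Usually") and bottom-box ("Sometimes" or "Never") groupings are assumed to follow the usual HCAHPS convention.

```python
from collections import Counter

# Hypothetical helpers: the CMS file ships the percentages already computed,
# so this only illustrates how the three boxes relate to raw answers.


def frequency_boxes(responses):
    """Split 4-point frequency answers into top/middle/bottom-box percentages."""
    counts = Counter(responses)
    total = len(responses)
    return {
        "top_box": 100.0 * counts["Always"] / total,
        "middle_box": 100.0 * counts["Usually"] / total,
        # Assumption: "Sometimes" and "Never" are reported together.
        "bottom_box": 100.0 * (counts["Sometimes"] + counts["Never"]) / total,
    }


def overall_rating_top_box(ratings):
    """Percentage of 0-10 overall ratings that are 9 or 10."""
    return 100.0 * sum(1 for r in ratings if r >= 9) / len(ratings)


# Made-up responses for one measure at one hospital.
answers = ["Always"] * 7 + ["Usually"] * 2 + ["Never"]
boxes = frequency_boxes(answers)          # top 70.0, middle 20.0, bottom 10.0
overall = overall_rating_top_box([10, 9, 8, 7, 10])  # 60.0
```

Note that the two "Usually" answers and the one "Never" answer all fall outside the top box, so they contribute equally (nothing) to the top-box score.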
Join Strategy
CareGraph joins this dataset to hospital entity pages using the Facility ID field, which corresponds to the CMS Certification Number (CCN). The CCN is a 6-character zero-padded string (e.g., 010001). During ETL, the _find_column() function matches the CCN column against a candidate list (Facility ID, Hospital CCN, Provider Number, Facility Id, Provider ID, CCN) to handle header variation across CMS file releases. The normalize_ccn() function strips whitespace and zero-pads values shorter than 6 characters. Because the dataset contains multiple rows per hospital (one per survey measure), the join produces a one-to-many relationship between the hospital entity and its measure-level records. Matched rows are grouped by CCN via _load_measures_by_ccn() and written to the hospital's JSON manifest under data.hcahps. Non-numeric sentinel values such as "Not Available" and "Not Applicable" are discarded during loading; numeric fields are parsed with _try_float(), which converts non-numeric values to null. A provenance record with dataset ID hosp-hcahps is appended to the manifest. Hospitals without matching HCAHPS rows display missing data indicators rather than being excluded from CareGraph.
Known Limitations
- 100-survey suppression threshold. Hospitals with fewer than 100 completed HCAHPS surveys are suppressed from public reporting. This disproportionately affects small-volume, rural, and specialty hospitals, creating systematic selection bias — published results skew toward larger, higher-volume facilities.
- Non-response bias. National survey response rates have declined steadily to approximately 25%, and individual hospital response rates range from 15% to 45%. CMS's mode-and-patient-mix adjustment does not fully correct for non-response bias, meaning hospitals with lower response rates may have scores that are not representative of their full patient population.
- Patient-mix adjustment residual bias. Results are adjusted for patient mix using CMS's standardized adjustment model, which accounts for age, education, self-rated health, primary language, and service line. However, hospitals serving predominantly sicker or lower-health-literacy populations may still show systematically lower scores after adjustment. The adjustment does not account for hospital structural characteristics such as size, teaching status, or safety-net designation.
- Top-box scoring compresses variation. Only the most favorable responses count toward top-box scores: "Always" on frequency items and "9" or "10" on the overall rating. A "Usually" response is therefore scored the same as a "Never" response, since neither counts toward the top box. This may overstate quality gaps between hospitals whose patients report consistently good but not perfect experiences.
- Star ratings use relative clustering, not fixed thresholds. HCAHPS star ratings are derived from survey scores using a k-means clustering algorithm applied to the national distribution. A hospital's star rating can change even if its raw scores remain constant, if the national distribution shifts. Star ratings should not be interpreted as absolute performance benchmarks.
- Sampling frame excludes some patient groups. HCAHPS is administered to a random sample of adult inpatients regardless of payer; it is not restricted to Medicare fee-for-service beneficiaries. The sampling frame does, however, exclude some groups, such as patients under 18 and patients with a primary psychiatric diagnosis, which can make results less representative for hospitals whose patient mix is heavily weighted toward excluded populations.
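The star-rating limitation above (relative clustering rather than fixed thresholds) can be illustrated with a toy one-dimensional k-means. This is not CMS's actual procedure, which has its own inputs and rules; the function names, the quantile-based initialization, and both sets of national scores are made up for the example.

```python
def kmeans_1d(values, k=5, max_iter=100):
    """Plain 1-D k-means; returns the k cluster centers in ascending order."""
    data = sorted(values)
    n = len(data)
    # Assumption: start from evenly spaced quantiles so the run is deterministic.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def star_rating(score, centers):
    """Stars = rank (1..k) of the nearest cluster center."""
    nearest = min(range(len(centers)), key=lambda j: abs(score - centers[j]))
    return nearest + 1


# Two hypothetical national distributions; the hospital scores 80 in both.
national_a = [60, 61, 70, 71, 80, 81, 90, 91, 98, 99]
national_b = [70, 71, 80, 81, 85, 86, 90, 91, 98, 99]

stars_a = star_rating(80, kmeans_1d(national_a))  # 3
stars_b = star_rating(80, kmeans_1d(national_b))  # 2
```

The hospital's score is 80 in both runs, yet it drops from three stars to two because the lowest-scoring hospitals improved and the cluster boundaries moved.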
Data Quality Notes
- Score fields stored as strings with sentinel values. The source CSV encodes top-box, middle-box, and bottom-box percentage fields, survey response rates, and star ratings as strings. Suppressed or inapplicable values appear as "Not Available" or "Not Applicable" rather than numeric placeholders. The ETL discards these sentinels and converts parseable values to numeric types via
_try_float()— a null value in the JSON manifest indicates suppression or ineligibility, not a zero score. - Column name variation across vintages. CMS has changed header names and casing between file releases. The ETL resolves the CCN column via
_find_column()with a candidate list, and the measure identifier column is matched against candidates includingHCAHPS Measure ID,Measure ID, andMeasure Name. New header names in future releases may require adding candidates to these lists. - File encoding. The source CSV is read with
encoding="utf-8", errors="replace", so any non-UTF-8 bytes in hospital names or free-text fields are replaced with the Unicode replacement character (U+FFFD) rather than causing a parse failure. This can produce visible artifacts (e.g.,â€"replacing an em dash) in hospital name fields. - Mixed numeric and categorical fields per row. Each row contains a mix of numeric fields (percentages, counts, star ratings) and categorical fields (measure IDs, measure descriptions, footnote codes). The ETL attempts numeric conversion on all non-measure-ID fields and falls back to string storage, so the data type of a given field in the JSON manifest depends on whether the source value was parseable as a number.
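Two of the notes above, the replacement-character behavior and the value-dependent typing, can be reproduced in a few lines. This is a sketch under stated assumptions: the sample bytes, the Footnote-style values, and the coerce() helper are hypothetical and only approximate the fallback logic described.

```python
import csv
import io

SENTINELS = {"Not Available", "Not Applicable"}

# A made-up row whose hospital name contains a Windows-1252 apostrophe (0x92),
# which is not valid UTF-8 on its own.
raw = (b"Facility ID,Facility Name\n"
       b"010001,St. Mary\x92s Hospital\n")

# Same decoding options as described in the file-encoding note.
text = raw.decode("utf-8", errors="replace")
rows = list(csv.DictReader(io.StringIO(text)))
name = rows[0]["Facility Name"]   # "St. Mary\ufffds Hospital"
has_artifact = "\ufffd" in name   # True


def coerce(value):
    """Hypothetical fallback: None for sentinels, float if parseable, else str."""
    text_value = value.strip()
    if text_value in SENTINELS:
        return None
    try:
        return float(text_value)
    except ValueError:
        return text_value


# The same column can yield different types depending on the source value.
single_footnote = coerce("5")        # 5.0 (float)
multi_footnote = coerce("5, 10")     # "5, 10" (str)
suppressed = coerce("Not Available")  # None
```

Downstream consumers of the manifest may therefore want to check for U+FFFD in name fields and to test a field's type before treating it as numeric.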