CDC PLACES County-Level Data — Methodology

Provenance

Dataset ID: cdc-places
Entity Type: county
Role: enrichment
Source: CDC
Vintage: 2023 Release
Entity Count: 40
Last ETL Run: 2026-04-13

Overview

CDC PLACES (Population Level Analysis and Community Estimates) is a collaboration between the Centers for Disease Control and Prevention (CDC), the Robert Wood Johnson Foundation, and the CDC Foundation. The dataset provides county-level prevalence estimates for 36+ chronic disease, health behavior, prevention, disability, and health status measures. These estimates are not direct survey measurements — they are model-based small area estimates (SAE) produced by applying multilevel regression and poststratification (MRP) to state-level Behavioral Risk Factor Surveillance System (BRFSS) survey data. The current release (PLACES 2023, Socrata resource swc5-untb) uses BRFSS 2021 survey responses, creating a 2-year lag between data collection and publication.

PLACES answers questions such as: What is the estimated prevalence of diabetes, obesity, or smoking in a given county? How does chronic disease burden compare across counties after controlling for age distribution? Which counties have the highest rates of health risk behaviors or preventive care gaps? Measures span six categories — Health Outcomes, Health Risk Behaviors, Prevention, Disability, Health Status, and Health-Related Social Needs — and are reported as both crude and age-adjusted prevalence percentages with 95% confidence intervals.

Join Strategy

Each row in the source CSV carries a LocationID field containing the county FIPS code. During ETL, the _find_column() function locates this field by trying a list of candidate names (locationid, LocationID, CountyFIPS, FIPS), which absorbs column-name variation across data vintages. The raw value is passed through normalize_fips(), which strips non-digit characters and left-pads the result with zeros to 5 digits (2-digit state FIPS + 3-digit county FIPS). Rows whose geolevel field is present and not equal to county are filtered out, and rows that do not yield a valid 5-digit FIPS after normalization are skipped.
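The normalization and filtering rules above can be sketched as follows. This is a minimal illustration, not the ETL's actual code: the function bodies, the is_county_row() helper name, the case-insensitive geolevel comparison, and the rule that empty or over-length digit strings count as invalid are all assumptions.

```python
import re
from typing import Optional


def normalize_fips(raw: object) -> Optional[str]:
    """Return a 5-digit county FIPS string, or None if the value is unusable.

    Assumption: a value is invalid if it has no digits or more than
    five digits once non-digit characters are stripped.
    """
    digits = re.sub(r"\D", "", str(raw or ""))
    if not digits or len(digits) > 5:
        return None
    return digits.zfill(5)  # left-pad with zeros to 5 characters


def is_county_row(row: dict) -> bool:
    """Keep a row unless geolevel is present and is not 'county'.

    Hypothetical helper; the case-insensitive comparison is an assumption.
    """
    geolevel = row.get("geolevel")
    return not geolevel or geolevel.strip().lower() == "county"
```

Under these assumptions, normalize_fips("1001") yields "01001", and a row with no geolevel field at all is kept rather than dropped.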

The source data has one row per measure per county. The ETL groups rows by FIPS code, then pivots them into a dictionary keyed by measureid (e.g., DIABETES, OBESITY, CSMOKING) stored under data.places in the county JSON manifest. Each measure entry contains measure (human-readable name), category, value (prevalence percentage), value_type (crude or age-adjusted), low_ci, high_ci, and total_population. The join is a left join from the county entity manifest — county pages without a matching PLACES record display missing-data indicators rather than being omitted. County entity pages at /county/{FIPS} display a highlighted subset of measures (including DIABETES, OBESITY, CHD, COPD, BPHIGH, STROKE, CANCER, DEPRESSION) in a metric grid.
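As a rough sketch of the pivot described above, the grouping can be expressed as a nested dictionary build. The pivot_by_county() name and the input row keys are hypothetical (the rows are assumed to be already normalized), and the values in the usage example are illustrative only, not real PLACES figures.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Fields copied into each measure entry, as listed in the text.
ENTRY_FIELDS = (
    "measure", "category", "value", "value_type",
    "low_ci", "high_ci", "total_population",
)


def pivot_by_county(rows: Iterable[dict]) -> Dict[str, Dict[str, dict]]:
    """Group long-format rows by FIPS, then key each entry by measureid."""
    counties: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for row in rows:
        entry = {field: row.get(field) for field in ENTRY_FIELDS}
        counties[row["fips"]][row["measureid"]] = entry
    return dict(counties)


# Illustrative rows only; these are not real PLACES values.
example_rows = [
    {"fips": "01001", "measureid": "DIABETES", "measure": "Diabetes",
     "category": "Health Outcomes", "value": 12.3, "value_type": "age-adjusted",
     "low_ci": 10.9, "high_ci": 13.8, "total_population": 50000},
    {"fips": "01001", "measureid": "OBESITY", "measure": "Obesity",
     "category": "Health Outcomes", "value": 35.0, "value_type": "age-adjusted",
     "low_ci": 31.0, "high_ci": 39.0, "total_population": 50000},
]
places_by_fips = pivot_by_county(example_rows)

# Left-join behavior: a county with no PLACES record resolves to None,
# which a page template could render as a missing-data indicator.
missing = places_by_fips.get("09001")
```

Looking up each county with .get() rather than indexing mirrors the left join: a county with no match stays on the page with empty values instead of raising an error or being dropped.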

Known Limitations

Model-based estimates, not direct measurements. PLACES prevalence figures are MRP-modeled small area estimates derived from BRFSS survey data, not census counts or direct county-level surveys. The confidence intervals reflect model uncertainty, not sampling error alone. Counties with small populations have less BRFSS data to anchor the model, producing substantially wider confidence intervals — these estimates should be interpreted with caution.
Age adjustment uses the 2000 US standard population. Age-adjusted prevalence estimates standardize to the year 2000 population age distribution, enabling fair comparison across counties with different age profiles. However, this adjustment can mask the actual disease burden in counties with very old or very young populations, where crude prevalence may be more clinically meaningful.
Age-adjusted vs. crude prevalence discrepancies. CareGraph displays age-adjusted values by default. Local health department reports commonly use crude prevalence, which can produce noticeably different figures for the same county and measure. Users comparing CareGraph data to local reports should verify which prevalence type the comparison source uses.
2-year reporting lag. The PLACES 2023 data release is derived from BRFSS 2021 survey responses. Health behaviors and chronic disease prevalence may have shifted in the intervening period, particularly in the post-COVID-19 era when access to care, physical activity, and mental health indicators changed rapidly.
Connecticut FIPS reorganization. Connecticut replaced its 8 counties with 9 planning regions as county-equivalents in 2022. CDC PLACES may use legacy county FIPS codes that do not align with the Census Bureau's updated FIPS assignments, causing join failures for Connecticut counties unless legacy FIPS codes are mapped.
No explicit suppression, but confidence intervals vary. Unlike most CMS datasets, PLACES does not suppress individual county estimates for small cell sizes. All counties receive point estimates. However, the width of the 95% confidence interval serves as a de facto reliability indicator — estimates where the CI spans more than 10 percentage points are substantially less precise.
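Because the interval width is the only reliability signal available, a downstream consumer might flag wide intervals explicitly. The sketch below assumes measure entries shaped like those in the county manifest (low_ci, high_ci); the helper names are hypothetical, and the 10-point threshold is taken from the note above rather than from any CDC guidance.

```python
from typing import Optional

WIDE_CI_THRESHOLD = 10.0  # percentage points, per the limitation note above


def ci_width(entry: dict) -> Optional[float]:
    """Width of the 95% confidence interval, or None if a bound is missing."""
    low, high = entry.get("low_ci"), entry.get("high_ci")
    if low is None or high is None:
        return None
    return high - low


def is_low_precision(entry: dict, threshold: float = WIDE_CI_THRESHOLD) -> bool:
    """True when the interval spans more than `threshold` percentage points."""
    width = ci_width(entry)
    return width is not None and width > threshold
```

An entry with a missing bound is reported as not low-precision here only because its width is unknown; callers that need to distinguish the two cases should check ci_width() directly.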

Data Quality Notes

Six missing-value sentinels normalized to null. The source CSV encodes missing values inconsistently as empty strings, N/A, Not Available, ., *, or -. The ETL's _try_float() function converts all of these to null in the JSON manifest, giving downstream consumers a uniform null representation.
Column name variation across releases. CDC has changed column header casing and naming between PLACES data releases (e.g., data_value vs. DataValue vs. Data_Value, measureid vs. MeasureId). The ETL uses _find_column() with candidate lists of known variants and falls back to case-insensitive and substring matching, avoiding hard-coded column name dependencies.
Non-UTF-8 byte handling. The source CSV is opened with encoding="utf-8", errors="replace", which substitutes the Unicode replacement character (U+FFFD, �) for any bytes that are not valid UTF-8 instead of raising a decoding error. This prevents ETL failures but can, in rare cases, leave � characters in county or measure name strings.
TotalPopulation is a county-level total, not a measure denominator. The total_population field in each measure entry reflects the county's total population estimate from the PLACES data, not the number of respondents or the denominator used to compute the prevalence percentage. It is the same value across all measures for a given county and should not be interpreted as a sample size.
