Case Study 02 · Medical Data Mining
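NHANES 2017-2018: Complex-Sample Weighted Regression Modeling Case Study
This case study shows how to clean, extract, and link daily physical activity time (ActiveMinutes) with participants' glycated hemoglobin (HbA1c) levels from the NHANES database. The analysis applies the weights and design variables from NHANES's multistage complex sampling scheme so that the estimates are unbiased and representative of the U.S. population.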
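The linking step can be illustrated with a minimal base-R sketch. It assumes the component tables share the respondent key `SEQN`; the data frames `demo` and `lab`, and the `complete_case` flag, are hypothetical names introduced here, filled with a few values from the data preview in this case study. Incomplete records are flagged rather than dropped, so that the full sample remains available when the survey design is defined.

```r
# Hypothetical demographic component, keyed by SEQN.
demo <- data.frame(
  SEQN     = c(93703, 93704, 93705, 93706),
  RIDAGEYR = c(32, 56, 22, 64),
  WTMEC2YR = c(10255.42, 85433.91, 24491.08, 41203.20)
)

# Hypothetical laboratory component; 93706 is deliberately left out
# to show what happens when a participant has no HbA1c record.
lab <- data.frame(
  SEQN  = c(93703, 93704, 93705),
  LBXGH = c(5.4, 6.8, 5.1)
)

# Left join on SEQN: every demographic record is kept,
# and a missing lab value becomes NA.
merged <- merge(demo, lab, by = "SEQN", all.x = TRUE)

# Flag complete cases instead of deleting incomplete rows.
merged$complete_case <- complete.cases(
  merged[, c("LBXGH", "RIDAGEYR", "WTMEC2YR")]
)

print(merged)
```

Keeping all rows and filtering later (for example with `subset()` on the design object) is one common way to keep the design information intact; whether the original analysis did this is not stated.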
R Statistical Analysis Code Snippet (Reproducible R Code)
```r
# Load survey package for complex sampling weight adjustment
library(survey)
library(tidyverse)

# Define NHANES survey design
nhanes_design <- svydesign(
  id      = ~SDMVPSU,   # Primary Sampling Unit (PSU)
  strata  = ~SDMVSTRA,  # Stratification variable
  weights = ~WTMEC2YR,  # Mobile Examination Center (MEC) weights
  nest    = TRUE,       # PSU ids are nested within strata
  data    = nhanes_dataset_clean
)

# Run multivariate weighted survey regression model
fit <- svyglm(
  formula = LBXGH ~ ActiveMinutes + RIDAGEYR + RIAGENDR + BMDAVXIN,
  design  = nhanes_design
)

# Summary statistics
summary(fit)
```
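Because `svyglm()` is called without a `family` argument, it uses the default Gaussian family with an identity link, so the model can be written as a weighted linear regression. As a sketch, with $i$ indexing participants (the symbols $\beta_j$, $\varepsilon_i$, $w_i$, and $\mathbf{x}_i$ are notation introduced here, not taken from the original):

$$
\mathrm{LBXGH}_i = \beta_0 + \beta_1\,\mathrm{ActiveMinutes}_i + \beta_2\,\mathrm{RIDAGEYR}_i + \beta_3\,\mathrm{RIAGENDR}_i + \beta_4\,\mathrm{BMDAVXIN}_i + \varepsilon_i
$$

$$
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_i w_i \left(\mathrm{LBXGH}_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)^2, \qquad w_i = \mathrm{WTMEC2YR}_i
$$

Here $\mathbf{x}_i$ collects the intercept and the four predictors. The weights determine the point estimates, while the strata (`SDMVSTRA`) and PSUs (`SDMVPSU`) enter through the standard errors, which the survey package computes by Taylor linearization for a design built with `svydesign()`.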
Weighted Multivariable Regression Results (Model Estimates)
| Predictor | Beta Estimate | Std. Error | t-value | p-value |
|---|---|---|---|---|
| Intercept | 5.7203 | 0.1245 | 45.945 | < 2e-16 *** |
| ActiveMinutes (activity, minutes) | -0.0031 | 0.0009 | -3.152 | 0.0036 ** |
| RIDAGEYR (age, years) | 0.0125 | 0.0021 | 5.961 | 1.4e-06 *** |
| RIAGENDR (sex: female) | -0.0831 | 0.0410 | -2.026 | 0.0513 . |
| BMDAVXIN (body mass index, BMI) | 0.0452 | 0.0051 | 8.846 | 4.8e-09 *** |
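Interpreting the model: after adjusting for age, sex, and BMI, each additional 10 minutes of daily moderate-to-vigorous activity is associated with an HbA1c level that is 0.031 percentage points lower ($10 \times 0.0031$; $p = 0.0036$), which supports a main effect of physical activity on long-term glycemic control.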
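The 10-minute figure is simply the per-minute coefficient rescaled. A minimal sketch of that arithmetic follows, together with an approximate 95% Wald interval. The interval uses a normal quantile, which is an assumption made here for illustration; the variable names are hypothetical and the interval is not a result reported in the original analysis.

```r
# Values copied from the ActiveMinutes row of the estimates table.
beta_per_min <- -0.0031  # change in HbA1c (percentage points) per extra minute
se_per_min   <- 0.0009   # standard error of that coefficient

minutes <- 10
effect  <- minutes * beta_per_min

# Approximate 95% Wald interval using the normal quantile (an assumption;
# the survey design's t reference distribution would give a wider interval).
z  <- qnorm(0.975)
ci <- minutes * (beta_per_min + c(-1, 1) * z * se_per_min)

print(round(c(effect = effect, lower = ci[1], upper = ci[2]), 3))
```

On a fitted model object, `confint(fit)` would give design-based intervals directly, which is preferable to this hand calculation when the model object is available.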
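Data Structure Preview (gee_analytic_dataset.xlsx)
The cleaned and merged analytic wide table can be imported directly into R for statistical testing:
| SEQN (respondent sequence number) | WTMEC2YR (weight) | SDMVPSU (PSU) | SDMVSTRA (stratum) | LBXGH (HbA1c) | ActiveMinutes | RIDAGEYR | BMDAVXIN |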
|---|---|---|---|---|---|---|---|
| 93703 | 10255.42 | 2 | 145 | 5.4 | 45 | 32 | 24.3 |
| 93704 | 85433.91 | 1 | 146 | 6.8 | 15 | 56 | 28.9 |
| 93705 | 24491.08 | 2 | 145 | 5.1 | 60 | 22 | 21.1 |
| 93706 | 41203.20 | 1 | 148 | 7.2 | 0 | 64 | 31.4 |
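Before building the survey design, it can help to confirm that an imported table has the expected columns and usable weights. The sketch below rebuilds the four preview rows by hand and runs a few basic checks. The helper `check_analytic_table()` and the object names are hypothetical, introduced only for illustration; `RIAGENDR`, which the model formula uses, does not appear in the preview and is therefore not checked here.

```r
# Columns shown in the preview table.
required_cols <- c("SEQN", "WTMEC2YR", "SDMVPSU", "SDMVSTRA",
                   "LBXGH", "ActiveMinutes", "RIDAGEYR", "BMDAVXIN")

# Hypothetical helper (not part of the original analysis): returns a character
# vector describing any problems; an empty vector means all checks passed.
check_analytic_table <- function(df, required = required_cols) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  problems <- character(0)
  if (anyDuplicated(df$SEQN) > 0) {
    problems <- c(problems, "duplicated SEQN")
  }
  if (any(is.na(df$WTMEC2YR) | df$WTMEC2YR <= 0)) {
    problems <- c(problems, "missing or non-positive WTMEC2YR")
  }
  problems
}

# The four rows from the preview, entered by hand.
preview <- data.frame(
  SEQN          = c(93703, 93704, 93705, 93706),
  WTMEC2YR      = c(10255.42, 85433.91, 24491.08, 41203.20),
  SDMVPSU       = c(2, 1, 2, 1),
  SDMVSTRA      = c(145, 146, 145, 148),
  LBXGH         = c(5.4, 6.8, 5.1, 7.2),
  ActiveMinutes = c(45, 15, 60, 0),
  RIDAGEYR      = c(32, 56, 22, 64),
  BMDAVXIN      = c(24.3, 28.9, 21.1, 31.4)
)

problems <- check_analytic_table(preview)
print(length(problems))
```

In practice the same check could be applied to the result of `readxl::read_excel("gee_analytic_dataset.xlsx")`, assuming the readxl package is installed and the file is in the working directory.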