Title
When to encourage using Gaussian regression for feature selection tasks with time-to-event outcome
Abstract
IMPORTANCE: Feature selection with respect to time-to-event outcomes is one of the fundamental problems in clinical trials and biomarker discovery studies. However, it is unclear which statistical methods should be used when the sample size is small or some of the key covariates are not measured. DESIGN: In this simulation study, the true models are multivariate Cox proportional hazards models with 10 covariates. Only 5 of the 10 true features are assumed to be observed (measured) in all model fits, along with 5 random noise features. Each sample-size scenario is explored using 10,000 simulated datasets. Eight regression models are applied to each dataset to estimate feature effects, including both regularized Gaussian regression (elastic net penalty) and regularized Cox regression (glmnet Cox). RESULTS: If the covariates are Gaussian and highly correlated, Gaussian regression of log-transformed survival time with only two covariates outperforms all tested Cox regression models when the total number of events is below 500.
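A minimal simulation sketch of this kind of design, in Python using only the standard library, may help make it concrete. Everything specific here is an illustrative assumption, not the paper's actual setup: a constant (exponential) baseline hazard `lam0`, equal effects of 0.5 for all 10 true covariates, equicorrelated standard-normal covariates built from one shared factor with correlation `rho`, no censoring, and plain unpenalized least squares standing in for the elastic-net Gaussian regression. The helper names `simulate_cox_data` and `ols` are hypothetical. The sketch relies on a standard fact: with an exponential baseline hazard, the Cox model can be rewritten as log T = -log(lam0) - β'x + ε, where ε follows a standard minimum extreme-value (Gumbel) distribution. A linear model for log T therefore targets -β directly.

```python
import math
import random


def simulate_cox_data(n, beta, rho, n_noise, lam0=0.1, seed=0):
    """Draw event times from a Cox PH model with a constant baseline hazard.

    Illustrative assumptions: equicorrelated standard-normal covariates built
    from one shared factor, baseline hazard lam0, and no censoring.
    """
    rng = random.Random(seed)
    rows, times = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared factor -> pairwise correlation rho
        x_true = [math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                  for _ in beta]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        lin_pred = sum(b * x for b, x in zip(beta, x_true))
        # Hazard lam0 * exp(lin_pred) is constant in t, so T is exponential.
        times.append(rng.expovariate(lam0 * math.exp(lin_pred)))
        rows.append((x_true, noise))
    return rows, times


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


beta = [0.5] * 10  # hypothetical effect sizes for the 10 true covariates
rows, times = simulate_cox_data(n=2000, beta=beta, rho=0.5, n_noise=5, seed=1)

# Only the first 5 true covariates are "measured"; 5 pure-noise features are added.
X = [[1.0] + x_true[:5] + noise for x_true, noise in rows]
y = [math.log(t) for t in times]  # Gaussian regression on log survival time
coef = ols(X, y)
print([round(c, 2) for c in coef])
```

Because the measured true covariates are correlated with the five unmeasured ones, each measured coefficient absorbs part of the omitted effects. Under these assumptions its expected value is about -(0.5 + 5 × 0.5/6) ≈ -0.92 rather than -0.5, while the noise coefficients should stay near zero. Censoring, which determines the number of events, is deliberately left out of this sketch.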