Title
The hidden factor: accounting for covariate effects in power and sample size computation for a binary trait
Authors
Abstract
Accurate power and sample size estimation is crucial to the design and analysis of genetic association studies. When analyzing a binary trait via logistic regression, important covariates such as age and sex are typically included in the model. However, their effects are rarely properly accounted for in power or sample size computation during study planning. Unlike for a continuous trait, the power of association testing between a binary trait and a genetic variant depends explicitly on covariate effects, even under the assumption of gene-environment independence. Earlier work recognized this hidden factor, but the implemented methods are not flexible. We thus propose and implement a generalized method for estimating power and sample size for (discovery or replication) association studies of binary traits that a) accommodates different types of non-genetic covariates E, b) handles different types of G-E relationships, and c) is computationally efficient. Extensive simulation studies show that the proposed method is accurate and computationally efficient for both prospective and retrospective sampling designs with various covariate structures. A proof-of-principle application focuses on the understudied African sample in the UK Biobank data. Results show that, in contrast to studying the continuous blood pressure trait, ignoring the covariate effects of age and sex when analyzing the binary hypertension trait leads to overestimated power and underestimated replication sample size.
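The dependence of power on covariate effects can be illustrated with a minimal sketch. This is not the method proposed in the paper; it is a textbook asymptotic Wald approximation under assumptions chosen here for illustration: a prospective design, an additive genotype G in Hardy-Weinberg equilibrium, a single binary covariate E independent of G, and a two-sided test. The function name `wald_power` and all parameter values are hypothetical. The intercept is held fixed, so changing the covariate effect also changes the trait prevalence.

```python
from math import exp, sqrt
from statistics import NormalDist


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-x))


def wald_power(n, b0, bG, bE, maf, pE, alpha=5e-8):
    """Approximate power of the Wald test for bG in the (assumed) model
    logit P(Y=1 | G, E) = b0 + bG*G + bE*E,
    with G ~ Binomial(2, maf) and E ~ Bernoulli(pE), independent of G.
    The default alpha is the conventional genome-wide threshold (an assumption).
    """
    geno = {0: (1 - maf) ** 2, 1: 2 * maf * (1 - maf), 2: maf ** 2}
    env = {0: 1 - pE, 1: pE}

    # Expected Fisher information per observation: E[p(1-p) x x^T], x = (1, G, E).
    info = [[0.0] * 3 for _ in range(3)]
    for g, pg in geno.items():
        for e, pe in env.items():
            p = expit(b0 + bG * g + bE * e)
            w = pg * pe * p * (1 - p)
            x = (1.0, float(g), float(e))
            for i in range(3):
                for j in range(3):
                    info[i][j] += w * x[i] * x[j]

    # Closed-form (G, G) entry of the inverse of the symmetric 3x3 matrix.
    a, b, c = info[0]
    d, e_ = info[1][1], info[1][2]
    f = info[2][2]
    det = a * (d * f - e_ * e_) - b * (b * f - c * e_) + c * (b * e_ - c * d)
    var_g = (a * f - c * c) / det  # per-observation asymptotic variance of the bG estimate

    # Two-sided power from the normal approximation to the Wald statistic.
    shift = sqrt(n) * abs(bG) / sqrt(var_g)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    phi = NormalDist().cdf
    return phi(shift - z) + phi(-shift - z)


if __name__ == "__main__":
    # Same genetic effect and sample size, different covariate effects.
    for bE in (0.0, 0.5, 1.5):
        print(bE, wald_power(n=20000, b0=-2.0, bG=0.2, bE=bE, maf=0.3, pE=0.5))
```

Even with G and E independent, the computed power changes with `bE`, because the covariate effect alters the variance weights p(1-p) that enter the information matrix. In a linear model for a continuous trait, no such weights appear.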