Distributed Design for Causal Inferences on Big Observational Data

Paper Title

Distributed Design for Causal Inferences on Big Observational Data

Authors

Zhang, Yumin; Sabbaghi, Arman

Abstract

A fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the data prior to analysis. Existing design methods, developed for traditional observational studies with single designers, can yield unsatisfactory designs with suboptimum covariate balance for Big Observational Data due to their inability to accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned covariates and the summaries they receive. The final design is selected by comparing balance measures for all covariates across the candidates. We perform simulation studies and analyze datasets from the 2016 Atlantic Causal Inference Conference Data Challenge to demonstrate the flexibility and power of our framework for constructing designs with good covariate balance from Big Observational Data.
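
The workflow described in the abstract (assign covariate subsets to designers, summarize, share summaries, design in parallel, select by overall balance) might be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not specify the summary statistic, the design method, or the balance measure, so the choices here (first principal component score as the summary, greedy 1:1 nearest-neighbor matching as the design, maximum absolute standardized mean difference as the balance measure) are stand-ins, and all function names are hypothetical.

```python
import numpy as np

def standardized_mean_diff(X, treated, keep):
    """Absolute standardized mean difference of each covariate among kept units."""
    Xt, Xc = X[keep & treated], X[keep & ~treated]
    # Pooled SD is taken from the full sample so candidate designs are comparable.
    pooled_sd = np.sqrt((X[treated].var(axis=0) + X[~treated].var(axis=0)) / 2)
    return np.abs(Xt.mean(axis=0) - Xc.mean(axis=0)) / pooled_sd

def summarize(X_block):
    """Assumed summary: first principal component score of a covariate block."""
    centered = X_block - X_block.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def greedy_match(Z, treated):
    """Assumed design step: greedy 1:1 nearest-neighbor matching without replacement."""
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    keep = np.zeros(len(Z), dtype=bool)
    available = list(np.flatnonzero(~treated))
    for i in np.flatnonzero(treated):
        if not available:
            break
        dist = np.linalg.norm(Z[available] - Z[i], axis=1)
        j = available.pop(int(np.argmin(dist)))
        keep[i] = keep[j] = True
    return keep

def distributed_design(X, treated, blocks):
    """Return (balance score, kept-unit mask) of the best candidate design."""
    # Step 1: each designer summarizes the covariate block assigned to them.
    summaries = [summarize(X[:, b]) for b in blocks]
    candidates = []
    for k, b in enumerate(blocks):
        # Step 2: designer k combines their own covariates with the others' summaries.
        others = [s for m, s in enumerate(summaries) if m != k]
        Z = np.column_stack([X[:, b]] + others)
        # Step 3: designers work independently, so this loop could run in parallel.
        keep = greedy_match(Z, treated)
        # Step 4: every candidate is scored on ALL covariates, not just block k.
        score = standardized_mean_diff(X, treated, keep).max()
        candidates.append((score, keep))
    return min(candidates, key=lambda c: c[0])
```

Here `X` is an n-by-p covariate matrix, `treated` is a boolean vector of treatment indicators, and `blocks` is a list of column-index lists, one per designer. Scoring each candidate on the full covariate matrix mirrors the abstract's statement that the final design is chosen by comparing balance measures for all covariates across the candidates.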
