Paper Title

Simultaneous Inference for Massive Data: Distributed Bootstrap

Authors

Yang Yu, Shih-Kang Chao, Guang Cheng

Abstract

In this paper, we propose a bootstrap method for massive data processed in a distributed manner across a large number of machines. The method is computationally efficient because the bootstrap is carried out on the master machine without the over-resampling typically required by existing methods (Kleiner et al., 2014; Sengupta et al., 2016), while provably achieving optimal statistical efficiency with minimal communication. It does not require repeatedly re-fitting the model; instead, it applies the multiplier bootstrap on the master machine to the gradients received from the worker machines. Simulations validate our theory.
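The abstract describes the method only at a high level. The following is a minimal NumPy sketch of the general idea under simplifying assumptions of our own, not the authors' exact algorithm or code. It assumes a linear model with least-squares loss, simulates all machines in one process, uses the master's shard for a pilot estimate and an inverse-Hessian estimate, performs one Newton-type update with the averaged gradient, and uses i.i.d. standard normal multipliers. The function names `simulate_shards`, `local_gradient` and `distributed_multiplier_bootstrap`, along with the parameters `k`, `n`, `d`, `alpha` and `B`, are hypothetical choices for illustration. The sketch shows that each machine communicates a single gradient vector, and that the bootstrap only reweights those k gradients on the master instead of re-fitting the model.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): distributed one-step estimator for
# least-squares linear regression, plus a multiplier bootstrap on the master that
# uses only the k gradients computed by the machines.


def simulate_shards(k, n, d, theta_star, rng):
    """Simulate k machines with n samples each: X ~ N(0, I), y = X @ theta_star + N(0, 1).

    Shard 0 plays the role of the master machine's local data.
    """
    shards = []
    for _ in range(k):
        X = rng.standard_normal((n, d))
        y = X @ theta_star + rng.standard_normal(n)
        shards.append((X, y))
    return shards


def local_gradient(X, y, theta):
    """Gradient of the local loss (1 / 2n) * ||y - X @ theta||^2 on one machine."""
    return X.T @ (X @ theta - y) / X.shape[0]


def distributed_multiplier_bootstrap(shards, alpha=0.05, B=500, rng=None):
    """Return (theta_hat, lower, upper): approximate simultaneous (1 - alpha) intervals."""
    rng = np.random.default_rng() if rng is None else rng
    X0, y0 = shards[0]
    n, k = X0.shape[0], len(shards)

    # 1. Master: pilot estimate and inverse Hessian from its own shard only.
    theta_tilde = np.linalg.lstsq(X0, y0, rcond=None)[0]
    Theta = np.linalg.inv(X0.T @ X0 / n)

    # 2. Master broadcasts theta_tilde; each machine returns one d-dimensional gradient.
    grads = np.stack([local_gradient(X, y, theta_tilde) for X, y in shards])  # (k, d)
    g_bar = grads.mean(axis=0)

    # 3. One-step Newton-type update on the master with the averaged gradient.
    theta_hat = theta_tilde - Theta @ g_bar

    # 4. Multiplier bootstrap: reweight the k centered gradients with i.i.d. N(0, 1)
    #    multipliers; the model is never re-fitted.
    centered = np.sqrt(n) * (grads - g_bar)             # (k, d)
    eps = rng.standard_normal((B, k))                    # (B, k)
    boot = ((eps @ centered) / np.sqrt(k)) @ Theta.T     # (B, d)
    W = np.abs(boot).max(axis=1)                         # sup-norm statistic per draw
    c = np.quantile(W, 1 - alpha)

    # 5. Same half-width for every coordinate, scaled by the total sample size N = n * k.
    half_width = c / np.sqrt(n * k)
    return theta_hat, theta_hat - half_width, theta_hat + half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_star = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
    shards = simulate_shards(k=50, n=200, d=5, theta_star=theta_star, rng=rng)
    theta_hat, lower, upper = distributed_multiplier_bootstrap(shards, rng=rng)
    print("estimate:", np.round(theta_hat, 3))
    print("all covered:", bool(np.all((lower <= theta_star) & (theta_star <= upper))))
```

In this sketch, the B bootstrap draws cost only O(Bkd) arithmetic on the master, and each machine communicates one d-dimensional vector per round. Whether the intervals reach nominal simultaneous coverage depends on how large k and n are. The sketch does not check the conditions established in the paper.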
