Paper title

Model-assisted estimation through random forests in finite population sampling

Authors

Mehdi Dagdoug, Camelia Goga, David Haziza

Abstract

In surveys, the interest lies in estimating finite population parameters such as population totals and means. In most surveys, some auxiliary information is available at the estimation stage. This information may be incorporated in the estimation procedures to increase their precision. In this article, we use random forests to estimate the functional relationship between the survey variable and the auxiliary variables. In recent years, random forests have become attractive as National Statistical Offices now have access to a variety of data sources, potentially exhibiting a large number of observations on a large number of variables. We establish the theoretical properties of model-assisted procedures based on random forests and derive corresponding variance estimators. A model-calibration procedure for handling multiple survey variables is also discussed. The results of a simulation study suggest that the proposed point and estimation procedures perform well in terms of bias, efficiency, and coverage of normal-based confidence intervals, in a wide variety of settings. Finally, we apply the proposed methods using data on radio audiences collected by Médiamétrie, a French audience company.
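For readers unfamiliar with model-assisted estimation, the sketch below shows the generic difference form such estimators usually take: the sum of model predictions over the whole population, plus a design-weighted sum of sample residuals. It is an illustration under stated assumptions, not the authors' implementation. The function name `model_assisted_total`, the toy data, and `toy_predict` (a stand-in for a fitted random forest's prediction function) are all hypothetical.

```python
def model_assisted_total(population_x, sample, predict):
    """Generic model-assisted (difference-type) estimator of a population total.

    population_x: auxiliary values x_k for every unit in the population.
    sample: (x_k, y_k, pi_k) tuples for sampled units, where pi_k is the
        unit's first-order inclusion probability.
    predict: any fitted prediction function x -> predicted y. In the paper's
        setting this would come from a random forest; here it is left generic.
    """
    # Sum of predictions over all population units.
    synthetic = sum(predict(x) for x in population_x)
    # Design-weighted sample residuals, which correct the model's bias.
    correction = sum((y - predict(x)) / pi for x, y, pi in sample)
    return synthetic + correction


# Toy data, made up for illustration only.
population_x = [1.0, 2.0, 3.0, 4.0]
sample = [(1.0, 2.5, 0.5), (3.0, 6.0, 0.5)]


def toy_predict(x):
    # Hypothetical stand-in for a fitted random forest.
    return 2.0 * x


estimate = model_assisted_total(population_x, sample, toy_predict)

# With a predictor that always returns zero, the formula reduces to the
# Horvitz-Thompson estimator (design-weighted sum of sampled y values).
ht_estimate = model_assisted_total(population_x, sample, lambda x: 0.0)
```

Keeping `predict` as a plain callable means any fitted model can be plugged in without changing the estimator itself.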
