Paper Title
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Paper Authors
Paper Abstract
When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision-making problems can estimate the performance of evaluation policies before deploying them. This assumption is frequently violated due to unobserved confounders: unrecorded variables that affect both the decisions and their outcomes. We assess the robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy. When unobserved confounders can affect every decision in an episode, we demonstrate that even small amounts of per-decision confounding can heavily bias OPE methods. Fortunately, in a number of important settings found in healthcare, policy-making, operations, and technology, unobserved confounders may primarily affect only one of the many decisions made. Under this less pessimistic model of one-decision confounding, we propose an efficient loss-minimization-based procedure for computing worst-case bounds, and prove its statistical consistency. On two simulated healthcare examples (management of sepsis patients and developmental interventions for autistic children) where this is a reasonable model of confounding, we demonstrate that our method invalidates non-robust results and provides meaningful certificates of robustness, allowing reliable selection of policies even under unobserved confounding.