Title
Conditionally Risk-Averse Contextual Bandits
Authors
Abstract
Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards. Nonetheless, we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments in diverse scenarios where worst-case outcomes should be avoided, including dynamic pricing, inventory management, and self-tuning software, the last of which includes a production exascale data processing system.
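A toy illustration of the first sentence, offered as a sketch rather than anything from the paper. It uses two hypothetical actions with made-up reward samples. Action "A" usually pays 1.2 but pays 0.0 ten percent of the time. Action "B" always pays 0.9. A mean-maximizing criterion prefers "A". A risk-averse criterion that looks at the lower tail of the reward distribution prefers "B". The risk measure is an empirical lower-tail CVaR (the average of the worst `alpha` fraction of rewards). It was chosen only for illustration and is not claimed to be the measure used by the paper's algorithm. The helper names `mean`, `lower_cvar` and `pick` are hypothetical.

```python
import math


def mean(xs):
    """Plain average of a list of rewards."""
    return sum(xs) / len(xs)


def lower_cvar(xs, alpha):
    """Empirical lower-tail CVaR: average of the worst ceil(alpha * n) rewards.

    Used here only as an illustrative risk-averse score (an assumption, not
    necessarily the paper's risk measure).
    """
    k = max(1, math.ceil(alpha * len(xs)))
    return mean(sorted(xs)[:k])


def pick(arms, score):
    """Return the name of the arm whose reward samples maximize `score`."""
    return max(arms, key=lambda name: score(arms[name]))


# Hypothetical reward samples for two actions.
arms = {
    "A": [0.0] * 10 + [1.2] * 90,  # mean ~1.08, but a bad 10% tail
    "B": [0.9] * 100,              # mean 0.9, no bad tail
}

print(pick(arms, mean))                           # average-case choice
print(pick(arms, lambda xs: lower_cvar(xs, 0.1))) # risk-averse choice
```

The two criteria disagree even though both see the same samples. The risk-averse score depends on the shape of the tail, not just on the mean. This is why risk-aversion is sensitive to the whole reward distribution, and why it complicates exploration in the bandit setting.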