Conditionally Risk-Averse Contextual Bandits

Paper Title

Conditionally Risk-Averse Contextual Bandits

Authors

Mónika Farsang, Paul Mineiro, Wangda Zhang

Abstract

Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments from diverse scenarios where worst-case outcomes should be avoided, from dynamic pricing, inventory management, and self-tuning software; including a production exascale data processing system.
