Paper Title

Dynamic selection of p-norm in linear adaptive filtering via online kernel-based reinforcement learning

Paper Authors

Minh Vu, Yuki Akiyama, Konstantinos Slavakis

Paper Abstract

This study addresses the problem of dynamically selecting, at each time instance, the ``optimal'' p-norm to combat outliers in linear adaptive filtering, without any knowledge of the potentially time-varying probability distribution function of the outliers. To this end, an online and data-driven framework is designed via kernel-based reinforcement learning (KBRL). Novel Bellman mappings on reproducing kernel Hilbert spaces (RKHSs) are introduced that require no knowledge of the transition probabilities of Markov decision processes and are nonexpansive with respect to the underlying Hilbertian norm. An approximate policy-iteration framework is finally offered via the introduction of a finite-dimensional affine superset of the fixed-point set of the proposed Bellman mappings. The well-known ``curse of dimensionality'' in RKHSs is addressed by building a basis of vectors via an approximate-linear-dependency criterion. Numerical tests on synthetic data demonstrate that the proposed framework always selects the ``optimal'' p-norm for the outlier scenario at hand, while outperforming several non-RL and KBRL schemes.
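For readers unfamiliar with p-norm adaptive filtering, the sketch below illustrates the kind of update the abstract refers to: a least-mean-p-power (LMP) linear filter whose error exponent p is supplied at every time step. The function name `lmp_update`, the step size, and the way `p_t` is set in the usage loop are illustrative assumptions; in the paper, the per-step p would be chosen by the proposed KBRL policy, which is not reproduced here.

```python
import numpy as np

def lmp_update(w, x, d, p, mu=0.01):
    """One least-mean-p-power (LMP) update of a linear filter.

    w : current filter coefficients, shape (L,)
    x : input (regressor) vector, shape (L,)
    d : desired response (scalar)
    p : exponent of the p-norm loss |e|^p, typically 1 <= p <= 2
    mu: step size
    """
    e = d - w @ x  # a-priori estimation error
    # Gradient of |e|^p w.r.t. w is -p * |e|^(p-1) * sign(e) * x;
    # the constant factor p is absorbed into the step size mu here.
    return w + mu * np.abs(e) ** (p - 1) * np.sign(e) * x, e

# Toy usage: p_t is a placeholder; the paper's RL agent would supply it.
rng = np.random.default_rng(0)
L, N = 5, 200
w_true = rng.standard_normal(L)
w = np.zeros(L)
for t in range(N):
    x = rng.standard_normal(L)
    d = w_true @ x + 0.01 * rng.standard_normal()  # nominal (outlier-free) noise
    p_t = 1.3                                      # placeholder choice of p
    w, e = lmp_update(w, x, d, p_t)
```

Smaller values of p down-weight large errors and are therefore more robust to impulsive outliers, while p = 2 recovers the classical LMS update; dynamically trading off between the two is exactly the selection problem the abstract describes.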
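The abstract also mentions taming the ``curse of dimensionality'' in the RKHS by building a basis of vectors through an approximate-linear-dependency (ALD) criterion. Below is a minimal sketch of the standard ALD test (the dictionary rule popularized by kernel RLS) with a Gaussian kernel; the kernel choice, the threshold `nu`, and all function names are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ald_select(samples, nu=1e-3, sigma=1.0):
    """Build a dictionary of kernel basis vectors via the ALD test.

    A new sample x is admitted only if the squared distance of phi(x)
    from the span of the current dictionary exceeds the threshold nu.
    """
    dictionary = [samples[0]]
    for x in samples[1:]:
        D = np.asarray(dictionary)
        K = gauss_kernel(D, D, sigma)                   # Gram matrix of the dictionary
        k = gauss_kernel(D, x[None, :], sigma).ravel()  # kernel vector k(D, x)
        a = np.linalg.solve(K + 1e-10 * np.eye(len(D)), k)  # projection coefficients
        delta = gauss_kernel(x[None, :], x[None, :], sigma)[0, 0] - k @ a
        if delta > nu:  # approximately linearly independent of the dictionary
            dictionary.append(x)
    return np.asarray(dictionary)
```

In an online setting the Gram matrix and its inverse would be updated recursively rather than recomputed at every step as above; the recomputation is kept here only to make the test easy to read.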
