Paper Title

New Perspectives on the Use of Online Learning for Congestion Level Prediction over Traffic Data

Paper Authors

Manibardo, Eric L., Laña, Ibai, Lobo, Jesus L., Del Ser, Javier

Paper Abstract

This work focuses on classification over time series data. When a time series is generated by non-stationary phenomena, the pattern relating the series with the class to be predicted may evolve over time (concept drift). Consequently, predictive models aimed at learning this pattern may eventually become obsolete, hence failing to sustain performance levels of practical use. To overcome this model degradation, online learning methods incrementally learn from new data samples arriving over time, and accommodate eventual changes along the data stream by implementing assorted concept drift handling strategies. In this manuscript we elaborate on the suitability of online learning methods to predict the road congestion level based on traffic speed time series data. We draw interesting insights on the performance degradation observed when the forecasting horizon is increased. As opposed to what is done in most of the literature, we provide evidence of the importance of assessing the distribution of classes over time before designing and tuning the learning model. This preliminary exercise may give a hint of the predictability of the different congestion levels under target. Experimental results are discussed over real traffic speed data captured by inductive loops deployed in Seattle (USA). Several online learning methods are analyzed, from traditional incremental learning algorithms to more elaborate deep learning models. As shown by the reported results, when the prediction horizon is increased, the performance of all models degrades severely due to the distribution of classes over time, which supports our claim about the importance of analyzing this distribution prior to the design of the model.
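The abstract highlights two practical steps: inspecting how congestion-class frequencies shift over time before choosing a model, and evaluating incremental learners in a streaming (test-then-train) fashion. The sketch below illustrates both steps under stated assumptions; it is not the authors' pipeline. The file name, column names, speed thresholds, horizon, and window size are illustrative, and the incremental classifier is a Hoeffding tree from the `river` library, used here only as one example of an online learner.

```python
# Minimal sketch (assumptions labeled): (1) class distribution over time,
# (2) prequential "test-then-train" evaluation of an incremental classifier.

import pandas as pd
from river import metrics, tree

# Assumed data layout: one speed reading per timestamp for a single loop detector.
df = pd.read_csv("seattle_loop_speeds.csv", parse_dates=["timestamp"])  # hypothetical file

# Illustrative congestion levels derived from speed (mph); the paper's own
# discretization may differ.
def congestion_level(speed_mph: float) -> str:
    if speed_mph < 25:
        return "high"
    if speed_mph < 45:
        return "medium"
    return "low"

df["level"] = df["speed"].apply(congestion_level)

# (1) Class distribution over time: daily share of each congestion level.
daily_share = (
    df.set_index("timestamp")["level"]
      .groupby(pd.Grouper(freq="D"))
      .value_counts(normalize=True)
      .unstack(fill_value=0.0)
)
print(daily_share.head())  # strong imbalance or drift here hints at hard-to-predict classes

# (2) Prequential evaluation: predict the level `horizon` steps ahead from a short
# window of recent speeds, then learn from the sample once its label is available.
horizon = 4            # prediction horizon, in time steps (assumed)
window = 6             # number of lagged speeds used as features (assumed)
speeds = df["speed"].tolist()
levels = df["level"].tolist()

model = tree.HoeffdingTreeClassifier()
metric = metrics.MacroF1()

for t in range(window, len(speeds) - horizon):
    x = {f"lag_{k}": speeds[t - k] for k in range(window)}
    y = levels[t + horizon]
    y_pred = model.predict_one(x)      # test first...
    if y_pred is not None:
        metric.update(y, y_pred)
    model.learn_one(x, y)              # ...then train on the same sample

print("Prequential macro-F1:", metric.get())
```

Increasing `horizon` in this sketch is the simplest way to reproduce the qualitative effect the abstract reports: as the horizon grows, the lagged speeds become less informative about the future class, and performance drops most for the classes that are rare or bursty in the distribution computed in step (1).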
