STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction

Paper Title

STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction

Authors

Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

Abstract

Although many video prediction methods perform well on low-resolution (64$\sim$128) videos, predictive models for high-resolution (512$\sim$4K) videos have not yet been fully explored, even though they matter more given the growing demand for high-quality video. Compared with low-resolution videos, high-resolution videos contain richer appearance (spatial) information and more complex motion (temporal) information. In this paper, we propose a Spatiotemporal Residual Predictive Model (STRPM) for high-resolution video prediction. First, we propose a Spatiotemporal Encoding-Decoding Scheme that preserves more spatiotemporal information for high-resolution videos, so that the appearance details of each frame are better retained. Second, we design a Residual Predictive Memory (RPM) that models the spatiotemporal residual features (STRF) between previous and future frames rather than the whole frame, which helps capture the complex motion information in high-resolution videos. In addition, the proposed RPM can supervise the spatial encoder and the temporal encoder to extract different features in the spatial and temporal domains, respectively. Moreover, the model is trained using generative adversarial networks (GANs) with a learned perceptual loss (LP-loss) to improve the perceptual quality of the predictions. Experimental results show that STRPM generates more satisfactory results than various existing methods.
