Title
Discernible Image Compression
Authors
Abstract
Image compression, as one of the fundamental low-level image processing tasks, is essential for computer vision: it saves tremendous computing and storage resources at the cost of only a trivial amount of visual information. Conventional image compression methods tend to obtain compressed images by minimizing their appearance discrepancy with the corresponding original images, but pay little attention to their efficacy in downstream perception tasks, e.g., image recognition and object detection. As a result, some compressed images may be recognized with bias. In contrast, this paper aims to produce compressed images that are consistent with the originals in both appearance and perception. Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images and encouraging these features to be similar. The compressed images thus remain discernible to subsequent tasks, and we name our method Discernible Image Compression (DIC). In addition, the maximum mean discrepancy (MMD) is employed to minimize the difference between the feature distributions of the original and compressed images. The resulting compression network generates images of high quality while preserving consistent perception in the feature domain, so that these images can be well recognized by pre-trained machine learning models. Experiments on benchmarks demonstrate that images compressed by the proposed method are also well recognized by subsequent visual recognition and detection models. For instance, the mAP of images compressed by DIC is about 0.6% higher than that of images compressed by conventional methods.
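The abstract combines three ingredients: an appearance term between original and compressed images, a feature-consistency term computed with a pre-trained CNN, and an MMD term between the two feature distributions. The sketch below shows one way such an objective could be assembled. It is written in dependency-free Python and is not the authors' implementation. Several choices are assumptions not stated in the abstract: mean squared error for the appearance and feature terms, a Gaussian-kernel biased MMD² estimator, and the weights `lam_feat`, `lam_mmd`, and `sigma`. All function names are hypothetical. `toy_features` is a made-up stand-in for the pre-trained CNN.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two samples of feature vectors."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def mse(u: Vector, v: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def toy_features(img: Vector) -> List[float]:
    """HYPOTHETICAL stand-in for a pre-trained CNN: mean and range of the pixels."""
    return [sum(img) / len(img), max(img) - min(img)]


def dic_style_loss(
    originals: List[Vector],
    compressed: List[Vector],
    extract_features: Callable[[Vector], List[float]],
    lam_feat: float = 1.0,  # assumed weight of the feature-consistency term
    lam_mmd: float = 1.0,   # assumed weight of the MMD term
    sigma: float = 1.0,     # assumed kernel bandwidth
) -> float:
    """Appearance + feature-consistency + MMD loss over a batch (assumed form)."""
    # Appearance term: pixel-level MSE, averaged over the batch.
    appearance = sum(mse(x, y) for x, y in zip(originals, compressed)) / len(originals)
    # Features of both image sets from the (hypothetical) fixed extractor.
    f_orig = [extract_features(x) for x in originals]
    f_comp = [extract_features(y) for y in compressed]
    # Per-image feature consistency, averaged over the batch.
    feature = sum(mse(a, b) for a, b in zip(f_orig, f_comp)) / len(f_orig)
    # Distribution-level discrepancy between the two sets of features.
    distribution = mmd2(f_orig, f_comp, sigma)
    return appearance + lam_feat * feature + lam_mmd * distribution


if __name__ == "__main__":
    # Flattened toy "images"; real inputs would be image tensors.
    originals = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.5]]
    compressed = [[0.25, 0.38, 0.61], [0.85, 0.12, 0.5]]
    print(dic_style_loss(originals, compressed, toy_features))
```

The per-image feature term pulls each compressed image toward its own original. The MMD term compares the two batches of features as distributions, which matches the abstract's stated role of MMD. In a real training loop, the extractor would stay frozen and only the encoder-decoder would receive gradients.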