Paper Title


Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer

Paper Authors

Woojeong Jin, Dong-Ho Lee, Chenguang Zhu, Jay Pujara, Xiang Ren

Paper Abstract

Pre-trained language models are still far from human performance on tasks that require understanding the properties (e.g., appearance, measurable quantity) and affordances of everyday objects in the real world, since text lacks such information due to reporting bias. In this work, we study whether integrating visual knowledge into a language model can fill the gap. We investigate two types of knowledge transfer: (1) text knowledge transfer, using image captions that may contain enriched visual knowledge, and (2) cross-modal knowledge transfer, using both images and captions together with vision-language training objectives. We perform extensive empirical comparisons of the presented objectives on five downstream tasks that may require visual knowledge to solve. Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
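
For intuition, below is a minimal sketch of the first transfer type, text knowledge transfer: intermediate pre-training of a language model on image captions with a masked-language-modeling objective, so that visually grounded facts in captions flow into the text model. Everything here is illustrative: the model name, the two toy captions, and the hyperparameters are assumptions, and the paper itself compares several intermediate pre-training objectives rather than this single recipe.

```python
# Illustrative sketch of text knowledge transfer: continue pre-training a
# language model on image captions with masked language modeling (MLM).
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Toy caption corpus (assumption; in practice a large caption dataset
# such as MS COCO would be used here).
captions = [
    "A red apple sits on a wooden table next to a glass of water.",
    "Two small dogs play with a yellow ball in the grass.",
]

# The collator randomly masks 15% of tokens and builds the MLM labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
features = [tokenizer(c, truncation=True, max_length=64) for c in captions]
batch = collator(features)

# One intermediate pre-training step on the caption batch.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**batch).loss  # cross-entropy over the masked caption tokens
loss.backward()
optimizer.step()
print(f"MLM loss on caption batch: {loss.item():.3f}")
```

The second transfer type in the paper, cross-modal knowledge transfer, would additionally feed image features into a vision-language model trained with objectives over paired images and captions; that pipeline is omitted from this sketch.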
