Paper title
Communication breakdown: On the low mutual intelligibility between human and neural captioning
Paper authors
Paper abstract
We compare the zero-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced ImageCoDe dataset (Krojer et al., 2022), which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever performs much better when fed neural rather than human captions, despite the fact that the former, unlike the latter, were generated without awareness of the distractors that make the task hard. Even more remarkably, when the same neural captions are given to human subjects, their retrieval performance is almost at chance level. Our results thus add to the growing body of evidence that, even when the "language" of neural models resembles English, this superficial resemblance might be deeply misleading.