Paper Title

RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

Paper Authors

Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma, Jing Lu, Jianmo Ni, Xuanhui Wang, Michael Bendersky

Paper Abstract

Recently, substantial progress has been made in text ranking based on pretrained language models such as BERT. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually formulate text ranking as classification and rely on postprocessing to obtain a ranked list. In this paper, we propose RankT5 and study two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they not only can directly output ranking scores for each query-document pair, but also can be fine-tuned with "pairwise" or "listwise" ranking losses to optimize ranking performances. Our experiments show that the proposed models with ranking losses can achieve substantial ranking performance gains on different public text ranking data sets. Moreover, when fine-tuned with listwise ranking losses, the ranking model appears to have better zero-shot ranking performance on out-of-domain data sets compared to the model fine-tuned with classification losses.
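
To make the encoder-decoder structure described in the abstract more concrete, here is a minimal sketch of how a fine-tuned T5 checkpoint could emit a single ranking score per query-document pair by reading the unnormalized decoder logit of one designated token at the first decoding step. The use of PyTorch and the Hugging Face transformers library, the input template, and the choice of `<extra_id_10>` as the scoring token are illustrative assumptions, not the paper's actual implementation.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def rank_score(query: str, document: str, score_token: str = "<extra_id_10>") -> torch.Tensor:
    """Return one scalar ranking score for a query-document pair (illustrative sketch)."""
    # Encode the pair as a single input sequence (template is an assumption).
    inputs = tokenizer(
        f"Query: {query} Document: {document}",
        return_tensors="pt", truncation=True, max_length=512,
    )
    # Run a single decoder step starting from the decoder start token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits  # (1, 1, vocab)
    # The unnormalized logit of one designated token serves as the ranking score.
    token_id = tokenizer.convert_tokens_to_ids(score_token)
    return logits[0, 0, token_id]
```

Because each pair yields a plain scalar score, ranking losses can be applied per query over the scores of all candidate documents. Below is a hedged sketch of the two loss families named in the abstract, a listwise softmax cross-entropy and a pairwise logistic (RankNet-style) loss; the exact formulations and any weighting used in the paper may differ.

```python
def listwise_softmax_ce(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Listwise softmax cross-entropy over one query's candidate list.

    scores: (n_docs,) model scores; labels: (n_docs,) graded relevance labels.
    """
    return -(labels * torch.log_softmax(scores, dim=0)).sum()

def pairwise_logistic(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Pairwise logistic loss over all pairs (i, j) with labels[i] > labels[j]."""
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)            # s_i - s_j
    mask = (labels.unsqueeze(1) > labels.unsqueeze(0)).float()  # valid ordered pairs
    n_pairs = mask.sum().clamp(min=1.0)
    return (mask * torch.nn.functional.softplus(-diff)).sum() / n_pairs
```

Note that the listwise loss, which the abstract associates with better zero-shot behavior, normalizes over the whole candidate list of a query, so all documents for one query must be scored together in the same batch.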
