Paper Title

Large Language Models are few(1)-shot Table Reasoners

Paper Authors

Chen, Wenhu

Paper Abstract

Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners to solve text reasoning tasks. However, the capability of LLMs on table reasoning tasks is yet to be explored. In this paper, we aim at understanding how well LLMs can perform table-related tasks with few-shot in-context learning. Specifically, we evaluated LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS and found that LLMs are competent at complex reasoning over table structures, though these models are not pre-trained on any table corpus. When combined with `chain of thoughts' prompting, LLMs can achieve very strong performance with only a 1-shot demonstration, even on par with some SoTA models. We show that LLMs are even more competent at generating comprehensive long-form answers on FetaQA than tuned T5-large. We further manually studied the reasoning chains elicited from LLMs and found that these reasoning chains are highly consistent with the underlying semantic form. We believe that LLMs can serve as a simple yet generic baseline for future research. The code and data are released in https://github.com/wenhuchen/TableCoT.
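The abstract's core technique is few-shot in-context learning: a table is linearized into text, a single worked demonstration with a chain-of-thought rationale is prepended, and the LLM is asked to reason over a new table. The sketch below illustrates the general idea only; the linearization format, the demonstration, and the function names are illustrative assumptions, not the authors' exact prompt from the paper or repository.

```python
# Minimal sketch of 1-shot chain-of-thought prompting for table QA.
# The table format and demonstration text here are hypothetical, chosen
# only to show the structure: demo + linearized table + question + cue.

def linearize_table(header, rows):
    """Flatten a table into a pipe-separated text block an LLM can read."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

# A single (1-shot) demonstration containing an explicit reasoning chain.
DEMO = (
    "Table:\n"
    "Player | Points\n"
    "Alice | 30\n"
    "Bob | 25\n"
    "Question: Who scored more points?\n"
    "Let's think step by step. Alice scored 30 and Bob scored 25. "
    "Since 30 > 25, Alice scored more.\n"
    "Answer: Alice\n\n"
)

def build_prompt(header, rows, question):
    """Prepend the 1-shot CoT demonstration to the test table and question."""
    return (
        DEMO
        + "Table:\n" + linearize_table(header, rows)
        + "\nQuestion: " + question
        + "\nLet's think step by step."
    )
```

The resulting string would then be sent to an LLM completion endpoint; the trailing "Let's think step by step." cue encourages the model to emit a reasoning chain before its final answer, mirroring the prompting setup the abstract describes.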
