N-Best Hypotheses Reranking for Text-To-SQL Systems

Paper Title

N-Best Hypotheses Reranking for Text-To-SQL Systems

Authors

Zeng, Lu; Parthasarathi, Sree Hari Krishnan; Hakkani-Tur, Dilek

Abstract

The text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding that applies a SQL parser. On the well-established Spider dataset, we begin with oracle studies: specifically, choosing an oracle hypothesis from a SOTA model's 10-best list yields a $7.7\%$ absolute improvement in both exact match (EM) and execution (EX) accuracy, showing the significant potential of reranking. Identifying coherence and correctness as reranking approaches, we design a model that generates a query plan and propose a heuristic schema linking algorithm. Combining both approaches with T5-Large, we obtain a consistent $1\%$ improvement in EM accuracy and a $\sim 2.5\%$ improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.
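
The oracle study mentioned in the abstract can be illustrated with a minimal sketch. It compares top-1 accuracy against the accuracy an oracle would reach by picking any matching hypothesis from each n-best list. The function names and the whitespace/case normalization are assumptions made for illustration; they are not the paper's code, and the comparison is much cruder than Spider's official EM and EX metrics.

```python
from typing import Callable, Sequence, Tuple


def normalize(sql: str) -> str:
    """Lowercase and collapse whitespace (a crude stand-in for a real SQL matcher)."""
    return " ".join(sql.lower().split())


def simple_match(hypothesis: str, reference: str) -> bool:
    """Hypothetical match function: string equality after normalization."""
    return normalize(hypothesis) == normalize(reference)


def top1_and_oracle_accuracy(
    nbest_lists: Sequence[Sequence[str]],
    references: Sequence[str],
    is_match: Callable[[str, str], bool] = simple_match,
) -> Tuple[float, float]:
    """Return (top-1 accuracy, oracle accuracy) over a set of examples.

    Top-1 counts an example as correct only if the first hypothesis matches.
    The oracle counts it as correct if any hypothesis in the list matches.
    """
    if not references or len(nbest_lists) != len(references):
        raise ValueError("need one non-empty, equally sized list of n-best lists and references")

    top1_hits = 0
    oracle_hits = 0
    for hypotheses, reference in zip(nbest_lists, references):
        if hypotheses and is_match(hypotheses[0], reference):
            top1_hits += 1
        if any(is_match(h, reference) for h in hypotheses):
            oracle_hits += 1

    total = len(references)
    return top1_hits / total, oracle_hits / total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    nbest = [
        ["SELECT name FROM singer", "SELECT age FROM singer"],
        ["SELECT count(*) FROM song", "select  COUNT(*) from concert"],
        ["SELECT a FROM t", "SELECT b FROM t"],
    ]
    refs = [
        "select name from singer",
        "SELECT count(*) FROM concert",
        "SELECT c FROM t",
    ]
    top1, oracle = top1_and_oracle_accuracy(nbest, refs)
    print(f"top-1: {top1:.3f}, oracle: {oracle:.3f}")
```

The gap between the two numbers is the headroom available to a reranker: a perfect reranker would reach the oracle figure, while a reranker that never changes the order stays at top-1.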
