Paper Title

N-Best Hypotheses Reranking for Text-To-SQL Systems

Authors

Zeng, Lu; Parthasarathi, Sree Hari Krishnan; Hakkani-Tur, Dilek

Abstract

The text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding that applies a SQL parser. On the well-established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list yields a $7.7\%$ absolute improvement in both exact match (EM) and execution (EX) accuracy, showing the significant potential of reranking. Identifying coherence and correctness as reranking approaches, we design a model that generates a query plan and propose a heuristic schema linking algorithm. Combining both approaches with T5-Large, we obtain a consistent $1\%$ improvement in EM accuracy and a $\sim 2.5\%$ improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.
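
The Oracle study mentioned in the abstract can be illustrated with a minimal sketch. It compares top-1 accuracy with the accuracy an ideal reranker would reach if it always picked a correct hypothesis whenever one appears in the n-best list. Everything here is an assumption made for illustration: the function names, the toy data, and in particular the `is_match` criterion, which is a crude normalized string comparison and not the Spider exact-match metric or the authors' code.

```python
from typing import List, Sequence


def normalize(sql: str) -> str:
    """Crude normalization (assumption): lowercase, collapse whitespace,
    drop a trailing semicolon. Note that this also lowercases string literals."""
    return " ".join(sql.lower().split()).rstrip(";").strip()


def is_match(hypothesis: str, reference: str) -> bool:
    """Hypothetical stand-in for an exact-match check between two queries."""
    return normalize(hypothesis) == normalize(reference)


def top1_accuracy(nbest_lists: Sequence[List[str]], references: Sequence[str]) -> float:
    """Fraction of examples whose first-ranked hypothesis matches the reference."""
    hits = sum(is_match(nbest[0], ref) for nbest, ref in zip(nbest_lists, references))
    return hits / len(references)


def oracle_accuracy(nbest_lists: Sequence[List[str]], references: Sequence[str]) -> float:
    """Fraction of examples where any hypothesis in the n-best list matches.
    This is the upper bound a perfect reranker over the same list could reach."""
    hits = sum(
        any(is_match(hyp, ref) for hyp in nbest)
        for nbest, ref in zip(nbest_lists, references)
    )
    return hits / len(references)


# Toy data, invented for illustration only.
nbest_lists = [
    ["SELECT name FROM singer", "SELECT name FROM singers"],
    ["SELECT count(*) FROM song", "SELECT COUNT(*) FROM singer;"],
]
references = ["select name from singer", "SELECT count(*) FROM singer"]

top1 = top1_accuracy(nbest_lists, references)
oracle = oracle_accuracy(nbest_lists, references)
print(f"top-1: {top1:.2f}, oracle: {oracle:.2f}, headroom: {oracle - top1:.2f}")
```

The gap between the two numbers is the headroom available to reranking; the abstract reports that gap as $7.7\%$ absolute for a 10-best list on Spider.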
