Paper Title
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
Paper Authors
Paper Abstract
We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LMs). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer "fill-in-the-blank" questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark aiming to fully examine the open relational information present in pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs are able to obtain competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we establish via distant supervision. For instance, the zero-shot pre-trained LMs outperform the F1 score of the state-of-the-art supervised OIE methods on our factual OIE datasets without using any training sets. Our code and datasets are available at https://github.com/cgraywang/IELM.
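The "fill-in-the-blank" probing that the abstract contrasts with open relation extraction can be illustrated with a minimal sketch. This is not the IELM code: it only shows a pre-trained masked LM answering a cloze-style prompt for one pre-defined relation, with the model name and prompt chosen purely for illustration.

```python
# Minimal sketch (not the IELM implementation) of "fill-in-the-blank" probing:
# a pre-trained masked LM predicts the object of a pre-defined relation.
# Model name and prompt are illustrative assumptions.
from transformers import pipeline

# Load a pre-trained masked LM; bert-base-uncased is just an example choice.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Probe a fixed relation ("born in") for a fixed subject.
prompt = "Barack Obama was born in [MASK]."

# Each prediction carries the filled-in token and the model's confidence.
for prediction in fill_mask(prompt, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```

By contrast, the benchmark described in the abstract does not fix the relation in advance; it evaluates whether the pre-trained LM can surface open relational triples in a zero-shot manner.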