Title
Estimating the Cost of Executing Link Traversal based SPARQL Queries
Authors
Abstract
An increasing number of organisations in almost all fields have started adopting Semantic Web technologies to publish their data as open, linked and interoperable (RDF) datasets, queryable through the SPARQL language and protocol. Link traversal has emerged as a SPARQL query processing method that exploits the Linked Data principles and the dynamic nature of the Web to discover data relevant to answering a query by resolving online resources (URIs) during query evaluation. However, the execution time of link traversal queries can become prohibitively high for certain query types because of the large number of resources that need to be accessed during query execution. In this paper, we propose and evaluate baseline methods for estimating the evaluation cost of link traversal queries. Such methods can be very useful for deciding on the fly which execution strategy to follow for a given query, thereby reducing the load on a SPARQL endpoint and increasing the overall reliability of the query service. To evaluate the performance of the proposed methods, we have created, and made publicly available, a ground truth dataset consisting of 2,425 queries.
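To illustrate why the number of accessed resources dominates the cost, the following is a minimal sketch of link traversal over a toy in-memory "Web". It is not the estimation method proposed in the paper. The `TOY_WEB` data, the `traverse` function, and the choice to count one lookup per resolved URI are all assumptions made for illustration; a real engine would issue HTTP requests and evaluate full SPARQL patterns.

```python
from collections import deque

# Hypothetical toy "Web": each URI maps to the RDF triples that
# resolving it would return. A real system would fetch these over HTTP.
TOY_WEB = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob"),
                 ("ex:alice", "foaf:knows", "ex:carol")],
    "ex:bob": [("ex:bob", "foaf:knows", "ex:dave"),
               ("ex:bob", "foaf:name", '"Bob"')],
    "ex:carol": [("ex:carol", "foaf:name", '"Carol"')],
    "ex:dave": [("ex:dave", "foaf:name", '"Dave"')],
}


def traverse(seed, predicates, web=TOY_WEB):
    """Follow links breadth-first from `seed`, only along `predicates`.

    Returns (triples, lookups), where `lookups` is the number of URIs
    resolved, used here as a crude stand-in for execution cost.
    """
    seen = {seed}
    queue = deque([seed])
    triples = []
    lookups = 0
    while queue:
        uri = queue.popleft()
        lookups += 1  # one simulated lookup per resolved URI
        for s, p, o in web.get(uri, []):
            triples.append((s, p, o))
            # Only objects reached via a followed predicate are resolved next.
            if p in predicates and o not in seen:
                seen.add(o)
                queue.append(o)
    return triples, lookups


triples, lookups = traverse("ex:alice", {"foaf:knows"})
names = sorted(o for s, p, o in triples if p == "foaf:name")
print(lookups, names)
```

In this sketch, following `foaf:knows` transitively from a single seed already requires resolving every reachable resource, whereas passing an empty predicate set resolves only the seed. The gap between those two cases is the kind of difference a cost estimator would need to anticipate before execution.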