Paper Title

Sample Debiasing in the Themis Open World Database System (Extended Version)

Authors

Laurel Orr, Magda Balazinska, Dan Suciu

Abstract

Open world database management systems assume that tuples not in the database still exist, and they are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two different approaches for automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that it achieves higher query accuracy than the default AQP approach, an alternative sample reweighting technique, and a variety of Bayesian network models while maintaining interactive query response times. We also show that Themis is robust to differences in the support between the sample and population, a key use case when using social media samples.
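
To make the idea of sample reweighting against population aggregates concrete, here is a minimal sketch in Python. The abstract does not say which reweighting algorithm Themis uses, so this example swaps in a well-known stand-in, iterative proportional fitting (also called raking), and should not be read as the authors' method. The function names (`rake_weights`, `weighted_count`), the attribute names, and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def rake_weights(sample, marginals, iterations=50):
    """Assign one weight per sample row so that the weighted counts
    approximately match known per-attribute population totals.

    sample:    list of dict rows, e.g. {"sex": "F", "region": "N"}
    marginals: {attribute: {value: population_count}} (assumed known a priori)
    """
    weights = [1.0] * len(sample)
    for _ in range(iterations):
        for attr, targets in marginals.items():
            # Current weighted total for each value of this attribute.
            totals = defaultdict(float)
            for row, w in zip(sample, weights):
                totals[row[attr]] += w
            # Rescale each row so this attribute's totals hit the targets.
            for i, row in enumerate(sample):
                value = row[attr]
                if value in targets and totals[value] > 0:
                    weights[i] *= targets[value] / totals[value]
    return weights


def weighted_count(sample, weights, predicate):
    """Approximate a population COUNT(*) query from the reweighted sample."""
    return sum(w for row, w in zip(sample, weights) if predicate(row))


# Hypothetical biased sample: women and the northern region are overrepresented.
sample = [
    {"sex": "F", "region": "N"},
    {"sex": "F", "region": "S"},
    {"sex": "F", "region": "N"},
    {"sex": "M", "region": "S"},
]
# Hypothetical population aggregates (one-dimensional marginals only).
marginals = {
    "sex": {"F": 50, "M": 50},
    "region": {"N": 40, "S": 60},
}

weights = rake_weights(sample, marginals)
men_estimate = weighted_count(sample, weights, lambda r: r["sex"] == "M")
north_estimate = weighted_count(sample, weights, lambda r: r["region"] == "N")
```

A count query over the reweighted sample then sums weights instead of counting rows. Note that raking can only rescale tuples that appear in the sample; value combinations that are missing entirely get no weight, which is the kind of support difference the abstract says Themis is designed to handle.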
