Paper Title

PolyFrame: A Retargetable Query-based Approach to Scaling DataFrames (Extended Version)

Authors

Phanwadee Sinthong, Michael J. Carey

Abstract

In the last few years, the field of data science has been growing rapidly as various businesses have adopted statistical and machine learning techniques to empower their decision making and applications. Scaling data analysis, possibly including the application of custom machine learning models, to large volumes of data requires the utilization of distributed frameworks. This can lead to serious technical challenges for data analysts and reduce their productivity. AFrame, a Python data analytics library, is implemented as a layer on top of Apache AsterixDB, addressing these issues by incorporating the data scientists' development environment and transparently scaling out the evaluation of analytical operations through a Big Data management system. While AFrame is able to leverage data management facilities (e.g., indexes and query optimization) and allows users to interact with a very large volume of data, the initial version only generated SQL++ queries and only operated against Apache AsterixDB. In this work, we describe a new design that retargets AFrame's incremental query formation to other query-based database systems as well, making it more flexible for deployment against other data management systems with composable query languages.
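
To make the idea of retargetable, incremental query formation concrete, here is a minimal sketch in Python. It is not the PolyFrame or AFrame implementation: the `LazyFrame` class, the `RULES` table, and the query templates are all hypothetical names invented for illustration. It assumes each DataFrame operation wraps the previous query as a subquery, using a template chosen by the target language.

```python
from dataclasses import dataclass

# Hypothetical per-language rewrite rules. These templates are illustrative
# assumptions and are not taken from PolyFrame or AFrame.
RULES = {
    "sqlpp": {
        "scan": "SELECT VALUE t FROM {dataset} t",
        "filter": "SELECT VALUE t FROM ({subquery}) t WHERE t.{column} {op} {value}",
        "limit": "SELECT VALUE t FROM ({subquery}) t LIMIT {n}",
    },
    "sql": {
        "scan": "SELECT * FROM {dataset}",
        "filter": "SELECT * FROM ({subquery}) t WHERE t.{column} {op} {value}",
        "limit": "SELECT * FROM ({subquery}) t LIMIT {n}",
    },
}


@dataclass(frozen=True)
class LazyFrame:
    """Holds a query string; each operation returns a new, larger query."""
    query: str
    language: str

    @classmethod
    def from_dataset(cls, dataset, language):
        # Start from a plain scan of the named dataset.
        return cls(RULES[language]["scan"].format(dataset=dataset), language)

    def _apply(self, rule, **params):
        # Wrap the current query as a subquery inside the chosen template.
        template = RULES[self.language][rule]
        return LazyFrame(template.format(subquery=self.query, **params), self.language)

    def filter(self, column, op, value):
        # repr() quotes strings and leaves numbers bare; this is not
        # injection-safe and is only suitable for a sketch.
        return self._apply("filter", column=column, op=op, value=repr(value))

    def head(self, n=5):
        return self._apply("limit", n=n)


# The same chain of operations, retargeted by changing only the language key.
for lang in ("sqlpp", "sql"):
    frame = LazyFrame.from_dataset("Users", lang).filter("age", ">", 30).head(3)
    print(lang, "->", frame.query)
```

In this sketch nothing is sent to a database until the caller decides to run `frame.query`, so the backend's optimizer sees the whole composed query. Supporting another language would mean adding one more entry to the rules table, provided that language allows queries to be nested in this way.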
