Paper Title
How do Data Science Workers Collaborate? Roles, Workflows, and Tools
Authors
Abstract
Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.