Paper Title
DPack: Efficiency-Oriented Privacy Budget Scheduling
Paper Authors
Paper Abstract
Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads where many ML models are trained on user data. Once spent, the DP budget is consumed forever, so it is crucial to allocate it as efficiently as possible in order to train as many models as possible. This paper presents DPack, a privacy budget scheduler that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, so practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPack, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload that we derived from the Alibaba ML cluster trace. We show that DPack: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks than a state-of-the-art privacy scheduling algorithm that focuses on fairness (1.3-1.7x on the Alibaba workload, 1.0-2.6x on microbenchmarks), but (3) sacrifices some fairness for efficiency. Using DPack, DP ML operators should therefore be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
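To make the privacy-knapsack formulation concrete, here is a minimal sketch in notation of our own choosing (the abstract only states that the problem is a multidimensional knapsack over DP budget; the exact objective and constraints below are an assumption): let x_i in {0,1} indicate whether task i is scheduled, let eps_{i,b} be the DP budget task i demands from data block b, and let eps_b^max be block b's remaining budget.

    maximize    \sum_i p_i x_i
    subject to  \sum_i \epsilon_{i,b} \, x_i \le \epsilon_b^{\max}   for every data block b,
                x_i \in \{0,1\},

where p_i is task i's profit (for example, p_i = 1 if the goal is simply to maximize the number of scheduled tasks). Each data block contributes one knapsack dimension, which is what makes the problem multidimensional and, in general, NP-hard, and why DPack resorts to an approximation algorithm.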