论文标题
从数十亿个数据点开始,以高效高斯流程回归的额定级别表示
Equispaced Fourier representations for efficient Gaussian process regression from a billion data points
论文作者
论文摘要
我们引入了一种基于傅立叶的快速算法,用于低维度的高斯过程回归。它近似于$ m $ m $ nodes的皇家频率网格上的复杂指数,近似于翻译不变的协方差内核。这会导致带有Toeplitz结构的重量空间$ M $ M $系统矩阵,因此可以通过快速傅立叶变换(FFT)将其应用于$ {\ Mathcal O}(M \ Mathcal O}(M \ log {M})$操作,与数据点$ N $无关。线性系统可以在$ {\ Mathcal O}(n + m \ log {m})$使用Unrisur FFT中设置。这可以通过迭代求解器进行有效的大规模回归,即使对于具有脂肪尾部密度的内核(大$ M $)也是如此。我们在内核近似和后平均误差上提供边界。在一个,两个和三个维度中,在一个,两个维度和三个维度中的平方指数和Matérn内核的数值实验通常以可比的精度比最先进的等级结构求解器上的1-2个数量级加速度。我们的方法允许2DMatérn-$ \ mbox {$ \ frac {3} {2} $} $从$ n = 10^9 $数据点在2分钟内在标准台式机上执行,并具有后端桌面,具有后级均值精度$ 10^{ - 3} $。这为空间统计应用程序打开了比以前可能大的100倍。
We introduce a Fourier-based fast algorithm for Gaussian process regression in low dimensions. It approximates a translationally-invariant covariance kernel by complex exponentials on an equispaced Cartesian frequency grid of $M$ nodes. This results in a weight-space $M\times M$ system matrix with Toeplitz structure, which can thus be applied to a vector in ${\mathcal O}(M \log{M})$ operations via the fast Fourier transform (FFT), independent of the number of data points $N$. The linear system can be set up in ${\mathcal O}(N + M \log{M})$ operations using nonuniform FFTs. This enables efficient massive-scale regression via an iterative solver, even for kernels with fat-tailed spectral densities (large $M$). We provide bounds on both kernel approximation and posterior mean errors. Numerical experiments for squared-exponential and Matérn kernels in one, two and three dimensions often show 1-2 orders of magnitude acceleration over state-of-the-art rank-structured solvers at comparable accuracy. Our method allows 2D Matérn-$\mbox{$\frac{3}{2}$}$ regression from $N=10^9$ data points to be performed in 2 minutes on a standard desktop, with posterior mean accuracy $10^{-3}$. This opens up spatial statistics applications 100 times larger than previously possible.