Paper Title
Eigen-Stratified Models
Paper Authors
Paper Abstract
Stratified models depend in an arbitrary way on a selected categorical feature that takes $K$ values, and depend linearly on the other $n$ features. Laplacian regularization with respect to a graph on the feature values can greatly improve the performance of a stratified model, especially in the low-data regime. A significant issue with Laplacian-regularized stratified models is that the model is $K$ times the size of the base model, which can be quite large. We address this issue by formulating eigen-stratified models, which are stratified models with the additional constraint that the model parameters are linear combinations of some modest number $m$ of bottom eigenvectors of the graph Laplacian, i.e., those associated with the $m$ smallest eigenvalues. With eigen-stratified models, we only need to store the $m$ bottom eigenvectors and the corresponding coefficients as the stratified model parameters. This leads to a reduction in model size, sometimes a large one, when $m \leq n$ and $m \ll K$. In some cases, the additional regularization implicit in eigen-stratified models can improve out-of-sample performance over standard Laplacian-regularized stratified models.
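To make the construction concrete, the following is a minimal numerical sketch (Python/NumPy), not the authors' implementation: a squared-loss eigen-stratified model with a closed-form fit. The function name `eigen_stratified_ls`, the chain-graph toy data, and the small `eps` ridge term are illustrative assumptions. The identity the sketch relies on is that, with the parameters constrained to $\Theta = QZ$ for $Q$ holding the $m$ bottom orthonormal eigenvectors of $L$, the Laplacian penalty $\mathbf{tr}(\Theta^T L \Theta)$ reduces to a weighted ridge penalty $\sum_{j=1}^m \lambda_j \|z_j\|_2^2$ on the rows of $Z$.

```python
import numpy as np

def eigen_stratified_ls(X, y, z, L, m, gamma=1.0, eps=1e-8):
    """Fit an eigen-stratified least-squares model (illustrative sketch).

    X : (N, n) continuous features        y : (N,) targets
    z : (N,) categorical feature in {0, ..., K-1}
    L : (K, K) graph Laplacian on the K categorical values
    m : number of bottom eigenvectors retained
    Returns Q (K, m) and Z (m, n); the full parameters are Theta = Q @ Z,
    so only Q and Z need to be stored.
    """
    N, n = X.shape
    # eigh returns eigenvalues in ascending order; keep the bottom m.
    w, V = np.linalg.eigh(L)
    Q, w_m = V[:, :m], w[:m]
    # Prediction for sample i: Q[z_i] @ Z @ x_i, which is linear in vec(Z).
    A = np.stack([np.kron(Q[zi], xi) for zi, xi in zip(z, X)])  # (N, m*n)
    # With Theta = Q Z, the Laplacian penalty tr(Theta^T L Theta) becomes a
    # weighted ridge sum_j w_j * ||Z[j, :]||^2 on the rows of Z.
    P = np.kron(np.diag(w_m), np.eye(n))
    # eps * I keeps the system nonsingular (the bottom eigenvalue of L is 0).
    Zvec = np.linalg.solve(A.T @ A + gamma * P + eps * np.eye(m * n), A.T @ y)
    return Q, Zvec.reshape(m, n)

# Toy usage: K = 20 categories on a chain graph, n = 3 features, m = 5.
K, n, m, N = 20, 3, 5, 500
L = 2 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
L[0, 0] = L[-1, -1] = 1.0              # chain-graph Laplacian
rng = np.random.default_rng(0)
X, z = rng.standard_normal((N, n)), rng.integers(0, K, N)
y = rng.standard_normal(N)
Q, Z = eigen_stratified_ls(X, y, z, L, m)
theta_k = Q[7] @ Z                     # model coefficients for category k = 7
```

Note how the storage claim in the abstract shows up directly: instead of the $K \times n$ parameter matrix $\Theta$, the fitted model keeps only the $K \times m$ eigenvector matrix $Q$ and the $m \times n$ coefficient matrix $Z$, and reconstructs any $\theta_k$ on demand.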