Paper Title
Forward- or Reverse-Mode Automatic Differentiation: What's the Difference?
Paper Authors
Paper Abstract
Automatic differentiation (AD) has been a topic of interest for researchers in many disciplines, with increased popularity since its application to machine learning and neural networks. Although many researchers appreciate and know how to apply AD, it remains a challenge to truly understand the underlying processes. From an algebraic point of view, however, AD appears surprisingly natural: it originates from the differentiation laws. In this work we use Algebra of Programming techniques to reason about different AD variants, leveraging Haskell to illustrate our observations. Our findings stem from three fundamental algebraic abstractions: (1) the notion of a module over a semiring, (2) Nagata's construction of the 'idealization of a module', and (3) Kronecker's delta function, which together allow us to write a single-line abstract definition of AD. From this single-line definition, and by instantiating our algebraic structures in various ways, we derive different AD variants that have the same extensional behaviour but different intensional properties, mainly in terms of (asymptotic) computational complexity. We show that the different variants are equivalent by means of Kronecker isomorphisms, a further elaboration of our Haskell infrastructure that guarantees correctness by construction. With this framework in place, this paper seeks to make AD variants more comprehensible, taking an algebraic perspective on the matter.
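To make the abstract's ingredients concrete, here is a minimal, hedged Haskell sketch of forward-mode AD built on Nagata's idealization: a `Nagata` value pairs a primal value with a tangent, multiplication follows the product rule, and seeding the tangent with 1 plays the role of the Kronecker delta. The names `Nagata` and `derive` are illustrative, not necessarily those used in the paper, and the module here is simply the semiring acting on itself.

```haskell
module Main where

-- Nagata's idealization, specialized to a semiring acting on itself:
-- a primal value paired with its tangent (a generalized dual number).
data Nagata d = Nagata d d deriving (Show, Eq)

instance Num d => Num (Nagata d) where
  Nagata x dx + Nagata y dy = Nagata (x + y) (dx + dy)
  -- Multiplication encodes the product rule of differentiation.
  Nagata x dx * Nagata y dy = Nagata (x * y) (x * dy + dx * y)
  negate (Nagata x dx)      = Nagata (negate x) (negate dx)
  fromInteger n             = Nagata (fromInteger n) 0
  abs    = error "abs: not differentiable at 0 in this sketch"
  signum = error "signum: not differentiable in this sketch"

-- Differentiate f at a: seed the tangent with 1 (the Kronecker delta
-- for a single variable) and read off the tangent of the result.
derive :: Num d => (Nagata d -> Nagata d) -> d -> d
derive f a = let Nagata _ da = f (Nagata a 1) in da

-- Example: f x = x^2 + 3x has derivative f'(x) = 2x + 3, so f'(5) = 13.
main :: IO ()
main = print (derive (\x -> x * x + 3 * x) (5 :: Integer))
```

Reverse-mode variants in the paper arise from the same one-line scheme by choosing a different module instantiation; this sketch fixes the simplest (forward-mode) choice.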