• Some derivation formulas for backpropagation in machine learning


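    1. Preliminaries

    Differentiating with respect to a matrix can be understood by analogy with the derivatives we learned in high school. Back then we always differentiated with respect to a scalar, and a scalar can be viewed as a special 1×1 matrix. This post mainly records the backpropagation process in machine learning, so it does not analyze matrix calculus in depth (in truth I could not do so anyway; I only know the simple cases).

    Here is the one matrix-derivative case that the backpropagation below relies on: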
    $$
    \frac{\partial (a^T x)}{\partial x}
    = \frac{\partial (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)}{\partial x}
    = \begin{bmatrix}
    \frac{\partial (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)}{\partial x_1} \\
    \frac{\partial (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)}{\partial x_2} \\
    \vdots \\
    \frac{\partial (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)}{\partial x_n}
    \end{bmatrix}
    = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
    = a
    $$
    Once this makes sense, we can get started.
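
    The identity above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `dot` and `numerical_gradient` and the example values of `a` and `x` are made up for illustration. It estimates each partial derivative of a^T x with a central difference and compares the result with a.

```python
def dot(a, x):
    """Return the scalar a^T x for two equal-length lists."""
    return sum(ai * xi for ai, xi in zip(a, x))


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


# Arbitrary example values (assumptions, not from the article).
a = [2.0, -1.0, 0.5]
x = [1.0, 3.0, -2.0]

grad = numerical_gradient(lambda v: dot(a, v), x)

# Each entry of grad should agree with the matching entry of a,
# up to floating-point error.
for g, ai in zip(grad, a):
    print(f"numerical: {g:.6f}   expected: {ai:.6f}")
```

    Because a^T x is linear in x, the estimate does not depend on the point x at which it is evaluated.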

    2. Backpropagation


    Now we propagate backward:

    Hidden layer 2:
    Activation: $$ da^{[2]}=\frac{\partial L}{\partial a^{[2]}} $$

    $$ dz^{[2]}=\frac{\partial L}{\partial z^{[2]}}=\frac{\partial L}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}}=da^{[2]} \cdot g^{[2]'}(z^{[2]}) $$

    $$ dW^{[2]}=\frac{\partial L}{\partial W^{[2]}}=\frac{\partial L}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial W^{[2]}}=dz^{[2]} \cdot a^{[1]T} \\ \Rightarrow W^{[2]} \mathrel{-}= \alpha \cdot dW^{[2]} $$

    $$ db^{[2]}=\frac{\partial L}{\partial b^{[2]}}=\frac{\partial L}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial b^{[2]}}=dz^{[2]} \\ \Rightarrow b^{[2]} \mathrel{-}= \alpha \cdot db^{[2]} $$
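
    The three formulas for layer 2 can be sketched in NumPy for a single training example stored as column vectors. This is an illustration under assumptions: the article does not name the activation g or the loss L, so a sigmoid is used for g and `da2` is a random stand-in for ∂L/∂a^{[2]}. The function name `layer_backward`, the layer sizes and the learning rate are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def layer_backward(da, z, a_prev, g_prime):
    """Gradients of one layer for a single example (column vectors).

    da: (n, 1), z: (n, 1), a_prev: (n_prev, 1).
    """
    dz = da * g_prime(z)   # elementwise product, shape (n, 1)
    dW = dz @ a_prev.T     # matrix product (outer product), shape (n, n_prev)
    db = dz                # shape (n, 1)
    return dz, dW, db


rng = np.random.default_rng(0)
n1, n2 = 3, 2                        # hypothetical layer sizes
W2 = rng.standard_normal((n2, n1))
b2 = np.zeros((n2, 1))
a1 = rng.standard_normal((n1, 1))
z2 = W2 @ a1 + b2                    # assumed forward step z = W a + b
da2 = rng.standard_normal((n2, 1))   # stand-in for dL/da2 (assumption)

dz2, dW2, db2 = layer_backward(da2, z2, a1, sigmoid_prime)

alpha = 0.1                          # hypothetical learning rate
W2_new = W2 - alpha * dW2            # W2 -= alpha * dW2
b2_new = b2 - alpha * db2            # b2 -= alpha * db2

print(dW2.shape, db2.shape)
```

    Note that the product in dz is elementwise, while the product in dW is an ordinary matrix product; the formulas above write both with the same dot.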

    Hidden layer 1:
    Activation: $$ da^{[1]}=\frac{\partial L}{\partial a^{[1]}}=\frac{\partial L}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}}=W^{[2]T} \cdot dz^{[2]} $$
    To be honest, I did not quite understand the result of this step:

    $\frac{\partial L}{\partial z^{[2]}}$ is $dz^{[2]}$ and $\frac{\partial z^{[2]}}{\partial a^{[1]}}$ is $W^{[2]T}$, so why is their product $W^{[2]T} \cdot dz^{[2]}$ rather than $dz^{[2]} \cdot W^{[2]T}$?
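
    One way to approach this question is to write the chain rule component by component. The sketch below assumes the usual forward step z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} with column vectors, which the article does not write out explicitly; the sizes n_1 and n_2 are names introduced here for the widths of layers 1 and 2.

```latex
z^{[2]}_j = \sum_{i=1}^{n_1} W^{[2]}_{ji}\, a^{[1]}_i + b^{[2]}_j
\quad\Rightarrow\quad
\frac{\partial z^{[2]}_j}{\partial a^{[1]}_i} = W^{[2]}_{ji}

\frac{\partial L}{\partial a^{[1]}_i}
= \sum_{j=1}^{n_2} \frac{\partial L}{\partial z^{[2]}_j}\,
  \frac{\partial z^{[2]}_j}{\partial a^{[1]}_i}
= \sum_{j=1}^{n_2} W^{[2]}_{ji}\, dz^{[2]}_j
= \left( W^{[2]T}\, dz^{[2]} \right)_i
```

    The shapes point the same way: W^{[2]} is n_2 × n_1 and dz^{[2]} is n_2 × 1, so W^{[2]T} dz^{[2]} is n_1 × 1, the same shape as a^{[1]}. The other order, dz^{[2]} W^{[2]T}, would multiply an n_2 × 1 matrix by an n_1 × n_2 matrix, which is not defined unless n_1 = 1.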

    $$ dz^{[1]}=\frac{\partial L}{\partial z^{[1]}}=\frac{\partial L}{\partial a^{[1]}} \cdot \frac{\partial a^{[1]}}{\partial z^{[1]}}=da^{[1]} \cdot g^{[1]'}(z^{[1]}) $$

    $$ dW^{[1]}=\frac{\partial L}{\partial W^{[1]}}=\frac{\partial L}{\partial z^{[1]}} \cdot \frac{\partial z^{[1]}}{\partial W^{[1]}}=dz^{[1]} \cdot a^{[0]T} \\ \Rightarrow W^{[1]} \mathrel{-}= \alpha \cdot dW^{[1]} $$

    $$ db^{[1]}=\frac{\partial L}{\partial b^{[1]}}=\frac{\partial L}{\partial z^{[1]}} \cdot \frac{\partial z^{[1]}}{\partial b^{[1]}}=dz^{[1]} \\ \Rightarrow b^{[1]} \mathrel{-}= \alpha \cdot db^{[1]} $$
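
    A gradient check is a simple way to gain confidence in the formulas for both layers. The sketch below is an illustration under assumptions, since the article fixes neither the activations nor the loss: it uses tanh for g^{[1]} and g^{[2]}, a squared-error loss L = ½‖a^{[2]} − y‖², a single example, and made-up layer sizes. All function names are hypothetical. It compares the analytic gradients with central-difference estimates.

```python
import numpy as np


def forward(params, x):
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)
    z2 = W2 @ a1 + b2
    a2 = np.tanh(z2)
    return z1, a1, z2, a2


def loss(params, x, y):
    a2 = forward(params, x)[3]
    return 0.5 * float(np.sum((a2 - y) ** 2))


def backward(params, x, y):
    W1, b1, W2, b2 = params
    z1, a1, z2, a2 = forward(params, x)
    da2 = a2 - y                   # dL/da2 for the assumed squared-error loss
    dz2 = da2 * (1.0 - a2 ** 2)    # tanh'(z2) = 1 - tanh(z2)^2, elementwise
    dW2 = dz2 @ a1.T
    db2 = dz2
    da1 = W2.T @ dz2
    dz1 = da1 * (1.0 - a1 ** 2)
    dW1 = dz1 @ x.T                # a0 is the input x
    db1 = dz1
    return dW1, db1, dW2, db2


def numerical_grad(params, x, y, index, eps=1e-6):
    """Central-difference gradient of the loss with respect to params[index]."""
    P = params[index]
    grad = np.zeros_like(P)
    for i in np.ndindex(*P.shape):
        old = P[i]
        P[i] = old + eps
        plus = loss(params, x, y)
        P[i] = old - eps
        minus = loss(params, x, y)
        P[i] = old                 # restore the original entry
        grad[i] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n0, n1, n2 = 4, 3, 2               # hypothetical layer sizes
params = [rng.standard_normal((n1, n0)), rng.standard_normal((n1, 1)),
          rng.standard_normal((n2, n1)), rng.standard_normal((n2, 1))]
x = rng.standard_normal((n0, 1))
y = rng.standard_normal((n2, 1))

analytic = backward(params, x, y)
max_err = max(
    float(np.max(np.abs(analytic[k] - numerical_grad(params, x, y, k))))
    for k in range(4)
)
print("largest difference between analytic and numerical gradients:", max_err)
```

    If the order in da1 were swapped to `dz2 @ W2.T`, NumPy would raise a shape error for these sizes.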

    3. Summary

    Layer l:
    Activation: $$ da^{[l]}=\frac{\partial L}{\partial a^{[l]}}=\frac{\partial L}{\partial z^{[l+1]}} \cdot \frac{\partial z^{[l+1]}}{\partial a^{[l]}}=W^{[l+1]T} \cdot dz^{[l+1]} $$

    $$ dz^{[l]}=\frac{\partial L}{\partial z^{[l]}}=\frac{\partial L}{\partial a^{[l]}} \cdot \frac{\partial a^{[l]}}{\partial z^{[l]}}=da^{[l]} \cdot g^{[l]'}(z^{[l]}) \\ \Rightarrow dz^{[l]}=W^{[l+1]T}dz^{[l+1]} \cdot g^{[l]'}(z^{[l]}) $$

    $$ dW^{[l]}=\frac{\partial L}{\partial W^{[l]}}=\frac{\partial L}{\partial z^{[l]}} \cdot \frac{\partial z^{[l]}}{\partial W^{[l]}}=dz^{[l]} \cdot a^{[l-1]T} \\ \Rightarrow W^{[l]} \mathrel{-}= \alpha \cdot dW^{[l]} $$

    $$ db^{[l]}=\frac{\partial L}{\partial b^{[l]}}=\frac{\partial L}{\partial z^{[l]}} \cdot \frac{\partial z^{[l]}}{\partial b^{[l]}}=dz^{[l]} \\ \Rightarrow b^{[l]} \mathrel{-}= \alpha \cdot db^{[l]} $$
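
    The per-layer recursion can be written as one loop that runs from the last layer back to the first. The following is a rough sketch under assumptions rather than a reference implementation: it handles a single example, uses tanh in every layer and a squared-error loss (neither is specified in the article), and the names `forward_all_layers`, `backward_all_layers` and `sgd_step`, the layer sizes and the learning rate are all hypothetical.

```python
import numpy as np


def forward_all_layers(Ws, bs, x, g):
    """Return zs (zs[l-1] = z^{[l]}) and activations (activations[l] = a^{[l]})."""
    activations, zs = [x], []
    for W, b in zip(Ws, bs):
        zs.append(W @ activations[-1] + b)
        activations.append(g(zs[-1]))
    return zs, activations


def backward_all_layers(Ws, zs, activations, daL, g_prime):
    """Apply the layer-l formulas for l = L, L-1, ..., 1."""
    L = len(Ws)
    dWs, dbs = [None] * L, [None] * L
    da = daL
    for l in range(L, 0, -1):
        dz = da * g_prime(zs[l - 1])             # dz^{[l]} = da^{[l]} * g'(z^{[l]})
        dWs[l - 1] = dz @ activations[l - 1].T   # dW^{[l]} = dz^{[l]} a^{[l-1]T}
        dbs[l - 1] = dz                          # db^{[l]} = dz^{[l]}
        da = Ws[l - 1].T @ dz                    # da^{[l-1]} = W^{[l]T} dz^{[l]}
    return dWs, dbs


def sgd_step(Ws, bs, dWs, dbs, alpha):
    """In-place update W -= alpha * dW and b -= alpha * db for every layer."""
    for W, b, dW, db in zip(Ws, bs, dWs, dbs):
        W -= alpha * dW
        b -= alpha * db


def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2


rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                             # hypothetical layer widths
Ws = [0.5 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros((m, 1)) for m in sizes[1:]]
x = rng.standard_normal((sizes[0], 1))
y = rng.standard_normal((sizes[-1], 1))

zs, acts = forward_all_layers(Ws, bs, x, np.tanh)
loss_before = 0.5 * float(np.sum((acts[-1] - y) ** 2))

# dL/da^{[L]} = a^{[L]} - y for the assumed squared-error loss.
dWs, dbs = backward_all_layers(Ws, zs, acts, acts[-1] - y, tanh_prime)
sgd_step(Ws, bs, dWs, dbs, alpha=0.01)

_, acts_after = forward_all_layers(Ws, bs, x, np.tanh)
loss_after = 0.5 * float(np.sum((acts_after[-1] - y) ** 2))
print("loss before:", loss_before, "loss after:", loss_after)
```

    All gradients are computed before any parameter is changed, so each da^{[l-1]} uses the same W^{[l]} that produced the forward pass.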

  • Original article: https://blog.csdn.net/im34v/article/details/126064996