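Linear Algebra Implementation p4
This series collects my study notes for Li Mu's (李沐) deep learning course, and may supplement points that were not covered in the lectures.
This section is the second post.
This part draws on the video 矩阵的导数运算 (Derivative Operations on Matrices).
To make the distinction easy to see, I do not write vectors in printed-style boldface. Instead I follow the handwritten convention of placing an arrow over the letter that denotes a vector.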
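For a function of one variable, we usually find extreme points by setting the derivative to 0 (the slope of the tangent at that point is 0) to obtain the stationary points, and then checking each one against the definition of an extremum, or a corollary of it, to decide whether it really is an extreme point. The procedure is as follows: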

The method for finding the extrema of a multivariate function is as follows:

(The figure writes the independent variable as $y$; writing it as $x$ would look more natural.)
Suppose this multivariate function has $m$ variables, i.e. $f(x_{1},x_{2},...,x_{m})$. The system of partial-derivative equations used to find its extrema then contains $m$ equations, which is tedious to write out. We therefore express it more compactly by collecting all $m$ variables into a single column vector:
$$\overrightarrow x=\begin{bmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{m}\end{bmatrix}_{m\times 1}$$
The multivariate function $f(x_{1},x_{2},...,x_{m})$ then becomes a function whose argument is a vector, namely $f(\overrightarrow x)$.
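[Note] Here $\overrightarrow x$ is an $m$-dimensional column vector ($m\times 1$) that gathers the individual independent variables, whereas $f(\overrightarrow x)$ is the function value, which is a scalar. Taking its partial derivatives is therefore differentiating a scalar with respect to a vector.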
We can now define the partial derivative of a scalar function with respect to a vector. There are two forms:
(1) Denominator layout:
$$\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}=\begin{bmatrix}\frac{\partial f(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial f(\overrightarrow x)}{\partial x_{2}}\\ \vdots\\ \frac{\partial f(\overrightarrow x)}{\partial x_{m}}\end{bmatrix}_{m\times 1}$$
where $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}$ is an $m\times 1$ column vector.
(2) Numerator layout:
$$\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}=\begin{bmatrix}\frac{\partial f(\overrightarrow x)}{\partial x_{1}}, & \frac{\partial f(\overrightarrow x)}{\partial x_{2}}, & \dots, & \frac{\partial f(\overrightarrow x)}{\partial x_{m}}\end{bmatrix}_{1\times m}$$
where $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}$ is a $1\times m$ row vector.
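Different references use different layouts, and the two layouts are transposes of each other. Although Li Mu's course uses the numerator layout for the derivative of a scalar with respect to a vector, we use the denominator layout here because it makes some results easier to derive. Keep in mind that any result stated in one layout is the transpose of the same result in the other.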
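As a small illustration of this transpose relationship, consider a linear function with constant coefficients $a_{1},\dots,a_{m}$. This function and the symbol $\overrightarrow a$ are assumptions introduced only for this sketch and do not come from the course. Each partial derivative is simply $a_{i}$, so the two layouts differ only in how those $m$ numbers are arranged:
$$f(\overrightarrow x)=\sum_{i=1}^{m}a_{i}x_{i}\quad\Longrightarrow\quad\frac{\partial f(\overrightarrow x)}{\partial x_{i}}=a_{i}$$
$$\text{denominator layout: }\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}=\begin{bmatrix}a_{1}\\ a_{2}\\ \vdots\\ a_{m}\end{bmatrix}_{m\times 1}=\overrightarrow a,\qquad\text{numerator layout: }\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}=\begin{bmatrix}a_{1}, & a_{2}, & \dots, & a_{m}\end{bmatrix}_{1\times m}=\overrightarrow a^{T}$$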
[Example] Given $f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}$, where $\overrightarrow x=\begin{bmatrix}x_{1}\\ x_{2}\end{bmatrix}$, find $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}$.
[Answer]
$$\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}=\begin{bmatrix}\frac{\partial f(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial f(\overrightarrow x)}{\partial x_{2}}\end{bmatrix}=\begin{bmatrix}2x_{1}\\ 2x_{2}\end{bmatrix}$$
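The result can be sanity-checked numerically. The following is a minimal pure-Python sketch, not part of the original course material: the helper name `numerical_gradient`, the step size `h`, and the test point are all assumptions chosen for illustration. It approximates each partial derivative with a central difference and compares the result with the analytic answer $[2x_{1}, 2x_{2}]$.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the partial derivatives of a scalar f at x.

    Returns a list of m numbers, read here as the m x 1 column vector
    of the denominator layout. (Hypothetical helper for illustration.)
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h       # perturb only the i-th variable
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # f(x1, x2) = x1^2 + x2^2 from the example
    return x[0] ** 2 + x[1] ** 2


point = [3.0, -1.5]                     # arbitrary test point (assumption)
approx = numerical_gradient(f, point)   # numerical estimate
exact = [2 * point[0], 2 * point[1]]    # analytic result [2*x1, 2*x2]
print("numerical:", approx)
print("analytic: ", exact)
```

The two printed lists should agree up to floating-point rounding. In practice an automatic differentiation library would usually replace the finite-difference helper.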
Now consider a function that is itself a vector and whose argument is also a vector (made up of several independent variables), namely:
$$\overrightarrow{f}(\overrightarrow x)=\begin{bmatrix}f_{1}(\overrightarrow x)\\ f_{2}(\overrightarrow x)\\ \vdots\\ f_{n}(\overrightarrow x)\end{bmatrix}_{n\times 1},\quad\overrightarrow x=\begin{bmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{m}\end{bmatrix}$$
where $\overrightarrow{f}(\overrightarrow x)$ is an $n\times 1$ column vector and $\overrightarrow x$ is an $m\times 1$ column vector.
We then define its partial derivative as follows:
(1) Denominator layout:
$$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}_{n\times 1}}{\partial\overrightarrow x_{m\times 1}}=\begin{bmatrix}\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{2}}\\ \vdots\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{m}}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{m}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{m}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{m}}\end{bmatrix}_{m\times n}$$
(2) Numerator layout:
$$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}_{n\times 1}}{\partial\overrightarrow x_{m\times 1}}=\begin{bmatrix}\frac{\partial f_{1}(\overrightarrow x)}{\partial\overrightarrow x}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial\overrightarrow x}\\ \vdots\\ \frac{\partial f_{n}(\overrightarrow x)}{\partial\overrightarrow x}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{m}}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{m}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{m}}\end{bmatrix}_{n\times m}$$
[Example] Given
$$\overrightarrow{f}(\overrightarrow x)=\begin{bmatrix}f_{1}(\overrightarrow x)\\ f_{2}(\overrightarrow x)\end{bmatrix}=\begin{bmatrix}x_{1}^{2}+x_{2}^{2}+x_{3}\\ x_{3}^{2}+2x_{1}\end{bmatrix}_{2\times 1},\quad\overrightarrow x=\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\end{bmatrix}$$
find $\frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial\overrightarrow x}$.
[Answer] In denominator layout:
$$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial\overrightarrow x}=\begin{bmatrix}\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{2}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{3}}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}}\\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{3}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{3}}\end{bmatrix}=\begin{bmatrix}2x_{1} & 2\\ 2x_{2} & 0\\ 1 & 2x_{3}\end{bmatrix}$$
In numerator layout:
$$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial\overrightarrow x}=\begin{bmatrix}\frac{\partial f_{1}(\overrightarrow x)}{\partial\overrightarrow x}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial\overrightarrow x}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{3}}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{3}}\end{bmatrix}=\begin{bmatrix}2x_{1} & 2x_{2} & 1\\ 2 & 0 & 2x_{3}\end{bmatrix}$$
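To see both layouts side by side, here is a minimal pure-Python sketch under stated assumptions: the helpers `numerical_jacobian` and `transpose`, the step size `h`, and the test point are hypothetical names and values chosen for illustration, not anything from the course. It estimates the $2\times 3$ numerator-layout matrix by central differences, transposes it to get the $3\times 2$ denominator-layout matrix, and compares both with the analytic results for the example.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout.

    Row i corresponds to output f_i, column j to input x_j,
    so the result has n rows and m columns. (Hypothetical helper.)
    """
    n = len(f(x))
    m = len(x)
    jac = [[0.0] * m for _ in range(n)]
    for j in range(m):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h       # perturb only the j-th variable
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def f_vec(x):
    # f1 = x1^2 + x2^2 + x3, f2 = x3^2 + 2*x1 from the example
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 + x3, x3 ** 2 + 2 * x1]


point = [1.0, 2.0, 3.0]                        # arbitrary test point (assumption)
numerator = numerical_jacobian(f_vec, point)   # n x m = 2 x 3
denominator = transpose(numerator)             # m x n = 3 x 2

x1, x2, x3 = point
exact_numerator = [[2 * x1, 2 * x2, 1.0],
                   [2.0, 0.0, 2 * x3]]
exact_denominator = [[2 * x1, 2.0],
                     [2 * x2, 0.0],
                     [1.0, 2 * x3]]
print("numerator layout:  ", numerator)
print("denominator layout:", denominator)
```

The sketch computes the numerator layout first only because that matches the usual row-per-output Jacobian convention. Computing the denominator layout directly and transposing the other way would work equally well.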