实际中,我们不知道一些样本对应的真实分布。比如我们手上有
N
p
N_p
Np个样本
{
h
n
}
n
=
1
N
p
\{ \boldsymbol h_n \}_{n=1}^{N_p}
{hn}n=1Np,这些样本都对应到某个特定的label,比如特定的数字或者类。令真实的分布为
f
(
h
)
f(\boldsymbol h)
f(h),我们可以通过经验产生大量样本来近似该分布。定义
f
f
f的近似为
p
(
h
)
p(\boldsymbol h)
p(h),并且限制分布
p
p
p为Gaussian Mixture, i.e.
p
(
h
)
=
∑
k
=
1
K
α
k
C
N
(
h
;
μ
k
,
C
k
)
(20)
p \left (\boldsymbol h \right) = \sum_{k=1}^K \alpha_k \mathcal{CN}(\boldsymbol h; \boldsymbol \mu_k, \boldsymbol{C}_k) \tag{20}
p(h)=k=1∑KαkCN(h;μk,Ck)(20)
其中
h
∈
C
L
×
1
\boldsymbol h \in \mathbb{C}^{L \times 1}
h∈CL×1,
μ
k
∈
C
L
×
1
\boldsymbol{\mu_k} \in \mathbb{C}^{L \times 1}
μk∈CL×1,
C
k
∈
C
L
×
L
\boldsymbol{C}_k \in \mathbb{C}^{L \times L}
Ck∈CL×L。假设我们固定第
p
p
p个标识label,然后经验生成
N
p
N_p
Np个样本,记为
{
h
n
}
n
=
1
N
p
\{ \boldsymbol h_n \}_{n=1}^{N_p}
{hn}n=1Np,我们要最大化似然函数:
max
{
α
k
,
μ
k
,
C
k
}
ln
(
∏
n
=
1
N
p
∑
k
=
1
K
α
k
C
N
(
h
n
;
μ
k
,
C
k
)
)
=
∑
n
=
1
N
p
ln
(
∑
k
=
1
K
α
k
C
N
(
h
n
;
μ
k
,
C
k
)
)
(21)
我们采用EM算法进行相关参数的估计。正式计算之前,先引入一个隐变量
z
∈
{
1
,
⋯
,
K
}
z \in \{1,\cdots,K\}
z∈{1,⋯,K}表征GMM中的第几个模(mode)。当
K
→
∞
K \rightarrow \infty
K→∞时,我们认为GMM能够很好地逼近原始分布。那么反过来,等效地来看,引入的隐变量
z
z
z可以解释为:先根据隐变量
z
z
z的先验
p
(
z
=
k
)
p(z=k)
p(z=k)确定第
k
k
k个mode,再基于似然
p
(
h
n
∣
z
n
=
k
)
p(\boldsymbol h_n|z^n=k)
p(hn∣zn=k),由第
k
k
k个mode来产生相应的样本生成(sample)。根据相关概率公式,我们进一步描述其概率意义:
k-th mode:
p
(
h
n
,
z
n
=
k
)
=
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
all modes:
p
(
h
n
)
=
∑
z
p
(
h
n
,
z
n
=
k
)
=
∑
z
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
posterior of k-th mode:
p
(
z
n
=
k
∣
h
n
)
=
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
p
(
h
n
)
=
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
∑
z
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
我们将EM算法的步骤描述为:对于第
t
+
1
t+1
t+1次迭代
(1) E步:固定第
t
t
t次迭代的参数
{
α
k
t
,
μ
k
t
,
C
k
t
}
k
=
1
K
\{\alpha^t_k, \boldsymbol \mu^t_k, \boldsymbol{C}^t_k \}_{k=1}^K
{αkt,μkt,Ckt}k=1K,对
∀
n
,
k
\forall n,k
∀n,k,即第
n
n
n个sample,第
k
k
k个mode,计算后验后验分布,定义为
γ
n
k
t
+
1
≐
p
(
z
n
=
k
∣
h
n
)
=
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
p
(
h
n
)
=
α
k
t
C
N
(
h
n
;
μ
k
t
,
C
k
t
)
∑
k
=
1
K
α
k
t
C
N
(
h
n
;
μ
k
t
,
C
k
t
)
,
∀
n
,
k
(22)
经过E步,我们便得到了一组后验分布 { γ n k t + 1 } n , k \{\gamma^{t+1}_{nk}\}_{n,k} {γnkt+1}n,k
(2) M步:固定后验分布 { γ n k t + 1 } n , k \{\gamma^{t+1}_{nk}\}_{n,k} {γnkt+1}n,k,通过最大化证据下界(Evidence Lower BOund)(即对数边际似然 log p ( h ; α , μ , C ) \log p(\boldsymbol h;\alpha, \boldsymbol \mu, \boldsymbol{C}) logp(h;α,μ,C)的下界),来估计第 t + 1 t+1 t+1次迭代的参数 { α k t + 1 , μ k t + 1 , C k t + 1 } k = 1 K \{ \alpha^{t+1}_k, \boldsymbol \mu^{t+1}_k, \boldsymbol{C}^{t+1}_k \}_{k=1}^K {αkt+1,μkt+1,Ckt+1}k=1K
E
L
B
O
(
{
γ
n
k
t
+
1
}
n
,
k
,
{
h
n
}
n
;
{
α
k
t
+
1
,
μ
k
t
+
1
,
C
k
t
+
1
}
k
=
1
K
)
=
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
log
p
(
h
n
,
z
n
=
k
)
γ
n
k
t
+
1
=
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
log
p
(
z
n
=
k
)
p
(
h
n
∣
z
n
=
k
)
γ
n
k
t
+
1
=
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
log
α
k
t
+
1
⋅
C
N
(
h
n
;
μ
k
t
+
1
,
C
k
t
+
1
)
γ
n
k
t
+
1
=
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
(
−
∥
(
C
k
t
+
1
)
−
1
2
(
h
n
−
μ
k
t
+
1
)
∥
2
2
+
log
α
k
t
+
1
−
log
∣
C
k
t
+
1
∣
)
+
C
(23)
其中
C
C
C是与
{
α
k
t
+
1
,
μ
k
t
+
1
,
C
k
t
+
1
}
k
=
1
K
\{ \alpha^{t+1}_k, \boldsymbol \mu^{t+1}_k, \boldsymbol{C}^{t+1}_k \}_{k=1}^K
{αkt+1,μkt+1,Ckt+1}k=1K无关的常数。上述最大化ELBO转化为优化问题:
arg max
{
α
k
t
+
1
,
μ
k
t
+
1
,
C
k
t
+
1
}
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
(
−
∥
(
C
k
t
+
1
)
−
1
2
(
h
n
−
μ
k
t
+
1
)
∥
2
2
+
log
α
k
t
+
1
−
log
∣
C
k
t
+
1
∣
)
s.t.
∑
k
=
1
K
α
k
=
1
(24)
利用拉格朗日函数:
g
:
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
(
−
∥
(
C
k
t
+
1
)
−
1
2
(
h
n
−
μ
k
t
+
1
)
∥
2
2
+
log
α
k
t
+
1
−
log
∣
C
k
t
+
1
∣
)
−
λ
(
∑
k
=
1
K
α
k
−
1
)
(25)
g: \sum_{n=1}^{N_p} \sum_{k=1}^K \gamma^{t+1}_{nk} \left ( - \left \Vert (\boldsymbol{C}^{t+1}_k)^{-\frac{1}{2}} \left ( \boldsymbol h_n - \boldsymbol{\mu}^{t+1}_k \right) \right \Vert^2_2 + \log \alpha^{t+1}_k - \log \left \vert \boldsymbol{{C}}^{t+1}_k \right \vert \right) - \lambda (\sum_{k=1}^K \alpha_k - 1) \tag{25}
g:n=1∑Npk=1∑Kγnkt+1(−∥∥∥(Ckt+1)−21(hn−μkt+1)∥∥∥22+logαkt+1−log∣∣Ckt+1∣∣)−λ(k=1∑Kαk−1)(25)
求关于
{
α
k
t
+
1
,
μ
k
t
+
1
,
C
k
t
+
1
}
\{ \alpha^{t+1}_k, \boldsymbol \mu^{t+1}_k, \boldsymbol{C}^{t+1}_k \}
{αkt+1,μkt+1,Ckt+1}的偏导数:
∂
g
∂
α
k
t
+
1
=
0
∂
g
∂
μ
k
t
+
1
=
0
∂
g
∂
C
k
t
+
1
=
0
(26)
在实际计算时,为了简化,可以取
C
k
=
σ
k
2
I
\boldsymbol{C}_k=\sigma^2_k \boldsymbol{I}
Ck=σk2I,即等高线为圆的高斯,这时式(25)可以化简为
g
:
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
(
−
1
σ
k
2
∥
(
h
n
−
μ
k
t
+
1
)
∥
2
2
+
log
α
k
t
+
1
−
2
N
U
E
N
B
S
log
σ
k
)
−
λ
(
∑
k
=
1
K
α
k
−
1
)
g: \sum_{n=1}^{N_p} \sum_{k=1}^K \gamma^{t+1}_{nk} \left ( - \frac{1}{\sigma^2_k} \left \Vert \left (\boldsymbol h_n- \boldsymbol{\mu}^{t+1}_k \right) \right \Vert^2_2 + \log \alpha^{t+1}_k - 2 N_{UE} N_{BS} \log \sigma_k \right) - \lambda (\sum_{k=1}^K \alpha_k - 1)
g:n=1∑Npk=1∑Kγnkt+1(−σk21∥∥(hn−μkt+1)∥∥22+logαkt+1−2NUENBSlogσk)−λ(k=1∑Kαk−1)
将式(26)代入到上式
首先对
α
k
\alpha_k
αk求导
∂
g
∂
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
⋅
1
α
k
−
λ
=
0
⇒
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
λ
又因为
∑
k
=
1
K
α
k
=
1
\sum_{k=1}^K \alpha_k = 1
∑k=1Kαk=1,因此
λ
=
N
p
\lambda=N_p
λ=Np,故
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
N
p
\alpha_k = \frac{\sum_{n=1}^{N_p} \gamma^{t+1}_{nk}}{ N_p }
αk=Np∑n=1Npγnkt+1
然后对
μ
k
∈
C
L
×
1
\boldsymbol{\mu}_k \in \mathbb{C}^{L \times 1}
μk∈CL×1求导
∂
g
∂
μ
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
⋅
(
−
1
σ
k
2
)
2
(
μ
k
−
h
n
)
=
0
⇒
∑
n
=
1
N
p
γ
n
k
t
+
1
⋅
(
μ
k
−
h
n
)
=
0
⇒
μ
k
⋅
∑
n
=
1
N
p
γ
n
k
t
+
1
=
∑
n
=
1
N
p
γ
n
k
t
+
1
h
n
⇒
μ
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
h
n
∑
n
=
1
N
p
γ
n
k
t
+
1
最后对
σ
k
\sigma_k
σk求导
∂
g
∂
σ
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
(
1
σ
k
3
∥
(
h
n
−
μ
k
t
+
1
)
∥
2
2
−
2
N
U
E
N
B
S
1
σ
k
)
=
0
⇒
σ
k
2
=
1
2
N
U
E
N
B
S
⋅
∑
n
=
1
N
p
γ
n
k
t
+
1
∥
(
h
n
−
μ
k
t
+
1
)
∥
2
2
∑
n
=
1
N
p
γ
n
k
t
+
1
首先对
α
k
\alpha_k
αk求导
∂
g
∂
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
⋅
1
α
k
−
λ
=
0
⇒
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
λ
又因为 ∑ k = 1 K α k = 1 \sum_{k=1}^K \alpha_k = 1 ∑k=1Kαk=1,因此 λ = N p \lambda=N_p λ=Np
然后对
μ
k
∗
∈
C
L
×
1
\boldsymbol{\mu}^{*}_k \in \mathbb{C}^{L \times 1}
μk∗∈CL×1求导,根据公式(复矩阵求导):
∂
(
b
H
x
)
∂
x
∗
=
0
,
∂
(
x
H
b
)
∂
x
∗
=
b
∂
(
x
H
A
x
)
∂
x
∗
=
A
x
进一步,
∂
g
∂
μ
k
∗
=
∂
∂
μ
k
∗
∑
n
=
1
N
p
−
γ
n
k
t
+
1
∥
(
C
k
t
+
1
)
−
1
2
(
h
n
−
μ
k
t
+
1
)
∥
2
2
=
∑
n
=
1
N
p
−
γ
n
k
t
+
1
∂
∂
μ
k
∗
(
h
n
−
μ
k
t
+
1
)
H
(
C
k
t
+
1
)
−
1
(
h
n
−
μ
k
t
+
1
)
=
∑
n
=
1
N
p
γ
n
k
t
+
1
(
C
k
t
+
1
)
−
1
(
μ
k
t
+
1
−
h
n
)
=
(
C
k
t
+
1
)
−
1
∑
n
=
1
N
p
γ
n
k
t
+
1
(
μ
k
t
+
1
−
h
n
)
=
0
⇒
μ
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
h
n
∑
n
=
1
N
p
γ
n
k
t
+
1
最后对 C k \boldsymbol C_k Ck求导,但是实际中,为了方便处理,我们是对 ( C k − 1 ) ∗ (\boldsymbol C^{-1}_k)^* (Ck−1)∗进行求导,在求导之前,我们回顾一些基本概念。
回顾:如果 A ∈ C N × N \boldsymbol{A} \in \mathbb{C}^{N \times N} A∈CN×N是正定矩阵,那么根据 det ( A ) = ∏ n λ n \text{det}(\boldsymbol{A})=\prod_{n} \lambda_n det(A)=∏nλn,我们可以得到 det ( A ) = det ( A ∗ ) = det ( A T ) = det ( A H ) \text{det}(\boldsymbol{A})=\text{det}(\boldsymbol{A}^*)=\text{det}(\boldsymbol{A}^T)=\text{det}(\boldsymbol{A}^H) det(A)=det(A∗)=det(AT)=det(AH)。另外, det ( A ) = 1 det ( A − 1 ) \text{det}(\boldsymbol{A})=\frac{1}{\text{det}(\boldsymbol{A}^{-1})} det(A)=det(A−1)1。因此我们可以把拉格朗日函数 g g g重写为
g
:
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
(
−
∥
(
C
k
t
+
1
)
−
1
2
(
h
n
−
μ
k
t
+
1
)
∥
2
2
+
log
α
k
t
+
1
−
log
∣
C
k
t
+
1
∣
)
−
λ
(
∑
k
=
1
K
α
k
−
1
)
=
∑
n
=
1
N
p
∑
k
=
1
K
γ
n
k
t
+
1
(
−
(
h
n
−
μ
k
t
+
1
)
H
(
C
k
t
+
1
)
−
1
(
h
n
−
μ
k
t
+
1
)
+
log
α
k
t
+
1
+
log
∣
(
(
C
k
t
+
1
)
−
1
)
∗
∣
)
−
λ
(
∑
k
=
1
K
α
k
−
1
)
根据公式,若
A
∈
C
N
×
N
\boldsymbol{A} \in \mathbb{C}^{N \times N}
A∈CN×N
∂
y
H
A
H
x
∂
A
∗
=
x
y
H
∂
det
(
A
)
∂
A
=
det
(
A
)
⋅
(
A
T
)
−
1
对
(
C
k
−
1
)
∗
(\boldsymbol C^{-1}_k)^*
(Ck−1)∗进行求导:
∂
g
∂
(
C
k
−
1
)
∗
=
∑
n
=
1
N
p
γ
n
k
t
+
1
(
−
(
h
n
−
μ
k
t
+
1
)
(
h
n
−
μ
k
t
+
1
)
H
+
C
k
t
+
1
)
=
C
k
t
+
1
∑
n
=
1
N
p
γ
n
k
t
+
1
−
∑
n
=
1
N
p
γ
n
k
t
+
1
(
h
n
−
μ
k
t
+
1
)
(
h
n
−
μ
k
t
+
1
)
H
=
0
⇒
C
k
t
+
1
=
∑
n
=
1
N
p
γ
n
k
t
+
1
(
h
n
−
μ
k
t
+
1
)
(
h
n
−
μ
k
t
+
1
)
H
∑
n
=
1
N
p
γ
n
k
t
+
1
这与我们的直觉是一致的。
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
N
p
μ
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
h
n
∑
n
=
1
N
p
γ
n
k
t
+
1
σ
k
2
=
1
2
N
U
E
N
B
S
⋅
∑
n
=
1
N
p
γ
n
k
t
+
1
∥
(
h
n
−
μ
k
t
+
1
)
∥
2
2
∑
n
=
1
N
p
γ
n
k
t
+
1
α
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
N
p
μ
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
h
n
∑
n
=
1
N
p
γ
n
k
t
+
1
C
k
=
∑
n
=
1
N
p
γ
n
k
t
+
1
(
h
n
−
μ
k
t
+
1
)
(
h
n
−
μ
k
t
+
1
)
H
∑
n
=
1
N
p
γ
n
k
t
+
1