The previous section introduced Gaussian process regression from the weight-space view. This section introduces it from the function-space view.
Viewed from the weight space (Weight-Space), Gaussian process regression has no direct connection to the Gaussian Process itself. In essence, it solves a non-linear regression task by combining Bayesian linear regression with the kernel trick (Kernel Trick): a non-linear transformation $\phi(\cdot)$ maps the original feature space $\mathcal X \in \mathbb R^p$ into a high-dimensional space:
$$\mathcal X \in \mathbb R^p \to \phi(\mathcal X) \in \mathbb R^q \quad (q \gg p)$$
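As a concrete illustration (the feature map below is a made-up example, not one prescribed by the text), a degree-2 polynomial map sends $p = 2$ input features to $q = 6$:

```python
import numpy as np

def phi(x):
    """Hypothetical degree-2 polynomial feature map: R^2 -> R^6.

    Maps (x1, x2) to (1, x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

x = np.array([2.0, 3.0])
print(phi(x))  # [1. 2. 3. 4. 9. 6.]
```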
This change of feature space in turn affects the posterior distribution $\mathcal P(\mathcal W \mid Data)$ of the random variable $\mathcal W$:
$$\mathcal P(\mathcal W \mid Data) \sim \mathcal N(\mu_{\mathcal W},\Sigma_{\mathcal W}) \Rightarrow \begin{cases}\mu_{\mathcal W} = \dfrac{1}{\sigma^2}\mathcal A^{-1}[\phi(\mathcal X)]^T Y \\ \Sigma_{\mathcal W} = \mathcal A^{-1}\end{cases} \quad \mathcal A = \dfrac{[\phi(\mathcal X)]^T\phi(\mathcal X)}{\sigma^2} + \left[\Sigma_{prior}^{-1}\right]_{q \times q}$$
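The posterior formulas above can be sketched in a few lines of NumPy; the data, dimensions, and isotropic prior here are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Phi is the N x q design matrix whose rows are phi(x_i)^T.
N, q, sigma2 = 50, 6, 0.25          # sigma2 = noise variance sigma^2
Phi = rng.normal(size=(N, q))
w_true = rng.normal(size=q)
Y = Phi @ w_true + np.sqrt(sigma2) * rng.normal(size=N)

Sigma_prior = np.eye(q)             # prior covariance of W (assumed isotropic)

# A = Phi^T Phi / sigma^2 + Sigma_prior^{-1}
A = Phi.T @ Phi / sigma2 + np.linalg.inv(Sigma_prior)
Sigma_W = np.linalg.inv(A)          # posterior covariance
mu_W = Sigma_W @ Phi.T @ Y / sigma2 # posterior mean

print(mu_W.shape, Sigma_W.shape)    # (6,) (6, 6)
```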
From this, a prediction is made for the label $f[\phi(\hat x)]$ of a given (unseen) sample $\phi(\hat x)$ after the non-linear transformation:
The complicated part of the derivation is solving for $\mathcal A^{-1}$; see the previous section for the details. Note that what is predicted here is the noise-free $f[\phi(\hat x)]$ rather than $\hat y$; to predict $\hat y$, the noise variance $\sigma^2$ must be added to the covariance. The final expanded result is:
where $[\Sigma_{prior}]_{q \times q}$ is the covariance matrix of the prior distribution, $\mathcal I_{N \times N}$ is the identity matrix, and $\mathcal K(\mathcal X,\mathcal X)_{N \times N}$ denotes $\phi(\mathcal X)\,\Sigma_{prior}\,[\phi(\mathcal X)]^T$, with $\phi(\mathcal X)$ the $N \times q$ matrix whose rows are the transformed samples.
$$\mathcal P[f(\hat x) \mid Data,\hat x] \sim \mathcal N(\mu_{\hat x},\Sigma_{\hat x}) \quad \begin{cases}\mu_{\hat x} = [\phi(\hat x)]^T \Sigma_{prior} [\phi(\mathcal X)]^T\left[\mathcal K(\mathcal X,\mathcal X)+\sigma^2\mathcal I\right]^{-1}Y \\ \Sigma_{\hat x} = [\phi(\hat x)]^T \cdot \left\{\Sigma_{prior} - \Sigma_{prior}[\phi(\mathcal X)]^T\left[\mathcal K(\mathcal X,\mathcal X)+\sigma^2\mathcal I\right]^{-1}\phi(\mathcal X)\Sigma_{prior}\right\} \cdot \phi(\hat x)\end{cases}$$
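A minimal sketch of these predictive equations, assuming a made-up cubic feature map and synthetic data (none of these choices come from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # hypothetical feature map R -> R^4
    return np.array([1.0, x, x**2, x**3])

N, sigma2 = 30, 0.1
X = np.linspace(-1, 1, N)
Y = np.sin(2 * X) + np.sqrt(sigma2) * rng.normal(size=N)

PhiX = np.stack([phi(x) for x in X])        # N x q design matrix
Sigma_prior = np.eye(PhiX.shape[1])

K = PhiX @ Sigma_prior @ PhiX.T             # K(X, X), N x N
M = np.linalg.inv(K + sigma2 * np.eye(N))

x_hat = 0.5
phi_hat = phi(x_hat)
k_hat = phi_hat @ Sigma_prior @ PhiX.T      # row vector k(x_hat, X)

mu_hat = k_hat @ M @ Y                      # predictive mean of f(x_hat)
var_hat = phi_hat @ (Sigma_prior
          - Sigma_prior @ PhiX.T @ M @ PhiX @ Sigma_prior) @ phi_hat

print(mu_hat, var_hat)
```

Note that the posterior variance `var_hat` is the prior variance minus a data-dependent reduction, so it never exceeds the prior variance.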
The complicated inner products appearing in these formulas are handled with the kernel trick (Kernel Trick). Suppose there exists a kernel function $\mathcal K(x,x')$ of the variables $x,x'$, expressed as follows:
Here $[\Sigma_{prior}]_{q \times q}$ is at least positive semidefinite.
$$\mathcal K(x,x') = [\phi(x)]^T\Sigma_{prior}\,\phi(x') = \left[\Sigma_{prior}^{1/2}\phi(x)\right]^T\left[\Sigma_{prior}^{1/2}\phi(x')\right] = \left\langle \Sigma_{prior}^{1/2}\phi(x),\ \Sigma_{prior}^{1/2}\phi(x')\right\rangle$$
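This identity can be checked numerically; the feature vectors and prior below are random stand-ins invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

q = 5
phi_x = rng.normal(size=q)             # stand-in for phi(x)
phi_xp = rng.normal(size=q)            # stand-in for phi(x')

B = rng.normal(size=(q, q))
Sigma_prior = B @ B.T                  # a PSD prior covariance

# Left-hand side: phi(x)^T Sigma_prior phi(x')
lhs = phi_x @ Sigma_prior @ phi_xp

# Right-hand side: inner product after multiplying by Sigma_prior^{1/2},
# computed via the symmetric matrix square root.
w, V = np.linalg.eigh(Sigma_prior)
sqrt_Sigma = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
rhs = (sqrt_Sigma @ phi_x) @ (sqrt_Sigma @ phi_xp)

print(np.isclose(lhs, rhs))  # True
```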
Exactly as in the usual kernel-function treatment, this sidesteps the expensive high-dimensional computation of the non-linear map $\phi(\cdot)$ and evaluates the inner product directly.
A Gaussian Process is, in essence, a collection of high-dimensional random variables:
$$\{\xi_{t}\}_{t \in \mathcal T} = \{\cdots,\xi_{t_1},\xi_{t_2},\cdots,\xi_{t_n},\cdots\} \quad (t_1,t_2,\cdots,t_n \in \mathcal T)$$
where $\mathcal T$ is a continuous domain, possibly in time or space. The Gaussian process can then be defined as follows: if for any $\{t_1,t_2,\cdots,t_n\} \subset \mathcal T$ the corresponding finite subset of the stochastic process $\{\xi_t\}_{t \in \mathcal T}$,
$\xi_{t_1 \to t_n} = \{\xi_{t_1},\xi_{t_2},\cdots,\xi_{t_n}\}$, follows some Gaussian distribution $\mathcal N(\mu_{t_1 \to t_n},\Sigma_{t_1 \to t_n})$, then $\{\xi_{t}\}_{t \in \mathcal T}$ is called a Gaussian process:
Since $t \in \mathcal T$ is dense (intuitively: however small the time interval becomes, a random variable still exists there), the process can be viewed as an 'infinite-dimensional' Gaussian distribution over the continuous domain $\mathcal T$.
$$\{\xi_t\}_{t \in \mathcal T} \sim \mathcal{GP}[m(t),\mathcal K(t,s)] \quad (s,t \in \mathcal T)$$
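One way to make this definition concrete is to draw finite-dimensional marginals of a GP on a grid of indices; the zero mean function and RBF covariance below are illustrative choices, not ones fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(t, s, length=0.5):
    # An example covariance function K(t, s); any valid kernel would do.
    return np.exp(-(t - s) ** 2 / (2 * length ** 2))

n = 100
t = np.linspace(0, 5, n)
m = np.zeros(n)                                   # mean function m(t) = 0 on the grid
K = rbf(t[:, None], t[None, :])                   # covariance matrix K(t_i, t_j)
K += 1e-8 * np.eye(n)                             # jitter for numerical stability

# Any finite subset of the process is jointly Gaussian, so we can sample it.
samples = rng.multivariate_normal(m, K, size=3)   # three GP trajectories
print(samples.shape)  # (3, 100)
```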
Note that the mean function (Mean Function) $m(t)$ and the covariance function (Covariance Function) $\mathcal K(s,t)$ are both expressed as functions. This means the mean/covariance at different times or states is not a fixed value, but a function of $s,t$.
$$\mathcal X \in \mathbb R^p \to \mathcal X \sim \mathcal N(\mu_p,\Sigma_{p \times p})$$
By contrast, in a Gaussian Network, once the set of random variables $\mathcal X$ is fixed, the corresponding probabilistic graphical model is static: the mean $\mu_p$ and covariance matrix $\Sigma_{p \times p}$ are constant, and from the graphical-model point of view the relations between the variable nodes are likewise fixed.
Start from the (noise-free) linear regression model $f(\mathcal X) = \mathcal X^T\mathcal W$ and apply a non-linear high-dimensional transformation to the feature space $\mathcal X \in \mathbb R^p$: $\mathcal X \to \phi(\mathcal X) \in \mathbb R^q$;
Assign the model parameter $\mathcal W$ a prior distribution:
Since $\mathcal X$ has undergone the non-linear transformation, $\mathcal W$ is now a $q$-dimensional random variable, and its covariance matrix $\Sigma_{prior}$ must accordingly be of size $q \times q$.
$$\mathcal W \sim \mathcal N(0,[\Sigma_{prior}]_{q \times q})$$
Hence the expectation $\mathbb E[f(\mathcal X)]$ of the linear model $f(\mathcal X)$ can be written as follows:
The focus here is on the variation of $\mathcal W$, so $\phi(\mathcal X)$ is treated as a constant.
$$\mathbb E[f(\mathcal X)] = \mathbb E\left\{[\phi(\mathcal X)]^T \mathcal W\right\} = [\phi(\mathcal X)]^T \mathbb E[\mathcal W] = [\phi(\mathcal X)]^T \cdot 0 = 0$$
For any $x^{(i)},x^{(j)} \in \mathbb R^p$, the covariance of the corresponding function values, $Cov\left[f(x^{(i)}),f(x^{(j)})\right]$, is expressed as follows:
$$\begin{aligned}Cov\left[f(x^{(i)}),f(x^{(j)})\right] &= \mathbb E\left\{\left[f(x^{(i)}) - \mathbb E[f(x^{(i)})]\right]\cdot\left[f(x^{(j)}) - \mathbb E[f(x^{(j)})]\right]\right\} \\ &= \mathbb E\left\{\left[f(x^{(i)}) - 0\right]\cdot\left[f(x^{(j)}) - 0\right]\right\} \\ &= \mathbb E\left[f(x^{(i)}) \cdot f(x^{(j)})\right] \\ &= \mathbb E\left[\phi(x^{(i)})^T\mathcal W \cdot \phi(x^{(j)})^T\mathcal W\right]\end{aligned}$$
Since $\phi(x^{(j)})^T \mathcal W$ is a scalar, its transpose $\left[\phi(x^{(j)})^T \mathcal W\right]^T = \mathcal W^T\phi(x^{(j)})$ equals $\phi(x^{(j)})^T \mathcal W$ itself. Therefore:
$\Delta$ denotes the result of the derivation above.
$$\Delta = \mathbb E\left[\phi(x^{(i)})^T\mathcal W \cdot \mathcal W^T \phi(x^{(j)})\right] = [\phi(x^{(i)})]^T \cdot \mathbb E[\mathcal W \cdot \mathcal W^T] \cdot \phi(x^{(j)})$$
Observe $\mathbb E[\mathcal W \cdot \mathcal W^T]$: it is in fact
$$\mathbb E[\mathcal W \cdot \mathcal W^T] = \mathbb E[(\mathcal W - 0)\cdot(\mathcal W^T - 0)] = \mathbb E\left\{\left[\mathcal W - \mathbb E[\mathcal W]\right]\cdot\left[\mathcal W - \mathbb E[\mathcal W]\right]^T\right\} = Cov(\mathcal W,\mathcal W) = \Sigma_{prior}$$
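A quick Monte Carlo sanity check of this identity (the dimension and prior are chosen arbitrarily for the example):

```python
import numpy as np

rng = np.random.default_rng(4)

# For zero-mean W ~ N(0, Sigma_prior), the empirical second moment
# E[W W^T] should approach Sigma_prior as the sample count grows.
q, n_samples = 3, 200_000
B = rng.normal(size=(q, q))
Sigma_prior = B @ B.T / q                 # some PSD prior covariance

W = rng.multivariate_normal(np.zeros(q), Sigma_prior, size=n_samples)
emp = W.T @ W / n_samples                 # empirical E[W W^T]

print(np.max(np.abs(emp - Sigma_prior)))  # small, shrinking with n_samples
```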
At this point, the covariance $Cov\left[f(x^{(i)}),f(x^{(j)})\right]$ of $f(x^{(i)})$ and $f(x^{(j)})$ is:
$$Cov\left[f(x^{(i)}),f(x^{(j)})\right] = \left[\phi(x^{(i)})\right]^T_{1 \times q} \cdot [\Sigma_{prior}]_{q \times q} \cdot \left[\phi(x^{(j)})\right]_{q \times 1} = \mathcal K(x^{(i)},x^{(j)})$$
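This covariance-equals-kernel identity can also be checked by simulation; $\phi$, the inputs, and the prior below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(5)

def phi(x):
    # hypothetical feature map R -> R^3
    return np.array([1.0, x, x**2])

q, n_samples = 3, 500_000
Sigma_prior = np.diag([1.0, 0.5, 0.25])
W = rng.multivariate_normal(np.zeros(q), Sigma_prior, size=n_samples)

xi, xj = 0.7, -0.3
fi = W @ phi(xi)                        # samples of f(x_i) = phi(x_i)^T W
fj = W @ phi(xj)                        # samples of f(x_j) = phi(x_j)^T W

emp_cov = np.mean(fi * fj)              # empirical covariance (both means are 0)
kernel = phi(xi) @ Sigma_prior @ phi(xj)
print(emp_cov, kernel)
```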
Expanding $Cov\left[f(x^{(i)}),f(x^{(j)})\right]$ further (with a slight abuse of notation, $x_k^{(i)}$ below denotes the $k$-th component of $\phi(x^{(i)})$):
The end of the weight-space article gave the sufficiency proof that the notation $\mathcal K(\cdot,\cdot)$ is a kernel function; here we take the opportunity to supplement the necessity proof.
$$\begin{aligned}Cov\left[f(x^{(i)}),f(x^{(j)})\right] &= \left(x_1^{(i)},x_2^{(i)},\cdots,x_q^{(i)}\right)\begin{pmatrix}\Sigma_{prior}^{11} & \Sigma_{prior}^{12} & \cdots & \Sigma_{prior}^{1q} \\ \Sigma_{prior}^{21} & \Sigma_{prior}^{22} & \cdots & \Sigma_{prior}^{2q} \\ \vdots & & & \vdots \\ \Sigma_{prior}^{q1} & \Sigma_{prior}^{q2} & \cdots & \Sigma_{prior}^{qq}\end{pmatrix}\begin{pmatrix}x_1^{(j)} \\ x_2^{(j)} \\ \vdots \\ x_q^{(j)}\end{pmatrix} \\ &= \left[\sum_{k=1}^q x_k^{(i)}\Sigma_{prior}^{k1},\cdots,\sum_{k=1}^q x_k^{(i)}\Sigma_{prior}^{kq}\right]\begin{pmatrix}x_1^{(j)} \\ x_2^{(j)} \\ \vdots \\ x_q^{(j)}\end{pmatrix} = \sum_{l=1}^q\sum_{k=1}^q x_k^{(i)}\cdot\Sigma_{prior}^{kl}\cdot x_l^{(j)}\end{aligned}$$

where $\Sigma_{prior}^{ij} = Cov(w_i,w_j)$ and $w_i,w_j \in \mathcal W$.
Since $x_k^{(i)},\Sigma_{prior}^{kl},x_l^{(j)}$ are all real numbers, we have:
$$\sum_{l=1}^q\sum_{k=1}^q x_k^{(i)}\cdot\Sigma_{prior}^{kl}\cdot x_l^{(j)} = \sum_{l=1}^q\sum_{k=1}^q x_l^{(j)}\cdot\Sigma_{prior}^{kl}\cdot x_k^{(i)} \Rightarrow Cov\left[f(x^{(i)}),f(x^{(j)})\right] = Cov\left[f(x^{(j)}),f(x^{(i)})\right] \Rightarrow \mathcal K(x^{(i)},x^{(j)}) = \mathcal K(x^{(j)},x^{(i)})$$
This shows that the kernel matrix $\mathbb K$ is real symmetric; moreover, since $\Sigma_{prior}$ is positive semidefinite, for any $c \in \mathbb R^N$ we have $c^T\mathbb K c = v^T\Sigma_{prior}v \geq 0$ with $v = \sum_{k=1}^N c_k\,\phi(x^{(k)})$, so $\mathbb K$ is positive semidefinite:
$$\mathbb K = \begin{pmatrix}\mathcal K(x^{(1)},x^{(1)}) & \mathcal K(x^{(1)},x^{(2)}) & \cdots & \mathcal K(x^{(1)},x^{(N)}) \\ \mathcal K(x^{(2)},x^{(1)}) & \mathcal K(x^{(2)},x^{(2)}) & \cdots & \mathcal K(x^{(2)},x^{(N)}) \\ \vdots & & & \vdots \\ \mathcal K(x^{(N)},x^{(1)}) & \mathcal K(x^{(N)},x^{(2)}) & \cdots & \mathcal K(x^{(N)},x^{(N)})\end{pmatrix}_{N \times N}$$
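The positive semidefiniteness of such a kernel matrix can be verified numerically; $\phi$, the inputs, and the prior below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(6)

def phi(x):
    # hypothetical feature map R -> R^3
    return np.array([1.0, x, np.sin(x)])

N, q = 8, 3
xs = rng.uniform(-2, 2, size=N)
B = rng.normal(size=(q, q))
Sigma_prior = B @ B.T                      # PSD prior covariance

Phi = np.stack([phi(x) for x in xs])       # N x q
# K_mn = phi(x_m)^T Sigma_prior phi(x_n), assembled as Phi Sigma Phi^T.
K = Phi @ Sigma_prior @ Phi.T              # N x N kernel matrix

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # True: all eigenvalues are (numerically) >= 0
```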
This proves that the notation $\mathcal K$ is a positive-definite kernel function. For the necessity proof of positive-definite kernels, see the linked reference.
By $Cov\left[f(x^{(i)}),f(x^{(j)})\right] = \mathcal K(x^{(i)},x^{(j)})$, if we regard $\{f(\mathcal X)\}_{x \in \mathbb R^p} = \{f(x_1),f(x_2),\cdots,f(x_p)\}$ itself as a collection of random variables, then the covariances within this collection can be expressed by the kernel function.
Recall the defining expression of a Gaussian process: $\{\xi_t\}_{t \in \mathcal T} \sim \mathcal{GP}[m(t),\mathcal K(t,s)] \; (s,t \in \mathcal T)$, where $s,t$ are not themselves random variables; they are merely indices of states/times in the continuous domain and bear no relation to the random variables $\xi$. The Gaussian process definition can therefore be written in the following form:
$$\begin{cases}\{f(\mathcal X)\}_{\mathcal X \in \mathbb R^p} \sim \mathcal{GP}\left[m(\mathcal X),\mathcal K(x^{(i)},x^{(j)})\right] & x^{(i)},x^{(j)} \in \mathcal X \\ \{\xi_t\}_{t \in \mathcal T} \sim \mathcal{GP}[m(t),\mathcal K(t,s)] & (s,t \in \mathcal T)\end{cases}$$
Comparing the two Gaussian-process expressions, in the task of predicting for a given sample $\hat x$, the core difference between the function-space view and the weight-space view lies in how $\mathcal K(x^{(i)},x^{(j)})$ is represented.
The weight-space view first maps $x^{(i)},x^{(j)} \to \phi(x^{(i)}),\phi(x^{(j)})$, then re-specifies the prior distribution $\mathcal P(\mathcal W) \to \mathcal N(0,\Sigma_{prior})$ of $\mathcal W$ according to the dimension of the transformed samples, assembles the form $\mathcal K(x^{(i)},x^{(j)}) = [\phi(x^{(i)})]^T\Sigma_{prior}\,\phi(x^{(j)})$, and solves for the posterior distribution $\mathcal P(\mathcal W \mid Data)$ of $\mathcal W$. The function-space view instead represents $\mathcal K(x^{(i)},x^{(j)})$ directly by $Cov[f(x^{(i)}),f(x^{(j)})]$, so there is no need to solve for $\mathcal W$ separately: one works directly with $f(x^{(i)}) = [\phi(x^{(i)})]^T\mathcal W$ and $f(x^{(j)}) = [\phi(x^{(j)})]^T\mathcal W$. In the prediction task, $[\phi(x)]^T\mathcal W$ replaces $\mathcal W$ as the object being solved for. Related reference:
Machine Learning: Gaussian Process Regression, From Weight-Space To Function-Space