• 数据挖掘与分析课程笔记(Chapter 20)


    数据挖掘与分析课程笔记

    • 参考教材:Data Mining and Analysis : MOHAMMED J.ZAKI, WAGNER MEIRA JR.

    文章目录

    1. 数据挖掘与分析课程笔记(目录)
    2. 数据挖掘与分析课程笔记(Chapter 1)
    3. 数据挖掘与分析课程笔记(Chapter 2)
    4. 数据挖掘与分析课程笔记(Chapter 5)
    5. 数据挖掘与分析课程笔记(Chapter 7)
    6. 数据挖掘与分析课程笔记(Chapter 14)
    7. 数据挖掘与分析课程笔记(Chapter 15)
    8. 数据挖掘与分析课程笔记(Chapter 20)
    9. 数据挖掘与分析课程笔记(Chapter 21)


    Chapter 20: Linear Discriminant Analysis

    Set up D = { ( x i , y i ) } i = 1 n \mathbf{D}=\{(\mathbf{x}_i,y_i) \}_{i=1}^n D={(xi,yi)}i=1n, 其中 y i = 1 , 2 y_i=1,2 yi=1,2(或 ± 1 \pm 1 ±1 等), D 1 = { x i ∣ y i = 1 } \mathbf{D}_1=\{\mathbf{x}_i|y_i=1 \} D1={xiyi=1} D 2 = { x i ∣ y i = 2 } \mathbf{D}_2=\{\mathbf{x}_i|y_i=2 \} D2={xiyi=2}

    Goal:寻找向量 w ∈ R d \mathbf{w}\in \mathbb{R}^d wRd (代表直线方向)使得 D 1 , D 2 \mathbf{D}_1,\mathbf{D}_2 D1,D2 的“平均值”距离最大且“总方差”最小。

    20.1 Normal LDA

    w ∈ R d , w T w = 1 \mathbf{w} \in \mathbb{R}^d,\mathbf{w}^T\mathbf{w}=1 wRd,wTw=1,则 x i \mathbf{x}_i xi w \mathbf{w} w 方向上的投影为 x i ′ = ( w T x i w T u ) w = a i w , a i = w T x i \mathbf{x}_{i}^{\prime}=\left(\frac{\mathbf{w}^{T} \mathbf{x}_{i}}{\mathbf{w}^{T} \mathbf{u}}\right) \mathbf{w}=a_{i} \mathbf{w},a_{i}=\mathbf{w}^{T} \mathbf{x}_{i} xi=(wTuwTxi)w=aiw,ai=wTxi

    D 1 \mathbf{D}_1 D1 中数据在 w \mathbf{w} w 上的投影平均值为:( ∣ D 1 ∣ = n 1 |\mathbf{D}_1|=n_1 D1=n1
    m 1 : = 1 n 1 ∑ x i ∈ D 1 a i = μ 1 T w m_1:=\frac{1}{n_1}\sum\limits_{\mathbf{x}_i\in \mathbf{D}_1}a_i=\boldsymbol{\mu}_1^T\mathbf{w} m1:=n11xiD1ai=μ1Tw
    投影平均值等于平均值的投影。

    类似地: D 2 \mathbf{D}_2 D2 中数据在 w \mathbf{w} w 上的投影平均值为:
    m 2 : = 1 n 2 ∑ x i ∈ D 2 a i = μ 2 T w m_2:=\frac{1}{n_2}\sum\limits_{\mathbf{x}_i\in \mathbf{D}_2}a_i=\boldsymbol{\mu}_2^T\mathbf{w} m2:=n21xiD2ai=μ2Tw
    目标之一:寻找 w \mathbf{w} w 使得 ( m 1 − m 2 ) 2 (m_1-m_2)^2 (m1m2)2 最大。

    对于 D i \mathbf{D}_i Di,定义:
    s i 2 = ∑ x k ∈ D i ( a k − m i ) 2 s_i^2=\sum\limits_{\mathbf{x}_k\in \mathbf{D}_i}(a_k-m_i)^2 si2=xkDi(akmi)2
    注意: s i 2 = n i σ i 2   ( ∣ D i ∣ = n i ) s_i^2=n_i\sigma^2_i\ (|D_i|=n_i) si2=niσi2 (Di=ni)

    Goal:Fisher LDA目标函数:
    max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2} wmaxJ(w)=s12+s22(m1m2)2
    注意: J ( w ) = J ( w 1 , w 2 , ⋯   , w d ) J(\mathbf{w})=J(w_1,w_2,\cdots,w_d) J(w)=J(w1,w2,,wd)
    ( m 1 − m 2 ) 2 = ( w T ( μ 1 − μ 2 ) ) 2 = w T ( ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T ) w = w T B w

    (m1m2)2=(wT(μ1μ2))2=wT((μ1μ2)(μ1μ2)T)w=wTBw" role="presentation">(m1m2)2=(wT(μ1μ2))2=wT((μ1μ2)(μ1μ2)T)w=wTBw
    (m1m2)2=(wT(μ1μ2))2=wT((μ1μ2)(μ1μ2)T)w=wTBw

    B \mathbf{B} B 被称为类间扩散矩阵

    s 1 2 = ∑ x i ∈ D 1 ( a i − m 1 ) 2 = ∑ x i ∈ D 1 ( w T x i − w T μ 1 ) 2 = ∑ x i ∈ D 1 ( w T ( x i − μ 1 ) ) 2 = w T ( ∑ x i ∈ D 1 ( x i − μ 1 ) ( x i − μ 1 ) T ) w = w T S 1 w

    s12=xiD1(aim1)2=xiD1(wTxiwTμ1)2=xiD1(wT(xiμ1))2=wT(xiD1(xiμ1)(xiμ1)T)w=wTS1w" role="presentation">s12=xiD1(aim1)2=xiD1(wTxiwTμ1)2=xiD1(wT(xiμ1))2=wT(xiD1(xiμ1)(xiμ1)T)w=wTS1w
    s12=xiD1(aim1)2=xiD1(wTxiwTμ1)2=xiD1(wT(xiμ1))2=wT(xiD1(xiμ1)(xiμ1)T)w=wTS1w

    S 1 \mathbf{S}_{1} S1 被称为 D 1 \mathbf{D}_1 D1 的扩散矩阵 S 1 = n 1 Σ 1 \mathbf{S}_{1}=n_1\Sigma_1 S1=n1Σ1

    类似地, s 2 2 = w T S 2 w s_{2}^{2}=\mathbf{w}^{T} \mathbf{S}_{2} \mathbf{w} s22=wTS2w

    S = S 1 + S 2 \mathbf{S}=\mathbf{S}_{1}+\mathbf{S}_{2} S=S1+S2,则
    max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 = w T B w w T S w \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S} \mathbf{w}} wmaxJ(w)=s12+s22(m1m2)2=wTSwwTBw

    注意:
    d d w J ( w ) = 2 B w ( w T S w ) − 2 S w ( w T B w ) ( w T S w ) 2 = 0 \frac{d}{d\mathbf{w}}J(\mathbf{w})=\frac{2\mathbf{B}\mathbf{w}(\mathbf{w}^T\mathbf{S}\mathbf{w})-2\mathbf{S}\mathbf{w}(\mathbf{w}^T\mathbf{B}\mathbf{w})}{(\mathbf{w}^T\mathbf{S}\mathbf{w})^2}=\mathbf{0} dwdJ(w)=(wTSw)22Bw(wTSw)2Sw(wTBw)=0
    即有:
    B w ( w T S w ) = S w ( w T B w ) B w = S w ⋅ w T B w w T S w B w = J ( w ) ⋅ S w ( ∗ )

    Bw(wTSw)=Sw(wTBw)Bw=SwwTBwwTSwBw=J(w)Sw()" role="presentation">Bw(wTSw)=Sw(wTBw)Bw=SwwTBwwTSwBw=J(w)Sw()
    Bw(wTSw)BwBw=Sw(wTBw)=SwwTSwwTBw=J(w)Sw()
    S − 1 \mathbf{S}^{-1} S1 存在,则
    S − 1 B w = J ( w ) ⋅ w \mathbf{S}^{-1}\mathbf{B}\mathbf{w}=J(\mathbf{w})\cdot\mathbf{w} S1Bw=J(w)w
    故要求最大 J ( w ) J(\mathbf{w}) J(w) ,只需 S − 1 B \mathbf{S}^{-1}\mathbf{B} S1B 的最大特征值, w \mathbf{w} w 为其特征向量。

    ☆ 不求特征向量求出 w \mathbf{w} w 的方法

    B = ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T \mathbf{B}=(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T} B=(μ1μ2)(μ1μ2)T 代入 ( ∗ ) (*) ()
    ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T w = J ( w ) ⋅ S w S − 1 ( μ 1 − μ 2 ) [ ( μ 1 − μ 2 ) T w J ( w ) ] = w

    (μ1μ2)(μ1μ2)Tw=J(w)SwS1(μ1μ2)[(μ1μ2)TwJ(w)]=w" role="presentation">(μ1μ2)(μ1μ2)Tw=J(w)SwS1(μ1μ2)[(μ1μ2)TwJ(w)]=w
    (μ1μ2)(μ1μ2)TwS1(μ1μ2)[J(w)(μ1μ2)Tw]=J(w)Sw=w
    故只需计算 S − 1 ( μ 1 − μ 2 ) \mathbf{S}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2}) S1(μ1μ2),再单位化。

    20.2 Kernel LDA:

    事实1:如果 ( S ϕ − 1 B ϕ ) w = λ w \left(\mathbf{S}_{\phi}^{-1} \mathbf{B}_{\phi}\right) \mathbf{w}=\lambda \mathbf{w} (Sϕ1Bϕ)w=λw,那么 w = ∑ j = 1 n a j ϕ ( x j ) \mathbf{w}=\sum\limits_{j=1}^na_j\phi(\mathbf{x}_j) w=j=1najϕ(xj),证明见讲稿最后两页。

    a = ( a 1 , ⋯   , a n ) T \mathbf{a}=(a_1,\cdots,a_n)^T a=(a1,,an)T 是“事实1”中的向量。

    下面将 max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 = w T B ϕ w w T S ϕ w \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B}_{\phi} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S}_{\phi} \mathbf{w}} wmaxJ(w)=s12+s22(m1m2)2=wTSϕwwTBϕw 的问题转化为 max ⁡ G ( a ) \max G(\mathbf{a}) maxG(a) s.t. 使用 K \mathbf{K} K 能求解。

    注意到:
    m i = w T μ i ϕ = ( ∑ j = 1 n a j ϕ ( x j ) ) T ( 1 n i ∑ x i ∈ D i ϕ ( x k ) ) = 1 n i ∑ j = 1 n ∑ x k ∈ D i a j ϕ ( x j ) T ϕ ( x k ) = 1 n i ∑ j = 1 n ∑ x k ∈ D i a j K ( x j , x k ) = a T m i

    mi=wTμiϕ=(j=1najϕ(xj))T(1nixiDiϕ(xk))=1nij=1nxkDiajϕ(xj)Tϕ(xk)=1nij=1nxkDiajK(xj,xk)=aTmi" role="presentation">mi=wTμiϕ=(j=1najϕ(xj))T(1nixiDiϕ(xk))=1nij=1nxkDiajϕ(xj)Tϕ(xk)=1nij=1nxkDiajK(xj,xk)=aTmi
    mi=wTμiϕ=(j=1najϕ(xj))T(ni1xiDiϕ(xk))=ni1j=1nxkDiajϕ(xj)Tϕ(xk)=ni1j=1nxkDiajK(xj,xk)=aTmi
    其中,
    m i = 1 n i ( ∑ x k ∈ D i K ( x 1 , x k ) ∑ x k ∈ D i K ( x 2 , x k ) ⋮ ∑ x k ∈ D i K ( x n , x k ) ) n × 1 \mathbf{m}_{i}=\frac{1}{n_{i}}\left(
    xkDiK(x1,xk)xkDiK(x2,xk)xkDiK(xn,xk)" role="presentation">xkDiK(x1,xk)xkDiK(x2,xk)xkDiK(xn,xk)
    \right)_{n\times 1}
    mi=ni1 xkDiK(x1,xk)xkDiK(x2,xk)xkDiK(xn,xk) n×1


    ( m 1 − m 2 ) 2 = ( w T μ 1 ϕ − w T μ 2 ϕ ) 2 = ( a T m 1 − a T m 2 ) 2 = a T ( m 1 − m 2 ) ( m 1 − m 2 ) T a = a T M a
    (m1m2)2=(wTμ1ϕwTμ2ϕ)2=(aTm1aTm2)2=aT(m1m2)(m1m2)Ta=aTMa" role="presentation">(m1m2)2=(wTμ1ϕwTμ2ϕ)2=(aTm1aTm2)2=aT(m1m2)(m1m2)Ta=aTMa
    (m1m2)2=(wTμ1ϕwTμ2ϕ)2=(aTm1aTm2)2=aT(m1m2)(m1m2)Ta=aTMa

    M \mathbf{M} M 被称为核类间扩散矩阵)
    s 1 2 = ∑ x i ∈ D 1 ∥ w T ϕ ( x i ) − w T μ 1 ϕ ∥ 2 = ∑ x i ∈ D 1 ∥ w T ϕ ( x i ) ∥ 2 − 2 ∑ x i ∈ D 1 w T ϕ ( x i ) ⋅ w T μ 1 ϕ + ∑ x i ∈ D 1 ∥ w T μ 1 ϕ ∥ 2 = ( ∑ x i ∈ D 1 ∥ ∑ j = 1 n a j ϕ ( x j ) T ϕ ( x i ) ∥ 2 ) − 2 ⋅ n 1 ⋅ ∥ w T μ 1 ϕ ∥ 2 + n 1 ⋅ ∥ w T μ 1 ϕ ∥ 2 = ( ∑ x i ∈ D 1 a T K i K i T a ) − n 1 ⋅ a T m 1 m 1 T a = a T ( ( ∑ x i ∈ D 1 K i K i T ) − n 1 m 1 m 1 T ) a = a T N 1 a
    s12=xiD1wTϕ(xi)wTμ1ϕ2=xiD1wTϕ(xi)22xiD1wTϕ(xi)wTμ1ϕ+xiD1wTμ1ϕ2=(xiD1j=1najϕ(xj)Tϕ(xi)2)2n1wTμ1ϕ2+n1wTμ1ϕ2=(xiD1aTKiKiTa)n1aTm1m1Ta=aT((xiD1KiKiT)n1m1m1T)a=aTN1a" role="presentation">s12=xiD1wTϕ(xi)wTμ1ϕ2=xiD1wTϕ(xi)22xiD1wTϕ(xi)wTμ1ϕ+xiD1wTμ1ϕ2=(xiD1j=1najϕ(xj)Tϕ(xi)2)2n1wTμ1ϕ2+n1wTμ1ϕ2=(xiD1aTKiKiTa)n1aTm1m1Ta=aT((xiD1KiKiT)n1m1m1T)a=aTN1a
    s12=xiD1 wTϕ(xi)wTμ1ϕ 2=xiD1 wTϕ(xi) 22xiD1wTϕ(xi)wTμ1ϕ+xiD1 wTμ1ϕ 2= xiD1 j=1najϕ(xj)Tϕ(xi) 2 2n1 wTμ1ϕ 2+n1 wTμ1ϕ 2=(xiD1aTKiKiTa)n1aTm1m1Ta=aT((xiD1KiKiT)n1m1m1T)a=aTN1a

    类似地,令 N 2 = ( ∑ x i ∈ D 2 K i K i T − n 2 m 2 m 2 T ) \mathbf{N}_2=\left(\sum\limits_{\mathbf{x}_{i} \in \mathbf{D}_{2}} \mathbf{K}_{i} \mathbf{K}_{i}^{T}-n_{2} \mathbf{m}_{2} \mathbf{m}_{2}^{T}\right) N2=(xiD2KiKiTn2m2m2T)

    s 1 2 + s 2 2 = a T ( N 1 + N 2 ) a = a T N a s_1^2+s_2^2=\mathbf{a}^{T} (\mathbf{N}_{1}+\mathbf{N}_{2}) \mathbf{a}=\mathbf{a}^{T}\mathbf{N} \mathbf{a} s12+s22=aT(N1+N2)a=aTNa

    故: J ( w ) = a T M a a T N a : = G ( a ) J(\mathbf{w})=\frac{\mathbf{a}^{T}\mathbf{M} \mathbf{a}}{\mathbf{a}^{T}\mathbf{N} \mathbf{a}}:=G(\mathbf{a}) J(w)=aTNaaTMa:=G(a)

    类似 20.1, M a = λ N a \mathbf{M} \mathbf{a}=\lambda\mathbf{N} \mathbf{a} Ma=λNa

    • N − 1 \mathbf{N} ^{-1} N1 存在, N − 1 M a = λ a \mathbf{N}^{-1} \mathbf{M} \mathbf{a}=\lambda \mathbf{a} N1Ma=λa λ \lambda λ N − 1 M \mathbf{N}^{-1} \mathbf{M} N1M 的最大特征值, a \mathbf{a} a 是相应的特征向量。

    • N − 1 \mathbf{N} ^{-1} N1 不存在,MATLAB 求广义逆

    最后考查 w T w = 1 \mathbf{w}^T\mathbf{w}=1 wTw=1,即

    ( ∑ j = 1 n a j ϕ ( x j ) ) T ( ∑ i = 1 n a i ϕ ( x i ) ) = 1 ∑ j = 1 n ∑ i = 1 n a j a i ϕ ( x j ) T ϕ ( x i ) = 1 ∑ j = 1 n ∑ i = 1 n a j a i K ( x i , x j ) = 1 a T K a = 1

    (j=1najϕ(xj))T(i=1naiϕ(xi))=1j=1ni=1najaiϕ(xj)Tϕ(xi)=1j=1ni=1najaiK(xi,xj)=1aTKa=1" role="presentation">(j=1najϕ(xj))T(i=1naiϕ(xi))=1j=1ni=1najaiϕ(xj)Tϕ(xi)=1j=1ni=1najaiK(xi,xj)=1aTKa=1
    (j=1najϕ(xj))T(i=1naiϕ(xi))j=1ni=1najaiϕ(xj)Tϕ(xi)j=1ni=1najaiK(xi,xj)aTKa=1=1=1=1
    求出 N − 1 M \mathbf{N}^{-1} \mathbf{M} N1M 的特征向量 a \mathbf{a} a 后, a ← a a T K a \mathbf{a}\leftarrow \frac{\mathbf{a}}{\sqrt{\mathbf{a}^T\mathbf{K}\mathbf{a}}} aaTKa a 以保证 w T w = 1 \mathbf{w}^T\mathbf{w}=1 wTw=1


  • 相关阅读:
    NR 物理层编码 S1 -概述
    VMware安装虚拟机ubuntu
    Mysql-CRUD(增删查改)
    观察者模式 行为型设计模式之七
    高并发网站架构实战
    测试人需要的数据库知识:MySQL常用语法
    springboot企业信誉制度管理系统vue+elementui
    Nodejs -- 前后端身份认证概念及在Express中使用认证(Session,Cookie,JWT)
    Elasticsearch 带中文分词的全文检索(分页+高亮返回)
    java基于springboot+vue动物诊所综合管理系统
  • 原文地址:https://blog.csdn.net/yyywxk/article/details/127671900