
Machine Learning Notes | EM Algorithm

Kevin 吴嘉文 · ~3 min read · Knowledge Notes · Machine Learning

Expectation Maximization

Notes on the EM algorithm, based on 机器学习-白板推导系列(十)-EM 算法(Expectation Maximization).

For mixture models such as the GMM, deriving a closed-form maximum-likelihood solution directly is very difficult. EM solves parameter estimation for mixture models with latent variables.

Algorithm

The EM algorithm iterates two steps:

E-step: using the posterior $P(Z \mid X, \theta^{(t)})$, write down the expectation $E_{z \mid x, \theta^{(t)}}[\log P(x, z \mid \theta)]$.

M-step: maximize that expectation:

$$\theta^{(t+1)}=\arg \max _{\theta} \int_{z} \log P(x, z \mid \theta) \cdot P\left(z \mid x, \theta^{(t)}\right) d z$$

Here $\int_{z} \log P(x, z \mid \theta) \cdot P\left(z \mid x, \theta^{(t)}\right) d z$ is exactly $E_{z \mid x, \theta^{(t)}}[\log P(x, z \mid \theta)]$.

As the formulas show, EM splits into two steps: computing the expectation, then maximizing it.

EM is an iterative algorithm, with $\theta^{(t)}$ denoting the parameter estimate at iteration $t$. Only if each update guarantees $\log P\left(x \mid \theta^{(t)}\right) \leq \log P\left(x \mid \theta^{(t+1)}\right)$ can the iterations drive the likelihood toward a maximum.
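The E-step and M-step above can be sketched for a toy two-component 1D Gaussian mixture. This is a minimal illustration, not the source's code; the variable names (`w`, `mu`, `var`) and the synthetic data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: half from N(-2, 1), half from N(3, 1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Initial guess theta^(0): mixing weights, means, variances
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: posterior P(z | x, theta^(t)), the "responsibilities"
    joint = w * normal_pdf(x[:, None], mu, var)      # shape (n, 2)
    resp = joint / joint.sum(axis=1, keepdims=True)

    # M-step: closed-form argmax of E_{z|x,theta^(t)}[log P(x, z | theta)]
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(np.sort(mu))  # means should approach -2 and 3
```

For a GMM the M-step has this closed form; for other latent-variable models the same E/M structure applies but the maximization changes.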

Proof of Validity

$$\log P(x \mid \theta)=\log \frac{P(x, z \mid \theta)}{P(z \mid x, \theta)}=\log P(x, z \mid \theta)-\log P(z \mid x, \theta)$$

Take the expectation of both sides with respect to $P\left(z \mid x, \theta^{(t)}\right)$:

$$\begin{aligned} \text{LHS} &=\int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P(x \mid \theta) \, d z \\ &=\log P(x \mid \theta) \int_{z} P\left(z \mid x, \theta^{(t)}\right) d z \\ &=\log P(x \mid \theta) \end{aligned}$$

$$\text{RHS}=\underbrace{\int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P(x, z \mid \theta) \, d z}_{Q\left(\theta, \theta^{(t)}\right)}-\underbrace{\int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P(z \mid x, \theta) \, d z}_{H\left(\theta, \theta^{(t)}\right)}$$

For $Q(\theta,\theta^{(t)})$:

$$\begin{array}{l} Q\left(\theta^{(t)}, \theta^{(t)}\right)=\int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P\left(x, z \mid \theta^{(t)}\right) d z \\ Q\left(\theta^{(t+1)}, \theta^{(t)}\right)=\int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P\left(x, z \mid \theta^{(t+1)}\right) d z \end{array}$$

By definition, $\theta^{(t+1)}=\arg \max _{\theta} \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P(x, z \mid \theta) \, d z$,

so $Q\left(\theta^{(t+1)}, \theta^{(t)}\right) \geq Q\left(\theta^{(t)}, \theta^{(t)}\right)$.

For $H(\theta,\theta^{(t)})$:

$$\begin{aligned} & H\left(\theta^{(t+1)}, \theta^{(t)}\right)-H\left(\theta^{(t)}, \theta^{(t)}\right) \\ =& \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P\left(z \mid x, \theta^{(t+1)}\right) d z-\int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P\left(z \mid x, \theta^{(t)}\right) d z \\ =& \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot\left(\log P\left(z \mid x, \theta^{(t+1)}\right)-\log P\left(z \mid x, \theta^{(t)}\right)\right) d z \\ =& \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log \frac{P\left(z \mid x, \theta^{(t+1)}\right)}{P\left(z \mid x, \theta^{(t)}\right)} d z\\ =& -KL\left(P\left(z \mid x, \theta^{(t)}\right) \| P\left(z \mid x, \theta^{(t+1)}\right)\right)\\ \le& \; 0 \end{aligned}$$
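The last step uses the non-negativity of KL divergence (Gibbs' inequality). A quick numeric check on two arbitrary discrete distributions (the distributions here are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two random discrete distributions over 10 outcomes
p = rng.random(10); p /= p.sum()
q = rng.random(10); q /= q.sum()

# KL(p || q) = sum_i p_i * log(p_i / q_i) >= 0, with equality iff p == q
kl = np.sum(p * np.log(p / q))
print(kl >= 0)  # True
```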

Therefore

$$Q\left(\theta^{(t)}, \theta^{(t)}\right)-H\left(\theta^{(t)}, \theta^{(t)}\right) \leq Q\left(\theta^{(t+1)}, \theta^{(t)}\right)-H\left(\theta^{(t+1)}, \theta^{(t)}\right)$$

and so

$$\log P\left(x \mid \theta^{(t)}\right) \leq \log P\left(x \mid \theta^{(t+1)}\right)$$
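This monotonicity can be observed empirically: under EM updates the observed-data log-likelihood $\log P(x \mid \theta^{(t)})$ never decreases. A sketch on the same kind of toy two-component 1D GMM (synthetic data and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
logliks = []
for _ in range(30):
    joint = w * normal_pdf(x[:, None], mu, var)       # w_k * N(x | mu_k, var_k)
    logliks.append(np.log(joint.sum(axis=1)).sum())   # log P(x | theta^(t))
    resp = joint / joint.sum(axis=1, keepdims=True)   # E-step
    nk = resp.sum(axis=0)                             # M-step
    w, mu = nk / len(x), (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

diffs = np.diff(logliks)
print(bool((diffs >= -1e-9).all()))  # True: non-decreasing at every step
```

The tiny tolerance `-1e-9` only absorbs floating-point noise; the inequality proved above holds exactly.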

Contributors: kevinng77