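Joint Entropy

Joint entropy is defined analogously to the entropy of a single random variable. For a pair of random variables $(X,\,Y)$ with joint probability distribution $p(x,\,y)$, the joint entropy $H(X,\,Y)$ is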

$$
H(X,\,Y) = -\sum_{x\in \mathcal{X}}\sum_{y\in \mathcal{Y}} p(x,\,y)\log p(x,\,y)
$$
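The double sum can be evaluated directly from a table of joint probabilities. The sketch below is only an illustration: the function name `joint_entropy`, the dictionary layout, and the example distribution are assumptions, and the logarithm is taken base 2 (bits), a choice the definition itself leaves open.

```python
import math

def joint_entropy(p_xy):
    """Return H(X, Y) in bits for a joint pmf given as {(x, y): probability}."""
    total = sum(p_xy.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p(x, y) = 0 are skipped, following the convention 0 log 0 = 0.
    return -sum(p * math.log2(p) for p in p_xy.values() if p > 0)

# Hypothetical joint distribution over X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.0}
h_xy = joint_entropy(p_xy)
print(h_xy)  # 1.5 (bits)
```

Storing the distribution as a dictionary keyed by outcome pairs keeps the code close to the notation $p(x,\,y)$; a 2-D array would work equally well.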

Properties:

  • $H(X,\,X) = H(X)$
  • $H(X,\,Y) = H(Y,\,X)$
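Both identities can be checked numerically on a small example. This is a sketch under assumed names (`entropy`, `p_xy`) and an assumed example distribution, with logarithms base 2. The pair $(X,\,X)$ places mass $p(x)$ on each diagonal outcome $(x,\,x)$ and none elsewhere, and swapping the variables only relabels outcomes.

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Hypothetical joint pmf of (X, Y).
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25}

# H(X, Y) = H(Y, X): swap the coordinates of every outcome.
p_yx = {(y, x): p for (x, y), p in p_xy.items()}
h_xy, h_yx = entropy(p_xy), entropy(p_yx)

# H(X, X) = H(X): build the marginal p(x), then the pmf of the pair (X, X).
p_x = {}
for (x, _y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
p_xx = {(x, x): p for x, p in p_x.items()}
h_x, h_xx = entropy(p_x), entropy(p_xx)

print(h_xy == h_yx, h_x == h_xx)
```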

The definition extends in the same way to any number of random variables, and it can also be written as an expectation:

$$
\begin{aligned}
H(X_1,\,X_2,\,\cdots,\,X_n) & = -\sum p(x_1,\,x_2,\,\cdots,\,x_n)\log p(x_1,\,x_2,\,\cdots,\,x_n) \\
& = -E \log p(X_1,\,X_2,\,\cdots,\,X_n)
\end{aligned}
$$
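The expectation form suggests a second way to compute the same number: draw samples from the joint distribution and average $-\log p$ over them. The sketch below compares the exact sum with such a sample average. The three independent bits, the parameters in `Q`, the sample count, and the fixed seed are all assumptions made for illustration, and the Monte Carlo average is only an approximation of the exact value.

```python
import math
import random
from itertools import product

# Hypothetical example: three independent bits with P(X_i = 1) = Q[i].
Q = (0.5, 0.25, 0.5)

def p_joint(xs):
    """Joint pmf p(x_1, x_2, x_3) of the three independent bits."""
    prob = 1.0
    for x, q in zip(xs, Q):
        prob *= q if x == 1 else 1.0 - q
    return prob

# Sum form: -sum over all outcomes of p log p (every outcome has p > 0 here).
h_sum = -sum(p_joint(xs) * math.log2(p_joint(xs))
             for xs in product((0, 1), repeat=len(Q)))

# Expectation form: average -log p(X_1, ..., X_n) over samples drawn from p.
rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 20_000
total = 0.0
for _ in range(n_samples):
    xs = tuple(1 if rng.random() < q else 0 for q in Q)
    total += -math.log2(p_joint(xs))
h_mc = total / n_samples

print(h_sum, h_mc)
```

The two printed values should be close but not identical, since the sample average carries statistical error that shrinks as `n_samples` grows.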

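Conditional Entropy

Joint entropy is built on the joint probability distribution. When $X = x$ is known (with $p(x) > 0$), $p(Y \mid X = x)$ is also a probability distribution, since it is normalized: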

$$
\sum_y p(Y = y \mid X = x) = \sum_y \frac{p(x,\,y)}{p(x)} = \frac{p(x)}{p(x)} = 1
$$
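The normalization can be confirmed on a concrete table. In this sketch the joint pmf and the variable names are assumptions chosen for illustration; the conditional probabilities are formed as $p(x,\,y)/p(x)$ and summed over $y$ for each fixed $x$.

```python
# Hypothetical joint pmf of (X, Y); every x that appears has p(x) > 0.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal p(x) = sum over y of p(x, y).
p_x = {}
for (x, _y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional p(y | x) = p(x, y) / p(x), defined only where p(x) > 0.
p_y_given_x = {(x, y): p / p_x[x] for (x, y), p in p_xy.items()}

# For each fixed x, the conditional probabilities sum to 1
# (up to floating-point rounding).
row_sums = {x: sum(p for (xx, _y), p in p_y_given_x.items() if xx == x)
            for x in p_x}
print(row_sums)
```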

The entropy of this conditional distribution is

$$
H(Y \mid X = x) = -\sum_y p(y \mid X = x)\log p(y \mid X = x) = -E \log p(Y \mid X = x)
$$

where the expectation is taken over $Y \sim p(y \mid X = x)$.

Definition: if $(X,\,Y) \sim p(x,\,y)$, the conditional entropy $H(Y \mid X)$ is defined as

$$
\begin{aligned}
H(Y \mid X) & = \sum_{x\in \mathcal{X}} p(x) H(Y \mid X = x) \\
& = -\sum_{x\in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y \mid x)\log p(y \mid x) \\
& = -\sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} p(x,\,y)\log p(y \mid x) \\
& = -E \log p(Y \mid X)
\end{aligned}
$$
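The first and third lines of this derivation can be compared numerically: one averages the per-value entropies $H(Y \mid X = x)$ with weights $p(x)$, the other sums $-p(x,\,y)\log p(y \mid x)$ over all pairs. The sketch below uses an assumed joint pmf, hypothetical helper names, and base-2 logarithms.

```python
import math

# Hypothetical joint pmf of (X, Y).
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal p(x) = sum over y of p(x, y).
p_x = {}
for (x, _y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

def h_y_given(x):
    """H(Y | X = x) in bits: entropy of the conditional distribution p(y | x)."""
    cond = [p / p_x[x] for (xx, _y), p in p_xy.items() if xx == x and p > 0]
    return -sum(c * math.log2(c) for c in cond)

# First line of the definition: average H(Y | X = x) with weights p(x).
h_avg = sum(p_x[x] * h_y_given(x) for x in p_x)

# Third line: double sum of -p(x, y) log p(y | x) over outcomes with p(x, y) > 0.
h_double = -sum(p * math.log2(p / p_x[x])
                for (x, _y), p in p_xy.items() if p > 0)

print(h_avg, h_double)
```

Both values should agree up to floating-point rounding, which matches the step in the derivation that replaces $p(x)\,p(y \mid x)$ with $p(x,\,y)$.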