THE USE OF CONJUGATE PRIORS WITH LATENT VARIABLES

Earlier in this section, it was demonstrated that conjugate priors make Bayesian inference tractable when complete data is available. Example $3.1$ demonstrated this by showing how the posterior distribution can easily be identified when assuming a conjugate prior. Explicit computation of the evidence normalization constant with conjugate priors is often unnecessary, because the product of the likelihood and the prior leads to the algebraic form of a well-known distribution.

As mentioned earlier, the calculation of the posterior normalization constant is the main obstacle in performing posterior inference. Given this, we can ask: do conjugate priors help when latent variables are present in the model? With latent variables, the normalization constant is more complex, because it involves marginalizing over both the parameters and the latent variables. Assume a full distribution over the parameters $\theta$, latent variables $z$ and observed variables $x$ (both discrete), which factorizes as follows:
$$p(\theta, z, x \mid \alpha)=p(\theta \mid \alpha) p(z \mid \theta) p(x \mid z, \theta)$$
The posterior over the latent variables and parameters has the form (see Section $2.2 .2$ for a more detailed example of such posterior):
$$p(\theta, z \mid x, \alpha)=\frac{p(\theta \mid \alpha) p(z \mid \theta) p(x \mid z, \theta)}{p(x \mid \alpha)}$$
and therefore, the normalization constant $p(x \mid \alpha)$ equals:
$$p(x \mid \alpha)=\sum_{z}\left(\int_{\theta} p(\theta \mid \alpha) p(z \mid \theta) p(x \mid z, \theta) d \theta\right)=\sum_{z} D(z)$$
where $D(z)$ is defined to be the term inside the sum above. Equation $3.6$ demonstrates that conjugate priors are useful even when the normalization constant requires summing over latent variables. If the prior family is conjugate to the distribution $p(X, Z \mid \theta)$, then the function $D(z)$ will be mathematically easy to compute for any $z$. However, it is not true that $\sum_{z} D(z)$ is always tractable, since the form of $D(z)$ can be quite complex.
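To illustrate why each $D(z)$ can be easy while $\sum_{z} D(z)$ is not, here is a minimal sketch under assumptions that are not taken from the text: a hypothetical toy model in which $\theta$ is a distribution over $K$ latent classes with a Dirichlet prior, each $z_i$ is drawn from $\theta$, and each $x_i$ is drawn from a fixed emission table `emit[z_i]` (so $p(x \mid z, \theta)$ does not depend on $\theta$ here). The function names and the emission table are illustrative only.

```python
import itertools
import math


def log_dirichlet_multinomial(counts, alpha):
    """log of the integral of Dir(theta | alpha) * prod_k theta_k^{n_k} d theta.

    Conjugacy gives this closed form; no numerical integration is needed.
    """
    n = sum(counts)
    a0 = sum(alpha)
    out = math.lgamma(a0) - math.lgamma(a0 + n)
    for n_k, a_k in zip(counts, alpha):
        out += math.lgamma(a_k + n_k) - math.lgamma(a_k)
    return out


def D(z, x, alpha, emit):
    """The term inside the sum over z, for one full assignment z (a tuple)."""
    counts = [z.count(k) for k in range(len(alpha))]
    log_d = log_dirichlet_multinomial(counts, alpha)
    for z_i, x_i in zip(z, x):
        log_d += math.log(emit[z_i][x_i])  # assumed fixed emission table
    return math.exp(log_d)


def evidence(x, alpha, emit):
    """p(x | alpha) by brute force: K**len(x) terms, one per assignment z."""
    K = len(alpha)
    return sum(
        D(z, x, alpha, emit)
        for z in itertools.product(range(K), repeat=len(x))
    )


alpha = [1.0, 2.0]                 # hypothetical Dirichlet hyperparameters
emit = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical p(x_i | z_i)
print(evidence((0, 1, 1), alpha, emit))
```

Each call to `D` costs time linear in the number of observations, but `evidence` enumerates every assignment, so its cost grows exponentially with the length of `x`. This is the sense in which conjugacy helps with the integral over $\theta$ but not necessarily with the sum over $z$.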

MIXTURE OF CONJUGATE PRIORS

Mixture models are a simple way to extend a family of distributions into a more expressive family. If we have a set of distributions $p_{1}(X), \ldots, p_{M}(X)$, then a mixture model over this set of distributions is parametrized by an $M$-dimensional probability vector $\left(\lambda_{1}, \ldots, \lambda_{M}\right)$ (with $\lambda_{i} \geq 0$ and $\sum_{i} \lambda_{i}=1$) and defines a distribution over $X$ such that:
$$p(X \mid \lambda)=\sum_{i=1}^{M} \lambda_{i} p_{i}(X)$$
Section 1.5.3 gives an example of a mixture-of-Gaussians model. The idea of mixture models can also be used for prior families. Let $p(\theta \mid \alpha)$ be a prior from a prior family with $\alpha \in A$. Then, it is possible to define a prior of the form:
$$p\left(\theta \mid \alpha^{1}, \ldots, \alpha^{M}, \lambda_{1}, \ldots, \lambda_{M}\right)=\sum_{i=1}^{M} \lambda_{i} p\left(\theta \mid \alpha^{i}\right)$$
where $\lambda_{i} \geq 0$ and $\sum_{i=1}^{M} \lambda_{i}=1$ (i.e., $\lambda$ is a point in the $(M-1)$-dimensional probability simplex). This new prior family, which is hyperparametrized by $\alpha^{i} \in A$ and $\lambda_{i}$ for $i \in\{1, \ldots, M\}$, will actually be conjugate to a likelihood $p(x \mid \theta)$ if the original prior family $p(\theta \mid \alpha)$ for $\alpha \in A$ is also conjugate to this likelihood.
To see this, consider that when using a mixture prior, the posterior has the form:
$$\begin{aligned} p\left(\theta \mid x, \alpha^{1}, \ldots, \alpha^{M}, \lambda\right) &=\frac{p(x \mid \theta) p\left(\theta \mid \alpha^{1}, \ldots, \alpha^{M}, \lambda\right)}{\int_{\theta} p(x \mid \theta) p\left(\theta \mid \alpha^{1}, \ldots, \alpha^{M}, \lambda\right) d \theta} \\ &=\frac{\sum_{i=1}^{M} \lambda_{i} p(x \mid \theta) p\left(\theta \mid \alpha^{i}\right)}{\sum_{i=1}^{M} \lambda_{i} Z_{i}} \end{aligned}$$
where
$$Z_{i}=\int_{\theta} p(x \mid \theta) p\left(\theta \mid \alpha^{i}\right) d \theta$$
Therefore, it holds that:
$$p\left(\theta \mid x, \alpha^{1}, \ldots, \alpha^{M}, \lambda\right)=\frac{\sum_{i=1}^{M}\left(\lambda_{i} Z_{i}\right) p\left(\theta \mid x, \alpha^{i}\right)}{\sum_{i=1}^{M} \lambda_{i} Z_{i}}$$

because $p(x \mid \theta) p\left(\theta \mid \alpha^{i}\right)=Z_{i} p\left(\theta \mid x, \alpha^{i}\right)$. Because of conjugacy, each $p\left(\theta \mid x, \alpha^{i}\right)$ is equal to $p\left(\theta \mid \beta^{i}\right)$ for some $\beta^{i} \in A$ ($i \in\{1, \ldots, M\}$). The hyperparameters $\beta^{i}$ are the updated hyperparameters following posterior inference. Therefore, it holds that:
$$p\left(\theta \mid x, \alpha^{1}, \ldots, \alpha^{M}, \lambda\right)=\sum_{i=1}^{M} \lambda_{i}^{\prime} p\left(\theta \mid \beta^{i}\right)$$
for $\lambda_{i}^{\prime}=\lambda_{i} Z_{i} /\left(\sum_{j=1}^{M} \lambda_{j} Z_{j}\right)$.
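The update can be sketched for one concrete conjugate pair. The example below is an assumption chosen for illustration, not something given in the text: a mixture of Beta priors with a Bernoulli likelihood, where $x$ is a specific sequence containing `heads` ones and `tails` zeros. In that case $Z_{i}=B(a_{i}+\text{heads}, b_{i}+\text{tails}) / B(a_{i}, b_{i})$ and $\beta^{i}=(a_{i}+\text{heads}, b_{i}+\text{tails})$. The function names are hypothetical.

```python
import math


def log_beta_fn(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def update_beta_mixture(weights, params, heads, tails):
    """Posterior of a Beta-mixture prior under Bernoulli observations.

    weights: mixture weights lambda_i (assumed strictly positive here)
    params:  list of (a_i, b_i) Beta hyperparameters, playing the role of alpha^i
    Returns (lambda'_i, beta^i) as two lists.
    """
    log_terms = []
    new_params = []
    for lam, (a, b) in zip(weights, params):
        # log Z_i: marginal likelihood of the data under component i
        log_z = log_beta_fn(a + heads, b + tails) - log_beta_fn(a, b)
        log_terms.append(math.log(lam) + log_z)
        new_params.append((a + heads, b + tails))
    # Normalize lambda_i * Z_i in log space for numerical stability
    m = max(log_terms)
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm], new_params


new_weights, new_params = update_beta_mixture(
    [0.5, 0.5], [(1.0, 1.0), (10.0, 1.0)], heads=3, tails=0
)
print(new_weights, new_params)
```

Each component is updated exactly as it would be on its own, and the weights shift toward the components whose $Z_{i}$ is larger, that is, those that explain the observed data better.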

RENORMALIZED CONJUGATE DISTRIBUTIONS

In the previous section, we saw that one could derive a more expressive prior family by using a basic prior distribution in a mixture model. Renormalizing a conjugate prior is another way to change the properties of a prior family while still retaining conjugacy.

Let us assume that a prior $p(\theta \mid \alpha)$ is defined over some parameter space $\Theta$. It is sometimes the case that we want to further constrain $\Theta$ into a smaller subspace, and define $p(\theta \mid \alpha)$ such that its support is some $\Theta_{0} \subset \Theta$. One way to do so would be to define the following distribution $p^{\prime}$ over $\Theta_{0}$ :

$$p^{\prime}(\theta \mid \alpha)=\frac{p(\theta \mid \alpha)}{\int_{\theta^{\prime} \in \Theta_{0}} p\left(\theta^{\prime} \mid \alpha\right) d \theta^{\prime}} .$$
This new distribution retains the same ratio between probabilities of elements in $\Theta_{0}$ as $p$, but essentially allocates probability 0 to any element in $\Theta \backslash \Theta_{0}$.

It can be shown that if $p$ is a conjugate family to some likelihood, then $p^{\prime}$ is conjugate to the same likelihood as well. This example actually demonstrates that conjugacy, in its pure form, does not guarantee tractability when the conjugate prior is used together with the corresponding likelihood. More specifically, the integral over $\Theta_{0}$ in the denominator of Equation $3.7$ can often be difficult to compute, in which case approximate inference is required.
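As a small illustration, consider a case assumed here for concreteness rather than taken from the text: a Beta prior on a Bernoulli parameter, restricted to an interval $\Theta_{0}=[\text{lo}, \text{hi}] \subset(0,1)$. The sketch below approximates the normalizing integral with a simple midpoint rule (a stand-in for whatever approximation one would actually use), and the helper names are hypothetical.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def make_truncated_beta(a, b, lo, hi):
    """Return the density p'(theta | a, b) supported on [lo, hi]."""
    # Denominator of the renormalized prior, computed once (approximately)
    mass = midpoint_integral(lambda t: beta_pdf(t, a, b), lo, hi)

    def pdf(theta):
        if theta < lo or theta > hi:
            return 0.0  # zero probability outside Theta_0
        return beta_pdf(theta, a, b) / mass

    return pdf


def truncated_posterior(a, b, lo, hi, heads, tails):
    """Conjugacy is retained: same family, same support, updated (a, b)."""
    return make_truncated_beta(a + heads, b + tails, lo, hi)


prior = make_truncated_beta(2.0, 2.0, 0.1, 0.6)
posterior = truncated_posterior(2.0, 2.0, 0.1, 0.6, heads=3, tails=1)
print(prior(0.3), posterior(0.3))
```

The hyperparameter update is the usual Beta-Bernoulli one, so the posterior stays in the renormalized family. The cost is that every use of the density needs the integral over $\Theta_{0}$, which here has no elementary closed form and in higher-dimensional cases may be much harder to approximate.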

The renormalization of conjugate distributions arises when considering probabilistic context-free grammars (PCFGs) with Dirichlet priors on the parameters. In this case, in order for the prior to allocate zero probability to parameters that define non-tight PCFGs, certain multinomial distributions need to be removed from the prior. Here, tightness refers to a desirable property of a PCFG: the total measure of all finite parse trees generated by the underlying context-free grammar is 1. For a thorough discussion of this issue, see Cohen and Johnson (2013).
