## Conditional Approximations

As we saw in Chap. 4 (Theorem 4.3), the conditional expectation $\mathrm{E}\left(X_2 \mid X_1\right)$ is the best approximation of $X_2$ by a function of $X_1$ in the mean squared error (MSE) sense. We have in this case
$$X_2=\mathrm{E}\left(X_2 \mid X_1\right)+U=\mu_2+\Sigma_{21} \Sigma_{11}^{-1}\left(X_1-\mu_1\right)+U .$$
Hence, the best approximation of $X_2 \in \mathbb{R}^{p-r}$ by $X_1 \in \mathbb{R}^r$ is the linear approximation that can be written as:
$$X_2=\beta_0+\mathcal{B} X_1+U$$
with $\mathcal{B}=\Sigma_{21} \Sigma_{11}^{-1}$, $\beta_0=\mu_2-\mathcal{B} \mu_1$ and $U \sim N\left(0, \Sigma_{22.1}\right)$.
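
The quantities $\mathcal{B}$, $\beta_0$ and $\Sigma_{22.1}$ can be computed directly from a partitioned mean vector and covariance matrix. The following is a minimal sketch, not part of the original text: the numerical values of $\mu$ and $\Sigma$ are hypothetical, chosen with $p=3$ and $r=1$ so that $\Sigma_{11}$ is a scalar, and it assumes the usual definition $\Sigma_{22.1}=\Sigma_{22}-\Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$.

```python
# Hypothetical parameters for p = 3, r = 1: X1 is scalar, X2 lies in R^2.
mu1 = 1.0
mu2 = [2.0, 3.0]
sigma11 = 4.0                         # Var(X1), here a (1 x 1) block
sigma21 = [2.0, 1.0]                  # Cov(X2, X1), a (2 x 1) block
sigma22 = [[3.0, 1.0], [1.0, 2.0]]    # Cov(X2), a (2 x 2) block

# B = Sigma21 Sigma11^{-1}; Sigma11 is scalar, so the inverse is a division.
B = [s / sigma11 for s in sigma21]

# beta0 = mu2 - B mu1
beta0 = [m - b * mu1 for m, b in zip(mu2, B)]

# Sigma22.1 = Sigma22 - Sigma21 Sigma11^{-1} Sigma12 (covariance of U)
sigma22_1 = [
    [sigma22[i][j] - sigma21[i] * sigma21[j] / sigma11 for j in range(2)]
    for i in range(2)
]


def approximate_x2(x1):
    """Return the linear approximation beta0 + B x1 of X2."""
    return [b0 + b * x1 for b0, b in zip(beta0, B)]


print(B)                    # [0.5, 0.25]
print(beta0)                # [1.5, 2.75]
print(sigma22_1)            # [[2.0, 0.5], [0.5, 1.75]]
print(approximate_x2(mu1))  # [2.0, 3.0], i.e. mu2
```

Evaluating the approximation at $X_1=\mu_1$ returns $\mu_2$, as the formula $\beta_0=\mu_2-\mathcal{B} \mu_1$ requires.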
Consider now the particular case where $r=p-1$. Then $X_2 \in \mathbb{R}$ and $\mathcal{B}$ is a row vector $\beta^{\top}$ of dimension $(1 \times r)$, so that
$$X_2=\beta_0+\beta^{\top} X_1+U .$$
This means, geometrically speaking, that the best MSE approximation of $X_2$ by a function of $X_1$ is a hyperplane. The marginal variance of $X_2$ can be decomposed via (5.11):
$$\sigma_{22}=\beta^{\top} \Sigma_{11} \beta+\sigma_{22.1}=\sigma_{21} \Sigma_{11}^{-1} \sigma_{12}+\sigma_{22.1} .$$
The ratio
$$\rho_{2.1 \ldots r}^2=\frac{\sigma_{21} \Sigma_{11}^{-1} \sigma_{12}}{\sigma_{22}}$$
is known as the square of the multiple correlation between $X_2$ and the $r$ variables $X_1$. It is the proportion of the variance of $X_2$ that is explained by the linear approximation $\beta_0+\beta^{\top} X_1$. The last term in (5.12) is the residual variance of $X_2$. The square of the multiple correlation corresponds to the coefficient of determination introduced in Sect. 3.4, see (3.39), but here it is defined in terms of the random variables $X_1$ and $X_2$. It can be shown that $\rho_{2.1 \ldots r}$ is also the maximum correlation attainable between $X_2$ and a linear combination of the elements of $X_1$, the optimal linear combination being precisely given by $\beta^{\top} X_1$. Note that when $r=1$, the multiple correlation $\rho_{2.1}$ coincides with the usual simple correlation $\rho_{X_2 X_1}$ between $X_2$ and $X_1$.
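
The variance decomposition and the maximum-correlation property can be checked numerically. The sketch below is only an illustration under assumed values: the covariance blocks are hypothetical ($p=3$, $r=2$), and the helper `corr_with_x2` is introduced here and does not come from the text. It computes $\beta=\Sigma_{11}^{-1} \sigma_{12}$, the explained and residual variances, and then scans a grid of directions $a$ to compare the correlation of $a^{\top} X_1$ with $X_2$ against the value attained by $\beta^{\top} X_1$.

```python
import math

# Hypothetical covariance blocks for p = 3, r = 2: X1 in R^2, X2 scalar.
sigma11 = [[4.0, 2.0], [2.0, 3.0]]   # Cov(X1)
sigma12 = [1.0, 1.0]                 # Cov(X1, X2)
sigma22 = 2.0                        # Var(X2)

# Invert the (2 x 2) block Sigma11 in closed form.
det = sigma11[0][0] * sigma11[1][1] - sigma11[0][1] * sigma11[1][0]
inv11 = [
    [sigma11[1][1] / det, -sigma11[0][1] / det],
    [-sigma11[1][0] / det, sigma11[0][0] / det],
]

# beta = Sigma11^{-1} sigma12
beta = [sum(inv11[i][j] * sigma12[j] for j in range(2)) for i in range(2)]

explained = sum(b * s for b, s in zip(beta, sigma12))  # sigma21 Sigma11^{-1} sigma12
residual = sigma22 - explained                         # sigma22.1
rho_squared = explained / sigma22


def corr_with_x2(a):
    """Correlation between X2 and the linear combination a^T X1."""
    cov = sum(ai * si for ai, si in zip(a, sigma12))
    var = sum(a[i] * sigma11[i][j] * a[j] for i in range(2) for j in range(2))
    return cov / math.sqrt(var * sigma22)


# Scan directions a = (cos t, sin t); none should beat the direction beta.
best_on_grid = max(
    corr_with_x2([math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360)])
    for k in range(360)
)

print(beta, explained, residual, rho_squared)
print(corr_with_x2(beta), best_on_grid)
```

With these values the explained variance is $0.375$ and the residual variance is $1.625$, giving $\rho_{2.12}^2=0.1875$. The grid search is a sanity check rather than a proof; the maximality itself follows from the Cauchy-Schwarz inequality.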

## The Wishart Distribution

The Wishart distribution (named after its discoverer) plays a prominent role in the analysis of estimated covariance matrices. If the mean of $X \sim N_p(\mu, \Sigma)$ is known to be $\mu=0$, then for a data matrix $\mathcal{X}(n \times p)$ the estimated covariance matrix is proportional to $\mathcal{X}^{\top} \mathcal{X}$. This is the point where the Wishart distribution comes in, because $\mathcal{M}(p \times p)=\mathcal{X}^{\top} \mathcal{X}=\sum_{i=1}^n x_i x_i^{\top}$ has a Wishart distribution $W_p(\Sigma, n)$.

Example 5.4. Set $p=1$. Then for $X \sim N_1\left(0, \sigma^2\right)$ the data matrix of the observations
$$\mathcal{X}=\left(x_1, \ldots, x_n\right)^{\top} \quad \text { with } \quad \mathcal{M}=\mathcal{X}^{\top} \mathcal{X}=\sum_{i=1}^n x_i x_i$$
leads to the Wishart distribution $W_1\left(\sigma^2, n\right)=\sigma^2 \chi_n^2$. The one-dimensional Wishart distribution is thus in fact a $\chi^2$ distribution.
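
A small simulation can make the one-dimensional case concrete. This is a rough sketch under assumed settings (the values of `n`, `sigma`, `reps` and the seed are arbitrary choices, not from the text): it draws many data matrices, forms $\mathcal{M}=\sum_{i=1}^n x_i^2$ for each, and compares the empirical mean and variance with those of $\sigma^2 \chi_n^2$, namely $n \sigma^2$ and $2 n \sigma^4$.

```python
import random

random.seed(0)

n = 5            # degrees of freedom (number of observations)
sigma = 2.0      # standard deviation of each observation
reps = 20000     # number of simulated data matrices


def draw_m():
    """Draw M = X^T X = sum of x_i^2 for x_i ~ N(0, sigma^2), i = 1..n."""
    return sum(random.gauss(0.0, sigma) ** 2 for _ in range(n))


draws = [draw_m() for _ in range(reps)]
mean_m = sum(draws) / reps
var_m = sum((m - mean_m) ** 2 for m in draws) / (reps - 1)

# For sigma^2 * chi^2_n the theoretical mean is n*sigma^2, the variance 2*n*sigma^4.
print(mean_m, n * sigma**2)
print(var_m, 2 * n * sigma**4)
```

The empirical moments should lie close to the theoretical ones, with the usual Monte Carlo error; matching two moments is of course only a plausibility check, not a test of the full distribution.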

When we talk about the distribution of a matrix, we mean of course the joint distribution of all its elements. More exactly: since $\mathcal{M}=\mathcal{X}^{\top} \mathcal{X}$ is symmetric we only need to consider the elements of the lower triangular matrix
$$\mathcal{M}=\left(\begin{array}{cccc} m_{11} & & & \\ m_{21} & m_{22} & & \\ \vdots & \vdots & \ddots & \\ m_{p 1} & m_{p 2} & \ldots & m_{p p} \end{array}\right)$$
Hence the Wishart distribution is defined by the distribution of the vector
$$\left(m_{11}, \ldots, m_{p 1}, m_{22}, \ldots, m_{p 2}, \ldots, m_{p p}\right)^{\top} .$$
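
The stacking of the lower-triangular elements can be sketched as follows. The data matrix is a hypothetical $(3 \times 3)$ example and the function name `vech` is an assumption made for illustration; the code forms $\mathcal{M}=\mathcal{X}^{\top} \mathcal{X}$, confirms that it equals $\sum_{i=1}^n x_i x_i^{\top}$, and lists the elements in the column-wise order $\left(m_{11}, \ldots, m_{p 1}, m_{22}, \ldots, m_{p p}\right)$.

```python
# Hypothetical (3 x 3) data matrix; each row is one observation x_i^T.
X = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
]
n, p = len(X), len(X[0])

# M = X^T X, computed entry by entry.
M = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]

# The same matrix as a sum of outer products x_i x_i^T.
M_outer = [[0.0] * p for _ in range(p)]
for row in X:
    for a in range(p):
        for b in range(p):
            M_outer[a][b] += row[a] * row[b]


def vech(mat):
    """Stack the lower-triangular entries column by column."""
    size = len(mat)
    return [mat[i][j] for j in range(size) for i in range(j, size)]


print(M == M_outer)  # True
print(vech(M))       # [2.0, 1.0, 2.0, 2.0, 1.0, 5.0]
```

The resulting vector has $p(p+1)/2$ entries, here $6$, which is the number of distinct elements of a symmetric $(p \times p)$ matrix.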
Linear transformations of the data matrix $\mathcal{X}$ also lead to Wishart matrices.

