## Collinearity

The formula $\beta=\mathbf{v}^{-1} \operatorname{Cov}[\vec{X}, Y]$ makes no sense if $\mathbf{v}$ has no inverse. This will happen if, and only if, the predictor variables are linearly dependent on each other, that is, if one of the predictors is really a linear combination of the others. Then (as we learned in linear algebra) the covariance matrix is of less than “full rank” (i.e., “rank deficient”) and it doesn’t have an inverse. Equivalently, $\mathbf{v}$ has at least one eigenvalue which is exactly zero.
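To make the rank-deficiency claim concrete, here is a minimal NumPy sketch on simulated data. The variable names, the sample size, and the relative tolerance used to decide which eigenvalues count as zero are all assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated predictors (hypothetical values, for illustration only)
mass_kg = rng.normal(70.0, 10.0, size=n)      # plays the role of X_2
weight_lb = 2.2 * mass_kg                     # X_1 = 2.2 * X_2, exactly collinear
height_cm = rng.normal(170.0, 8.0, size=n)    # an unrelated third predictor

X = np.column_stack([weight_lb, mass_kg, height_cm])
v = np.cov(X, rowvar=False)                   # 3x3 sample covariance matrix

# v is symmetric, so eigvalsh applies; eigenvalues come back in ascending order
eigenvalues = np.linalg.eigvalsh(v)

# Count eigenvalues that are non-zero relative to the largest one.
# The 1e-10 threshold is an assumed tolerance for floating-point round-off.
tol = 1e-10 * eigenvalues.max()
rank = int(np.sum(np.abs(eigenvalues) > tol))

print("eigenvalues of v:", eigenvalues)
print("numerical rank of v:", rank, "out of", v.shape[0])
```

In floating point the smallest eigenvalue comes out as a tiny number rather than an exact zero, which is why the sketch compares against a tolerance instead of testing for equality.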

So much for the algebra; what does that mean statistically? Let’s take an easy case where one of the predictors is just a multiple of another: say you’ve included people’s weight in pounds $\left(X_1\right)$ and mass in kilograms $\left(X_2\right)$, so $X_1=2.2 X_2$. Then if we try to predict $Y$, we’d have
$$
\begin{aligned}
\widehat{\mu}(\vec{X}) &=\beta_1 X_1+\beta_2 X_2+\beta_3 X_3+\ldots+\beta_p X_p \\
&=0 X_1+\left(2.2 \beta_1+\beta_2\right) X_2+\sum_{i=3}^p \beta_i X_i \\
&=\left(\beta_1+\beta_2 / 2.2\right) X_1+0 X_2+\sum_{i=3}^p \beta_i X_i \\
&=-2200 X_1+\left(4840+2.2 \beta_1+\beta_2\right) X_2+\sum_{i=3}^p \beta_i X_i
\end{aligned}
$$
In other words, because there’s a linear relationship between $X_1$ and $X_2$, we can make the coefficient for $X_1$ whatever we like, provided we make a corresponding adjustment to the coefficient for $X_2$, and it has no effect at all on our prediction. So rather than having one optimal linear predictor, we have infinitely many of them.
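The non-uniqueness can be checked numerically. The sketch below uses simulated data and a hypothetical starting coefficient vector `beta`; the helper `shifted` is likewise made up for illustration. It moves an amount $c$ onto the coefficient of $X_1$ and subtracts $2.2c$ from the coefficient of $X_2$, which should leave every prediction unchanged because $X_1 = 2.2 X_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

x2 = rng.normal(70.0, 10.0, size=n)   # mass in kg (simulated)
x1 = 2.2 * x2                         # weight in lb, so X_1 = 2.2 * X_2
x3 = rng.normal(0.0, 1.0, size=n)     # some other predictor
X = np.column_stack([x1, x2, x3])

# Hypothetical coefficients, chosen only for the demonstration
beta = np.array([0.5, -1.0, 2.0])


def shifted(beta, c):
    """Add c to the X_1 coefficient and compensate with -2.2*c on X_2."""
    b = beta.copy()
    b[0] += c
    b[1] -= 2.2 * c
    return b


baseline = X @ beta
shifts = [-beta[0], 10.0, -2200.0]    # the first one zeroes out the X_1 coefficient
alternatives = [shifted(beta, c) for c in shifts]

# Largest absolute difference in predictions across all alternative vectors
max_gap = max(np.max(np.abs(X @ b - baseline)) for b in alternatives)

for c, b in zip(shifts, alternatives):
    print(f"shift {c:10.1f} -> coefficients {b}")
print("largest change in any prediction:", max_gap)
```

The coefficient vectors differ wildly, while the predictions agree up to floating-point round-off.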

There are three ways of dealing with collinearity. One is to get a different data set where the predictor variables are no longer collinear. A second is to identify one of the collinear variables (it usually doesn’t matter which) and drop it from the data set. This can get complicated; principal components analysis (Chapter 16) can help here. Thirdly, since the issue is that there are infinitely many different coefficient vectors which all minimize the MSE, we could appeal to some extra principle, beyond prediction accuracy, to select just one of them. We might, for instance, prefer smaller coefficient vectors (all else being equal), or ones where more of the coefficients were exactly zero. Using some quality other than the squared error to pick out a unique solution is called “regularizing” the optimization problem, and a lot of attention has been given to regularized regression, especially in the “high dimensional” setting where the number of coefficients is comparable to, or even greater than, the number of data points. See Appendix H.5.5, and exercise 2 in Chapter 8.
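As a rough illustration of the third option, the sketch below picks one coefficient vector out of the infinitely many in two ways: the minimum-norm least-squares solution, and a ridge (squared-norm penalty) solution. This is only one possible form of regularization, not necessarily the one the referenced appendix develops. The simulated data, the penalty `lam = 1.0`, and the cutoff `rcond=1e-10` are assumptions, and the predictors are simulated as already centered so that no intercept is needed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

x2 = rng.normal(0.0, 10.0, size=n)    # centered mass in kg (simulated)
x1 = 2.2 * x2                         # centered weight in lb, exactly collinear
x3 = rng.normal(0.0, 1.0, size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x2 + 2.0 * x3 + rng.normal(0.0, 1.0, size=n)

# Minimum-norm least squares: among all minimizers of the squared error,
# lstsq returns the shortest coefficient vector. rcond tells it to treat
# the numerically zero singular value as exactly zero.
beta_min_norm, _, rank, _ = np.linalg.lstsq(X, y, rcond=1e-10)

# Ridge regression: adding lam * I makes the matrix invertible, so the
# penalized problem has exactly one solution even though X'X is singular.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("numerical rank of X:", rank)
print("minimum-norm coefficients:", beta_min_norm)
print("ridge coefficients:       ", beta_ridge)
```

Both solutions split the shared effect between $X_1$ and $X_2$ in the ratio $2.2 : 1$, which is what preferring smaller coefficient vectors implies when the two columns are proportional.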

## The Prediction and Its Error

Once we have coefficients $\beta$, we can use them to make predictions for the expected value of $Y$ at arbitrary values of $\vec{X}$, whether we’ve seen an observation there before or not. How good are these?

If we have the optimal coefficients, then the prediction error will be uncorrelated with the predictor variables:
$$
\begin{aligned}
\operatorname{Cov}[Y-\vec{X} \cdot \beta, \vec{X}] &=\operatorname{Cov}[Y, \vec{X}]-\operatorname{Cov}\left[\vec{X} \cdot\left(\mathbf{v}^{-1} \operatorname{Cov}[\vec{X}, Y]\right), \vec{X}\right] \\
&=\operatorname{Cov}[Y, \vec{X}]-\mathbf{v v}^{-1} \operatorname{Cov}[Y, \vec{X}] \\
&=0
\end{aligned}
$$
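The same identity holds for sample covariances, which gives a quick numerical check. In the sketch below, the simulated predictors, the mixing matrix, and the deliberately nonlinear response are all assumptions for illustration. The point is that the residual of the plug-in coefficients $\mathbf{v}^{-1} \operatorname{Cov}[\vec{X}, Y]$ is uncorrelated with every predictor even when the true relationship is not linear.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Correlated predictors built from independent normals (assumed setup)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(n, 3)) @ mixing

# A deliberately nonlinear response
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=n)

# Center everything so that no intercept is needed
Xc = X - X.mean(axis=0)
yc = y - y.mean()

v = Xc.T @ Xc / (n - 1)            # sample covariance matrix of the predictors
cov_xy = Xc.T @ yc / (n - 1)       # sample Cov[X, Y]
beta = np.linalg.solve(v, cov_xy)  # beta = v^{-1} Cov[X, Y]

resid = yc - Xc @ beta
cov_resid_x = Xc.T @ (resid - resid.mean()) / (n - 1)

print("beta:", beta)
print("Cov[residual, X]:", cov_resid_x)
print("mean residual:", resid.mean())
```

Both the covariances and the mean residual come out at the level of floating-point round-off.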
Moreover, the expected prediction error, averaged over all $\vec{X}$, will be zero (Exercise 2). In general, however, the conditional expectation of the error is not zero,
$$\mathbb{E}[Y-\vec{X} \cdot \beta \mid \vec{X}=\vec{x}] \neq 0$$
and the conditional variance is not constant in $\vec{x}$,
$$\mathbb{V}\left[Y-\vec{X} \cdot \beta \mid \vec{X}=\vec{x}_1\right] \neq \mathbb{V}\left[Y-\vec{X} \cdot \beta \mid \vec{X}=\vec{x}_2\right]$$
The optimal linear predictor can be arbitrarily bad, and it can make arbitrarily big systematic mistakes. It is generally very biased.
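A small simulation shows both effects at once. The setup is assumed purely for illustration: $X$ is uniform on $[-1, 1]$, the true regression function is $x^2$, and the noise standard deviation grows with $|x|$. The best linear fit then has a slope near zero, an average error of zero, and yet large systematic errors in particular regions of $x$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000

x = rng.uniform(-1.0, 1.0, size=n)
noise_sd = 0.05 + 0.5 * np.abs(x)        # assumed: noise grows with |x|
y = x ** 2 + rng.normal(0.0, noise_sd)   # assumed: quadratic true regression function

# Optimal linear predictor on centered data: beta = Cov[X, Y] / Var[X]
xc = x - x.mean()
yc = y - y.mean()
beta = np.mean(xc * yc) / np.mean(xc ** 2)
resid = yc - beta * xc

inner = np.abs(x) < 0.2                  # region near x = 0
outer = np.abs(x) > 0.8                  # region near x = -1 or x = 1

print("slope:", beta)
print("overall mean error:", resid.mean())
print("mean error, |x| < 0.2:", resid[inner].mean())
print("mean error, |x| > 0.8:", resid[outer].mean())
print("error variance, |x| < 0.2:", resid[inner].var())
print("error variance, |x| > 0.8:", resid[outer].var())
```

The overall mean error is zero up to round-off, but the predictor systematically overshoots near $x = 0$ and undershoots near the ends of the range, and the error variance differs sharply between the two regions.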
