## Residual Plots

Remark 2.3. Residual plots magnify departures from the model while the response plot emphasizes how well the MLR model fits the data.

Since the residuals $r_i=\hat{e}_i$ are estimators of the errors, the residual plot is used to visualize the conditional distribution $e \mid \mathrm{SP}$ of the errors given the sufficient predictor $\mathrm{SP}=\boldsymbol{x}^T \boldsymbol{\beta}$, where $\mathrm{SP}$ is estimated by $\widehat{Y}=\boldsymbol{x}^T \hat{\boldsymbol{\beta}}$. For the unimodal MLR model, there should not be any pattern in the residual plot: as a narrow vertical strip is moved from left to right, the behavior of the residuals within the strip should show little change.
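The strip check can also be done numerically. The following is a minimal Python sketch (standard library only) under stated assumptions: the data are simulated from an MLR model with made-up coefficients, sample size, and seed, and the helper names `solve` and `ols` are illustrative and not part of lregpack. It fits least squares, sorts the residuals by fitted value, and summarizes the residuals within four strips. For a good model the strip means should be near 0 and the strip standard deviations should be similar.

```python
import random
import statistics

def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

random.seed(1)
n = 200
beta = [1.0, 2.0, -1.0]  # assumed true coefficients, intercept first
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [sum(b * x for b, x in zip(beta, row)) + random.gauss(0, 1) for row in X]

beta_hat = ols(X, y)
fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]  # estimates SP
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Move a "strip" from left to right: sort by fitted value, cut into 4 strips.
pairs = sorted(zip(fitted, resid))
strips = [[r for _, r in pairs[i * n // 4:(i + 1) * n // 4]] for i in range(4)]
strip_summary = [(statistics.mean(s), statistics.stdev(s)) for s in strips]
for i, (m, sd) in enumerate(strip_summary, start=1):
    print(f"strip {i}: mean residual = {m:+.2f}, sd = {sd:.2f}")
```

A numerical summary like this complements the residual plot but does not replace it, since four strips can hide patterns that are obvious to the eye.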

Notation. A rule of thumb is a rule that often but not always works well in practice.

Rule of thumb 2.1. If the residual plot would look good after several points have been deleted, and if these deleted points were not gross outliers (points far from the point cloud formed by the bulk of the data), then the residual plot is probably good. Beginners often find too many things wrong with a good model. For practice, use the lregpack function MLRsim to generate several MLR data sets, and make the response and residual plots for these data sets: type `MLRsim(nruns=10)` in R and right click Stop for each plot (20 times) to generate 10 pairs of response and residual plots. This exercise will help show that the plots can have considerable variability even when the MLR model is good. See Problem 2.30.

Rule of thumb 2.2. If the plotted points in the residual plot look like a left or right opening megaphone, the first model violation to check for is nonconstant variance, that is, a violation of the constant variance assumption. (This is a rule of thumb because it is possible that such a residual plot results from another model violation such as nonlinearity, but nonconstant variance is much more common.)
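To see what a right opening megaphone looks like numerically, here is a small Python sketch under assumed settings: one predictor, an error standard deviation of $0.5x$, and made-up coefficients, sample size, and seed. It compares the residual spread in the leftmost and rightmost thirds of the fitted values.

```python
import random
import statistics

random.seed(2)
n = 300
x = [random.uniform(1, 10) for _ in range(n)]
# Assumed model: the error standard deviation grows with x (nonconstant variance).
y = [1 + 2 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# Simple linear regression by least squares.
xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Compare the residual spread in the left and right thirds of the fitted values.
pairs = sorted(zip(fitted, resid))
left = [r for _, r in pairs[: n // 3]]
right = [r for _, r in pairs[-(n // 3):]]
sd_left, sd_right = statistics.stdev(left), statistics.stdev(right)
print(f"residual sd, left strip:  {sd_left:.2f}")
print(f"residual sd, right strip: {sd_right:.2f}")
```

A right strip that is much more variable than the left strip is the numerical counterpart of the megaphone shape. Under constant variance the two standard deviations should be roughly equal.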

The residual plot of $\hat{Y}$ versus $r$ should always be made. It is also a good idea to plot each nontrivial predictor $x_j$ versus $r$ and to plot potential predictors $w_j$ versus $r$. If the predictor is quantitative, then the residual plot of $x_j$ versus $r$ should look like the residual plot of $\hat{Y}$ versus $r$. If the predictor is qualitative, e.g. gender, then interpreting the residual plot is much more difficult; however, if each category contains many observations, then the plotted points for each category should form a vertical line centered at $r=0$ with roughly the same variability (spread or range).

## Other Model Violations

Without loss of generality, $E(e)=0$ for the unimodal MLR model with a constant, in that if $E(\tilde{e})=\mu \neq 0$, then the MLR model can always be written as $Y=\boldsymbol{x}^T \boldsymbol{\beta}+e$ where $E(e)=0$ and $E(Y) \equiv E(Y \mid \boldsymbol{x})=\boldsymbol{x}^T \boldsymbol{\beta}$. To see this, notice that
$$
\begin{aligned}
Y &= \tilde{\beta}_1 + x_2 \beta_2 + \cdots + x_p \beta_p + \tilde{e} \\
&= \tilde{\beta}_1 + E(\tilde{e}) + x_2 \beta_2 + \cdots + x_p \beta_p + \tilde{e} - E(\tilde{e}) \\
&= \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p + e
\end{aligned}
$$
where $\beta_1=\tilde{\beta}_1+E(\tilde{e})$ and $e=\tilde{e}-E(\tilde{e})$. For example, if the errors $\tilde{e}_i$ are iid exponential $(\lambda)$ with $E\left(\tilde{e}_i\right)=\lambda$, use $e_i=\tilde{e}_i-\lambda$.
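The exponential example can be checked by simulation. The sketch below is illustrative only: the values of $\lambda$, $\tilde{\beta}_1$, the slope, the sample size, and the seed are assumptions, and a single predictor is used for brevity. The fitted intercept should estimate $\beta_1=\tilde{\beta}_1+\lambda$ rather than $\tilde{\beta}_1$, and the centered errors should average close to 0.

```python
import random
import statistics

random.seed(3)
n = 2000
lam = 3.0          # assumed mean of the exponential errors
beta1_tilde = 1.0  # assumed intercept in the uncentered model
beta2 = 2.0        # assumed slope

x = [random.gauss(0, 1) for _ in range(n)]
# random.expovariate takes the rate, so the mean is 1 / rate = lam.
e_tilde = [random.expovariate(1 / lam) for _ in range(n)]
y = [beta1_tilde + beta2 * xi + ei for xi, ei in zip(x, e_tilde)]

# Simple linear regression by least squares.
xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

e = [ei - lam for ei in e_tilde]  # centered errors e = e_tilde - E(e_tilde)
print(f"fitted intercept     = {intercept:.2f}  (beta1_tilde + lam = {beta1_tilde + lam})")
print(f"fitted slope         = {slope:.2f}  (beta2 = {beta2})")
print(f"mean of centered e_i = {statistics.mean(e):+.3f}")
```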

For least squares, it is crucial that the error variance $\sigma^2$ exists (is finite). For example, if the $e_i$ are iid Cauchy $(0,1)$, then $\sigma^2$ does not exist and the least squares estimators tend to perform very poorly.
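A rough way to see the effect of Cauchy errors is to compare the typical size of the slope error under normal and Cauchy errors at two sample sizes. This Python sketch makes several assumptions: one predictor, made-up coefficients, 200 replications, and the median absolute error as the performance summary. The helper names are hypothetical.

```python
import math
import random
import statistics

def slope_hat(x, y):
    """Least squares slope for simple linear regression."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def cauchy():
    """Draw from Cauchy(0, 1) by inverting the cdf."""
    return math.tan(math.pi * (random.random() - 0.5))

def median_abs_error(draw_error, n, reps=200, beta1=1.0, beta2=2.0):
    """Median of |slope_hat - beta2| over `reps` simulated data sets."""
    errs = []
    for _ in range(reps):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [beta1 + beta2 * xi + draw_error() for xi in x]
        errs.append(abs(slope_hat(x, y) - beta2))
    return statistics.median(errs)

random.seed(4)
results = {}
for n in (50, 500):
    normal_err = median_abs_error(lambda: random.gauss(0, 1), n)
    cauchy_err = median_abs_error(cauchy, n)
    results[n] = (normal_err, cauchy_err)
    print(f"n = {n:3d}: normal {normal_err:.3f}, Cauchy {cauchy_err:.3f}")
```

In runs of this kind the slope error under normal errors shrinks as $n$ grows, while the error under Cauchy errors stays much larger. The median is used as the summary because the mean error is not stable when the variance does not exist.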

The performance of least squares is analogous to the performance of $\bar{Y}$. The sample mean $\bar{Y}$ is a very good estimator of the population mean $\mu$ if the $Y_i$ are iid $N\left(\mu, \sigma^2\right)$, and $\bar{Y}$ is a good estimator of $\mu$ if the sample size is large and the $Y_i$ are iid with mean $\mu$ and variance $\sigma^2$. This result follows from the central limit theorem (CLT), but how "large is large" depends on the underlying distribution. The $n>30$ rule tends to hold for distributions that are close to normal in that they take on many values and $\sigma^2$ is not huge. Error distributions that are highly nonnormal with tiny $\sigma^2$ often need $n \gg 30$. For example, if $Y_1, \ldots, Y_n$ are iid $\operatorname{Gamma}(1 / m, 1)$, then $n>25 m$ may be needed. Another example is a distribution that takes on one value with very high probability, e.g. a Poisson random variable with very small variance. Bimodal and multimodal distributions and highly skewed distributions with large variances also need larger $n$. Chihara and Hesterberg (2011, p. 177) suggest using $n>5000$ for moderately skewed distributions.
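The Gamma example can be explored with a short simulation. The sketch below assumes that the first parameter of $\operatorname{Gamma}(1/m, 1)$ is the shape and the second is the scale, and it uses skewness as a rough measure of how far $\bar{Y}$ is from normal. Both choices are assumptions made for illustration, as are the values of $m$, the number of replications, and the seed. Under that parameterization the sum of $n$ iid terms is Gamma with shape $n/m$, so the skewness of $\bar{Y}$ is $2\sqrt{m/n}$, which equals 0.4 at $n=25m$.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by the sd cubed."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def skew_of_sample_mean(m, n, reps=2000):
    """Simulated skewness of Ybar when the Y_i are iid Gamma(shape=1/m, scale=1)."""
    means = [
        sum(random.gammavariate(1 / m, 1) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return skewness(means)

random.seed(5)
m = 10
results = {}
for n in (30, 25 * m):
    theory = 2 * (m / n) ** 0.5  # skewness of Ybar is 2 / sqrt(n * shape)
    results[n] = (theory, skew_of_sample_mean(m, n))
    print(f"n = {n:3d}: theoretical skewness {theory:.2f}, simulated {results[n][1]:.2f}")
```

With $m=10$, the distribution of $\bar{Y}$ at $n=30$ is still clearly skewed, while at $n=25m=250$ the skewness is much smaller.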

There are central limit type theorems for the least squares estimators that depend on the error distribution of the iid errors $e_i$. See Theorems 2.8, 11.25, and 12.7. We always assume that the $e_i$ are continuous random variables with a probability density function. Error distributions that are close to normal may give good results for moderate $n$ if $n \geq 10 p$ and $n-p \geq 30$ where $p$ is the number of predictors. Error distributions that need large $n$ for the CLT to apply for $\bar{e}$ will tend to need large $n$ for the limit theorems for least squares to apply (to give good approximations).
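The two inequalities in this rule of thumb are easy to encode. The function names below are hypothetical, and the check applies only to the near-normal error case described above.

```python
def moderate_n_ok(n, p):
    """Check the rule of thumb n >= 10p and n - p >= 30."""
    return n >= 10 * p and n - p >= 30

def smallest_n(p):
    """Smallest n that satisfies both inequalities for a given p."""
    return max(10 * p, p + 30)

for p in (2, 5, 10):
    print(f"p = {p:2d}: smallest n = {smallest_n(p)}")
```

For small $p$ the condition $n-p \geq 30$ is the binding one, while for $p \geq 4$ the condition $n \geq 10p$ dominates.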


