## LEAST SQUARES ESTIMATION

Let $Y$ be a random variable that fluctuates about an unknown parameter $\eta$; that is, $Y=\eta+\varepsilon$, where $\varepsilon$ is the fluctuation or error. For example, $\varepsilon$ may be a "natural" fluctuation inherent in the experiment which gives rise to $\eta$, or it may represent the error in measuring $\eta$, so that $\eta$ is the true response and $Y$ is the observed response. As noted in Chapter 1, our focus is on linear models, so we assume that $\eta$ can be expressed in the form
$$\eta=\beta_0+\beta_1 x_1+\cdots+\beta_{p-1} x_{p-1},$$
where the explanatory variables $x_1, x_2, \ldots, x_{p-1}$ are known constants (e.g., experimental variables that are controlled by the experimenter and are measured with negligible error), and the $\beta_j(j=0,1, \ldots, p-1)$ are unknown parameters to be estimated. If the $x_j$ are varied and $n$ values, $Y_1, Y_2, \ldots, Y_n$, of $Y$ are observed, then
$$Y_i=\beta_0+\beta_1 x_{i 1}+\cdots+\beta_{p-1} x_{i, p-1}+\varepsilon_i \quad(i=1,2, \ldots, n),$$
where $x_{i j}$ is the $i$ th value of $x_j$. Writing these $n$ equations in matrix form, we have
$$\left(\begin{array}{c} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{array}\right)=\left(\begin{array}{ccccc} x_{10} & x_{11} & x_{12} & \cdots & x_{1, p-1} \\ x_{20} & x_{21} & x_{22} & \cdots & x_{2, p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n 0} & x_{n 1} & x_{n 2} & \cdots & x_{n, p-1} \end{array}\right)\left(\begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{array}\right)+\left(\begin{array}{c} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{array}\right),$$
or
$$\mathbf{Y}=\mathbf{X} \boldsymbol{\beta}+\boldsymbol{\varepsilon},$$
where $x_{10}=x_{20}=\cdots=x_{n 0}=1$. The $n \times p$ matrix $\mathbf{X}$ will be called the regression matrix, and the $x_{i j}$'s are generally chosen so that the columns of $\mathbf{X}$ are linearly independent; that is, $\mathbf{X}$ has rank $p$, and we say that $\mathbf{X}$ has full rank. However, in some experimental design situations, the elements of $\mathbf{X}$ are chosen to be 0 or 1, and the columns of $\mathbf{X}$ may be linearly dependent. In this case $\mathbf{X}$ is commonly called the design matrix, and we say that $\mathbf{X}$ has less than full rank.
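As a rough illustration of the two cases, the following sketch builds a regression matrix by prepending the column of ones ($x_{i0}=1$) and checks its rank, then does the same for a 0/1 design matrix. The helper name `regression_matrix` and all of the data values are made up for this example and do not come from the text.

```python
import numpy as np

def regression_matrix(x):
    """Prepend a column of ones (x_{i0} = 1) to the n x (p-1) array x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    return np.column_stack([np.ones(x.shape[0]), x])

# Hypothetical data: n = 5 observations on p - 1 = 2 explanatory variables.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0],
              [5.0, 4.0]])
X = regression_matrix(x)
n, p = X.shape
full_rank = np.linalg.matrix_rank(X) == p   # columns linearly independent

# Hypothetical 0/1 design: an intercept plus one indicator per group.
# The first column equals the sum of the other two, so the rank is below p.
D = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
design_rank = np.linalg.matrix_rank(D)

print(X.shape, full_rank, design_rank)
```

Here `X` has rank $p=3$, while `D` has three columns but only rank 2, which is the "less than full rank" situation described above.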

## PROPERTIES OF LEAST SQUARES ESTIMATES

If we assume that the errors are unbiased (i.e., $E[\varepsilon]=0$ ), and the columns of $\mathbf{X}$ are linearly independent, then
$$\begin{aligned} E[\hat{\boldsymbol{\beta}}] &=\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} E[\mathbf{Y}] \\ &=\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf{X} \boldsymbol{\beta} \\ &=\boldsymbol{\beta}, \end{aligned}$$
and $\hat{\boldsymbol{\beta}}$ is an unbiased estimate of $\boldsymbol{\beta}$. If we assume further that the $\varepsilon_i$ are uncorrelated and have the same variance, that is, $\operatorname{cov}\left[\varepsilon_i, \varepsilon_j\right]=\delta_{i j} \sigma^2$, then $\operatorname{Var}[\boldsymbol{\varepsilon}]=\sigma^2 \mathbf{I}_n$ and
$$\operatorname{Var}[\mathbf{Y}]=\operatorname{Var}[\mathbf{Y}-\mathbf{X} \boldsymbol{\beta}]=\operatorname{Var}[\boldsymbol{\varepsilon}].$$
Hence, by (1.7),
$$\begin{aligned} \operatorname{Var}[\hat{\boldsymbol{\beta}}] &=\operatorname{Var}\left[\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf{Y}\right] \\ &=\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \operatorname{Var}[\mathbf{Y}] \mathbf{X}\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \\ &=\sigma^2\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1}\left(\mathbf{X}^{\prime} \mathbf{X}\right)\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \\ &=\sigma^2\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1}. \end{aligned}$$

The question now arises as to why we chose $\hat{\boldsymbol{\beta}}$ as our estimate of $\boldsymbol{\beta}$ and not some other estimate. We show below that for a reasonable class of estimates, $\hat{\beta}_j$ is the estimate of $\beta_j$ with the smallest variance. Here $\hat{\beta}_j$ can be extracted from $\hat{\boldsymbol{\beta}}=\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_{p-1}\right)^{\prime}$ simply by premultiplying by the row vector $\mathbf{c}^{\prime}$, which contains unity in the $(j+1)$th position and zeros elsewhere. It transpires that this special property of $\hat{\beta}_j$ can be generalized to the case of any linear combination $\mathbf{a}^{\prime} \hat{\boldsymbol{\beta}}$ using the following theorem.
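The two identities above can be checked numerically. The sketch below is only an illustration under assumed values: the regression matrix, $\boldsymbol{\beta}$, and $\sigma^2$ are all invented, and the name `A` for $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is a label introduced here. It applies the expectation and variance rules for a linear transformation $\mathbf{A}\mathbf{Y}$ directly rather than simulating data.

```python
import numpy as np

# Hypothetical setup: n = 8, p = 3, with an intercept column of ones.
rng = np.random.default_rng(0)
n, p, sigma2 = 8, 3, 2.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                 # beta_hat = A Y

# Unbiasedness: E[beta_hat] = A E[Y] = A X beta = beta.
expected_beta_hat = A @ (X @ beta)

# Variance: Var[A Y] = A Var[Y] A' with Var[Y] = sigma^2 I_n.
var_Y = sigma2 * np.eye(n)
var_beta_hat = A @ var_Y @ A.T
closed_form = sigma2 * XtX_inv    # sigma^2 (X'X)^{-1}

print(np.allclose(expected_beta_hat, beta))
print(np.allclose(var_beta_hat, closed_form))
```

Forming the inverse explicitly keeps the code close to the algebra; for actual fitting, a solver such as `np.linalg.lstsq` is usually the more numerically stable choice.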

THEOREM 3.2 Let $\hat{\boldsymbol{\theta}}$ be the least squares estimate of $\boldsymbol{\theta}=\mathbf{X} \boldsymbol{\beta}$, where $\boldsymbol{\theta} \in \Omega=\mathcal{C}(\mathbf{X})$ and $\mathbf{X}$ may not have full rank. Then among the class of linear unbiased estimates of $\mathbf{c}^{\prime} \boldsymbol{\theta}$, $\mathbf{c}^{\prime} \hat{\boldsymbol{\theta}}$ is the unique estimate with minimum variance. [We say that $\mathbf{c}^{\prime} \hat{\boldsymbol{\theta}}$ is the best linear unbiased estimate (BLUE) of $\mathbf{c}^{\prime} \boldsymbol{\theta}$.]
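To make the statement concrete, here is a small numerical sketch. It assumes $\operatorname{Var}[\mathbf{Y}]=\sigma^2\mathbf{I}_n$ and uses the standard fact that $\hat{\boldsymbol{\theta}}=\mathbf{P}\mathbf{Y}$, where $\mathbf{P}$ is the orthogonal projection onto $\mathcal{C}(\mathbf{X})$. The design matrix, $\mathbf{c}$, and the vector `d` are invented for illustration. A linear estimate $\mathbf{b}'\mathbf{Y}$ is unbiased for $\mathbf{c}'\boldsymbol{\theta}$ for every $\boldsymbol{\theta}\in\Omega$ exactly when $\mathbf{P}\mathbf{b}=\mathbf{P}\mathbf{c}$, so one competitor is built that way and its variance is compared with that of $\mathbf{c}'\hat{\boldsymbol{\theta}}$.

```python
import numpy as np

# Hypothetical less-than-full-rank design: intercept plus two group indicators.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
sigma2 = 1.0
n = X.shape[0]

# Orthogonal projection onto C(X); the pseudoinverse copes with rank deficiency.
P = X @ np.linalg.pinv(X)

c = np.array([1.0, 0.0, 0.0, 0.0])     # target: c'theta = theta_1

# c'theta_hat = c'P Y = (P c)'Y, so its coefficient vector is P c.
b_ls = P @ c
var_ls = sigma2 * (b_ls @ b_ls)        # Var[b'Y] = sigma^2 b'b

# Another linear unbiased estimate: b = P c + (I - P) d for an arbitrary d.
d = np.array([0.3, -1.0, 2.0, 0.7])
b_alt = P @ c + (np.eye(n) - P) @ d
var_alt = sigma2 * (b_alt @ b_alt)

print(var_ls, var_alt)
```

The extra term $(\mathbf{I}-\mathbf{P})\mathbf{d}$ is orthogonal to $\mathbf{P}\mathbf{c}$, so it can only add to the variance. This is the geometric idea behind the theorem, shown here for one example and not as a proof.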

