Problems associated with $R^{2}$

There are a number of serious problems associated with the use of $R^{2}$ to judge the performance of a single equation or as a basis of comparison of different equations:

1 Spurious regression problem (this problem will be discussed fully in Chapters 16 and 17). In the case where two or more variables are actually unrelated, but exhibit strong trend-like behaviour, the $R^{2}$ can reach very high values (sometimes even greater than 0.9). This may mislead the researcher into believing there is actually a strong relationship between the variables.
2 High correlation of $X_{t}$ with another variable $Z_{t}$. There may be a variable $Z_{t}$ that determines the behaviour of $Y_{t}$ and is highly correlated with $X_{t}$. In that case, a large value of $R^{2}$ appears to show the importance of $X_{t}$ in determining $Y_{t}$, when the omitted variable $Z_{t}$ may in fact be responsible for it.
3 Correlation does not necessarily imply causality. No matter how high the value of $R^{2}$, it cannot establish causality between $Y_{t}$ and $X_{t}$, because $R^{2}$ is only a measure of correlation between the observed value $Y_{t}$ and the predicted value $\hat{Y}_{t}$. To whatever extent possible, we should rely on economic theory, previous empirical work and intuition to decide which causally related variables to include in a sample regression.
4 Time series equations versus cross-section equations. Time series equations almost always generate higher $R^{2}$ values than cross-section equations. This is because cross-sectional data contain a great deal of random variation (usually called ‘noise’), which makes ESS small relative to TSS. On the other hand, even badly specified time series equations can give $R^{2}$ values of $0.999$ for the spurious regression reasons presented in point 1 above. Therefore, comparisons of time series and cross-sectional equations using $R^{2}$ are not meaningful.
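
The spurious regression problem in point 1 can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes Gaussian shocks, 200 observations and 500 replications, and the function names are hypothetical. It regresses one series on another that was generated independently of it, first for white noise and then for random walks (which exhibit trend-like behaviour), and compares the average $R^{2}$.

```python
import random


def r_squared(x, y):
    """R^2 of the simple OLS regression of y on x (with an intercept).

    In the two-variable case this equals the squared sample correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def white_noise(n, rng):
    """Independent N(0, 1) draws: no trend-like behaviour."""
    return [rng.gauss(0, 1) for _ in range(n)]


def random_walk(n, rng):
    """Cumulated N(0, 1) shocks: wanders in a trend-like way."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def mean_r_squared(make_series, n=200, reps=500, seed=0):
    """Average R^2 over many pairs of independently generated series."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = make_series(n, rng)
        y = make_series(n, rng)  # generated independently of x
        total += r_squared(x, y)
    return total / reps


avg_noise = mean_r_squared(white_noise)
avg_walk = mean_r_squared(random_walk)
print(f"mean R^2, independent white noise:  {avg_noise:.3f}")
print(f"mean R^2, independent random walks: {avg_walk:.3f}")
```

In both cases the two series are unrelated by construction, so any sizeable $R^{2}$ for the random walks reflects their trend-like behaviour rather than a genuine relationship.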

Hypothesis testing and confidence intervals

Under the assumptions of the CLRM, we know that the estimators $\hat{a}$ and $\hat{\beta}$ obtained by OLS follow a normal distribution with means $a$ and $\beta$ and variances $\sigma_{\hat{a}}^{2}$ and $\sigma_{\hat{\beta}}^{2}$, respectively. It follows that the variables:

$$\frac{\hat{a}-a}{\sigma_{\hat{a}}} \text { and } \frac{\hat{\beta}-\beta}{\sigma_{\hat{\beta}}}$$
have a standard normal distribution (that is, a normal distribution with mean 0 and variance 1). If we replace the unknown $\sigma_{\hat{a}}$ and $\sigma_{\hat{\beta}}$ by their estimates $s_{\hat{a}}$ and $s_{\hat{\beta}}$, this is no longer true. However, it is relatively easy (based on Chapter 1) to show that the following random variables (after the replacement):
$$\frac{\hat{a}-a}{s_{\hat{a}}} \text { and } \frac{\hat{\beta}-\beta}{s_{\hat{\beta}}}$$
follow the Student’s $t$-distribution with $n-2$ degrees of freedom. The Student’s $t$-distribution is close to the standard normal distribution except that it has fatter tails, particularly when the number of degrees of freedom is small.
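
The effect of replacing $\sigma_{\hat{\beta}}$ by $s_{\hat{\beta}}$ can be checked by simulation. The following is a rough sketch under assumptions that are not in the text: a model $Y = 1 + 0.5X + u$ with $u \sim N(0, 1)$, fixed regressor values $1, \ldots, n$, a very small sample ($n = 5$, so 3 degrees of freedom), and hypothetical function names. It counts how often each ratio for the slope exceeds 1.96 in absolute value; for a standard normal variable this happens about 5% of the time.

```python
import math
import random


def slope_ratios(n, beta, sigma, rng):
    """Simulate Y = 1 + beta*X + u once and return the slope's two ratios.

    z uses the true standard deviation of the slope estimator,
    t uses its estimate based on the residuals.
    """
    x = [float(i) for i in range(1, n + 1)]  # fixed regressor values
    y = [1.0 + beta * xi + rng.gauss(0, sigma) for xi in x]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a_hat = my - b_hat * mx
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # estimate of the error variance, n - 2 d.o.f.
    z = (b_hat - beta) / math.sqrt(sigma ** 2 / sxx)
    t = (b_hat - beta) / math.sqrt(s2 / sxx)
    return z, t


def tail_shares(n=5, reps=20000, cutoff=1.96, seed=0):
    """Share of replications in which |z| and |t| exceed the cutoff."""
    rng = random.Random(seed)
    z_out = t_out = 0
    for _ in range(reps):
        z, t = slope_ratios(n, beta=0.5, sigma=1.0, rng=rng)
        z_out += abs(z) > cutoff
        t_out += abs(t) > cutoff
    return z_out / reps, t_out / reps


z_share, t_share = tail_shares()
print(f"share of |z| > 1.96: {z_share:.3f}")
print(f"share of |t| > 1.96: {t_share:.3f}")
```

With so few degrees of freedom, the ratio based on the estimated standard error should fall outside $\pm 1.96$ noticeably more often than the one based on the true standard deviation, which is the fatter-tails property in practice. Increasing `n` should bring the two shares closer together.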


