## k-Nearest-Neighbor Regression

At the other extreme from ignoring the distance between $x_i$ and $x$, we could do nearest-neighbor regression:
$$\widehat{w}\left(x_i, x\right)= \begin{cases}1 & x_i \text { nearest neighbor of } x \\ 0 & \text { otherwise }\end{cases}$$
This is very sensitive to the distance between $x_i$ and $x$. If $\mu(x)$ does not change too rapidly, and $X$ is pretty thoroughly sampled, then the nearest neighbor of $x$ among the $x_i$ is probably close to $x$, so that $\mu\left(x_i\right)$ is probably close to $\mu(x)$. However, $y_i=\mu\left(x_i\right)+\text{noise}$, so nearest-neighbor regression will carry that noise into its prediction. We might instead do $k$-nearest-neighbor regression,
$$\widehat{w}\left(x_i, x\right)=\left\{\begin{array}{cl} 1 / k & x_i \text { one of the } k \text { nearest neighbors of } x \\ 0 & \text { otherwise } \end{array}\right.$$
Again, with enough samples all the $k$ nearest neighbors of $x$ are probably close to $x$, so the regression function at those points is going to be close to the regression function at $x$. But because we average their values of $y_i$, the noise terms should tend to cancel each other out. As we increase $k$, we get smoother functions; in the limit $k=n$, we just get back the constant (the sample mean). Figure 1.5 illustrates this for our running example data. To use $k$-nearest-neighbors regression, we need to pick $k$ somehow. This means we need to decide how much smoothing to do, and this is not trivial. We will return to this point in Chapter 3.
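The weights above can be sketched in a few lines of NumPy. This is a minimal illustration for one-dimensional $x$, not the code behind the text's figures; the function name `knn_regress` and the toy data are made up for the example, and ties in distance are broken arbitrarily by sort order.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k):
    """Average the y-values of the k training points nearest to each query point."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Pairwise distances |x_i - x|, shape (n_query, n_train)
    dist = np.abs(x_query[:, None] - x_train[None, :])
    # Column indices of the k smallest distances in each row
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # Weight 1/k on each of the k nearest neighbors, 0 on everything else
    return y_train[nearest].mean(axis=1)

# Hypothetical toy data, chosen only to make the arithmetic easy to follow
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

pred_1 = knn_regress(x, y, [0.9], k=1)       # copies y at x = 1, noise included: 3.0
pred_3 = knn_regress(x, y, [0.9], k=3)       # averages y at x = 0, 1, 2: 2.0
pred_n = knn_regress(x, y, [0.9], k=len(x))  # k = n gives the sample mean: 3.0
```

With $k=1$ the prediction reproduces a single noisy observation, while $k=n$ ignores $x$ entirely; intermediate values of $k$ trade one problem against the other.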

Because $k$-nearest-neighbors averages over only a fixed number of neighbors, each of which is a noisy sample, it always has some noise in its prediction, and is generally not consistent. This may not matter very much with moderately large data (especially once we have a good way of picking $k$). If we want consistency, we need to let $k$ grow with $n$, but not too fast; it's enough that as $n \rightarrow \infty$, $k \rightarrow \infty$ and $k / n \rightarrow 0$ (Györfi et al., 2002, Thm. 6.1, p. 88).
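As one concrete schedule that meets both conditions (an illustration only, not a recommendation from the source), take $k=\lceil\sqrt{n}\rceil$:

$$k=\lceil\sqrt{n}\rceil \rightarrow \infty, \qquad \frac{k}{n} \leq \frac{\sqrt{n}+1}{n}=\frac{1}{\sqrt{n}}+\frac{1}{n} \rightarrow 0 \quad \text{as } n \rightarrow \infty .$$

For instance, with $n=10{,}000$ this gives $k=100$, so each prediction averages 100 points while using only 1% of the data.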

## Kernel Smoothers

Changing $k$ in a $k$-nearest-neighbors regression lets us change how much smoothing we're doing on our data, but it's a bit awkward to express this in terms of a number of data points. It feels like it would be more natural to talk about a range in the independent variable over which we smooth or average. Another problem with $k$NN regression is that each testing point is predicted using information from only a few of the training data points, unlike linear regression or the sample mean, which always use all the training data. It'd be nice if we could somehow use all the training data, but in a location-sensitive way.

There are several ways to do this, as we'll see, but a particularly useful one is kernel smoothing, a.k.a. kernel regression or Nadaraya-Watson regression. To begin with, we need to pick a kernel function $K\left(x_i, x\right)$ which satisfies the following properties:

1. $K\left(x_i, x\right) \geq 0$
2. $K\left(x_i, x\right)$ depends only on the distance $x_i-x$, not the individual arguments
3. $\int x K(0, x) d x=0$
4. $0<\int x^2 K(0, x) d x<\infty$

These conditions together (especially the last one) imply that $K\left(x_i, x\right) \rightarrow 0$ as $\left|x_i-x\right| \rightarrow \infty$. Two examples of such functions are the density of the $\operatorname{Unif}(-h / 2, h / 2)$ distribution, and the density of the Gaussian $\mathscr{N}(0, \sqrt{h})$ distribution. Here $h$ can be any positive number, and is called the bandwidth.
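The four properties can be checked numerically for both example kernels. The sketch below is only a sanity check under stated assumptions: the function names are invented for the example, the integrals are crude Riemann sums on a finite grid, and the second argument of $\mathscr{N}(0, \sqrt{h})$ is read as a standard deviation, which the text does not spell out.

```python
import numpy as np

def uniform_kernel(u, h):
    """Density of Unif(-h/2, h/2), evaluated at u = x_i - x."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= h / 2, 1.0 / h, 0.0)

def gaussian_kernel(u, h):
    """Gaussian density with mean 0 and standard deviation sqrt(h) (assumed reading)."""
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / (2 * h)) / np.sqrt(2 * np.pi * h)

def check_kernel(kernel, h, half_width=50.0, n_grid=200_001):
    """Return (min of K, integral of u*K, integral of u^2*K) via Riemann sums."""
    u = np.linspace(-half_width, half_width, n_grid)
    du = u[1] - u[0]
    k = kernel(u, h)
    return k.min(), np.sum(u * k) * du, np.sum(u**2 * k) * du

h = 2.0  # bandwidth, arbitrary positive value for the check
unif_min, unif_m1, unif_m2 = check_kernel(uniform_kernel, h)
gauss_min, gauss_m1, gauss_m2 = check_kernel(gaussian_kernel, h)
```

Both kernels depend on their arguments only through $u=x_i-x$ by construction (property 2). The first moments come out at zero up to grid error, and the second moments are finite and positive: $h^2/12$ for the uniform density, and $h$ for the Gaussian under the standard-deviation reading.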


