## Model Selection by Asymptotic Mean Square Error

The two preceding sections show that, depending on which model selection criterion or objective we follow, we essentially obtain three different types of estimators $\hat{X}$ of a low-rank matrix $X_{0}$ from its noisy measurements $X=X_{0}+\sigma E$. Denoting the SVD of $X$ by $X=U \Sigma V^{\top}$, the three estimators take the following forms, respectively:

1. If the rank $d$ is known, the optimal estimate $\hat{X}$ subject to $\operatorname{rank}(\hat{X})=d$ is the truncated SVD solution:
$$\hat{X}_{1}=U \mathcal{H}_{\sigma_{d+1}}(\Sigma) V^{\top},$$
where $\mathcal{H}_{\tau}$ denotes hard thresholding of the singular values at $\tau$ and $\sigma_{d+1}$ is the $(d+1)$-th singular value of $X$, so that only the $d$ largest singular values are kept.
If instead the rank $d$ is unknown and one of the information-theoretic criteria given in Section 2.3.1 is used to estimate the dimension $\hat{d}$, we need only replace $d$ in the above solution with the estimate $\hat{d}$.
2. If we instead balance the mean squared error against the dimension, as in equation (2.91), the optimal estimate is given by SVD hard thresholding:
$$\hat{X}_{2}=U \mathcal{H}_{\sqrt{\tau}}(\Sigma) V^{\top}$$
for some parameter $\tau>0$, that is, the singular values of $X$ are thresholded at $\sqrt{\tau}$.
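
As a minimal NumPy sketch (our illustration, not code from the text), both estimators above can be computed from a single SVD of $X$. The helper names `hard_threshold`, `truncated_svd_estimate`, and `hard_threshold_estimate` are hypothetical, and we assume $\mathcal{H}_{\tau}$ keeps singular values strictly greater than $\tau$ and sets the rest to zero. The rank-$d$ truncation is done by index rather than by comparing against $\sigma_{d+1}$, which avoids ambiguity when singular values tie.

```python
import numpy as np


def hard_threshold(s, tau):
    """H_tau on a vector of singular values: keep entries > tau, set the rest to zero."""
    return np.where(s > tau, s, 0.0)


def truncated_svd_estimate(X, d):
    """X_hat_1 = U H_{sigma_{d+1}}(Sigma) V^T, i.e. keep the d largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_kept = s.copy()
    s_kept[d:] = 0.0  # truncate by index so ties at sigma_{d+1} cannot change the rank
    return (U * s_kept) @ Vt  # U * s scales column k of U by s[k]


def hard_threshold_estimate(X, tau):
    """X_hat_2 = U H_{sqrt(tau)}(Sigma) V^T for a parameter tau > 0."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * hard_threshold(s, np.sqrt(tau))) @ Vt


# Usage on synthetic data: a rank-3 matrix X0 plus small Gaussian noise (sigma = 0.01).
rng = np.random.default_rng(0)
X0 = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
X = X0 + 0.01 * rng.standard_normal((50, 40))

X_hat_1 = truncated_svd_estimate(X, d=3)
X_hat_2 = hard_threshold_estimate(X, tau=1.0)  # singular values thresholded at sqrt(1.0) = 1.0
print(np.linalg.matrix_rank(X_hat_1, tol=1e-8), np.linalg.matrix_rank(X_hat_2, tol=1e-8))
print(np.linalg.norm(X - X0), np.linalg.norm(X_hat_1 - X0))
```

In this synthetic setting the three signal singular values lie far above $\sqrt{\tau}=1$ and the noise singular values far below it, so both estimators keep the same three components; in general, the choice of $\tau$ (or of $\hat{d}$) determines how many components survive.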

## Robust Principal Component Analysis

In the previous chapter, we considered the PCA problem under the assumption that all sample points are drawn from the same statistical or geometric model: a low-dimensional subspace. In practical applications, however, some entries of the data points are often missing or incomplete. For example, the 2-dimensional trajectories of an object moving in a video become incomplete when the object is occluded. In other cases, some entries of the data points are corrupted by gross errors, and we do not know a priori which entries are corrupted. For instance, the intensities of some pixels in a face image can be corrupted when the person is wearing glasses. It may also be the case that a small subset of the data points are outliers. For instance, if we are trying to distinguish face images from non-face images, we can model all face images as samples from a low-dimensional subspace, but non-face images will not follow the same model. Data points that do not follow the model of interest are often called sample outliers. They should be distinguished from samples with some corrupted entries, which are also referred to as intrasample outliers. The main distinction is that in the latter case we do not want to discard the entire data point, only its atypical entries.
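
To make the three scenarios concrete, the following NumPy sketch (our illustration, not from the text) generates points on a random $d$-dimensional subspace and applies each kind of degradation: missing entries, sparse gross errors inside otherwise valid samples (intrasample outliers), and whole columns that do not lie on the subspace (sample outliers). The dimensions, the roughly 30% missing rate, the roughly 5% corruption rate, and the helper `subspace_residual` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, d = 20, 100, 2  # ambient dimension, number of samples (columns), subspace dimension

# Clean data: N points on a random d-dimensional subspace of R^D, one point per column.
U, _ = np.linalg.qr(rng.standard_normal((D, d)))  # orthonormal basis of the subspace
X0 = U @ rng.standard_normal((d, N))

# (a) Missing entries: a boolean mask marks observed entries; unobserved ones become NaN.
observed = rng.random((D, N)) > 0.3  # roughly 30% of entries missing
X_missing = np.where(observed, X0, np.nan)

# (b) Intrasample outliers: a sparse set of entries gets gross errors; every column is kept.
corrupt = rng.random((D, N)) < 0.05  # roughly 5% of entries corrupted
S = np.zeros((D, N))
S[corrupt] = rng.uniform(-10.0, 10.0, size=corrupt.sum())
X_corrupted = X0 + S

# (c) Sample outliers: extra columns that do not come from the subspace at all.
n_out = 10
X_outliers = np.hstack([X0, rng.standard_normal((D, n_out))])


def subspace_residual(X, U):
    """Norm of each column's component orthogonal to span(U); U must have orthonormal columns."""
    return np.linalg.norm(X - U @ (U.T @ X), axis=0)


res = subspace_residual(X_outliers, U)
print("max inlier residual: ", res[:N].max())
print("min outlier residual:", res[N:].min())
```

The residual check assumes the true basis `U` is known, which is exactly what is unavailable in practice; it only illustrates that sample outliers leave the subspace as whole columns, whereas intrasample outliers perturb a few coordinates of samples that otherwise belong to the model.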

In this chapter, we will introduce several techniques for recovering a low-dimensional subspace from missing or corrupted data. We will first consider the PCA problem with missing entries, also known as incomplete PCA or, for linear subspaces, low-rank matrix completion. In Section 3.1, we will describe several representative methods for solving this problem based on maximum likelihood estimation, convex optimization, and alternating minimization, chosen for their simplicity, optimality, and scalability, respectively. In Section 3.2, we will consider the PCA problem with corrupted entries, also known as the robust PCA (RPCA) problem. We will introduce classical alternating minimization methods for this problem as well as convex optimization methods that offer theoretical guarantees of correctness. Finally, in Section 3.3, we will consider the PCA problem with sample outliers and describe methods for solving it based on classical robust statistical estimation techniques as well as on convex relaxations. Face images will be used as examples to demonstrate the effectiveness of these algorithms.
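
As a rough illustration of the alternating minimization idea mentioned above, here is a generic alternating least squares sketch for low-rank matrix completion; it is not necessarily the algorithm developed in Section 3.1. We assume the rank $d$ is known, factor the matrix as $AB^{\top}$, and alternately solve for the rows of $A$ and of $B$ using only the observed entries. The function name `als_complete`, the small ridge term `lam` (added so every small linear system is solvable), the iteration count, and the random initialization are all our assumptions.

```python
import numpy as np


def als_complete(X, mask, d, n_iters=200, lam=1e-6, seed=0):
    """Rank-d matrix completion by alternating ridge-regularized least squares.

    X    : (D, N) array; entries where mask is False are never read.
    mask : (D, N) boolean array, True where the entry is observed.
    Returns factors A (D, d) and B (N, d) with X approximately A @ B.T on observed entries.
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    A = rng.standard_normal((D, d))
    B = rng.standard_normal((N, d))
    reg = lam * np.eye(d)
    for _ in range(n_iters):
        # With B fixed, row i of A is a least-squares fit to the observed entries of row i.
        for i in range(D):
            Bi = B[mask[i]]
            A[i] = np.linalg.solve(Bi.T @ Bi + reg, Bi.T @ X[i, mask[i]])
        # With A fixed, row j of B is a least-squares fit to the observed entries of column j.
        for j in range(N):
            Aj = A[mask[:, j]]
            B[j] = np.linalg.solve(Aj.T @ Aj + reg, Aj.T @ X[mask[:, j], j])
    return A, B


# Usage: complete a rank-2 matrix from about half of its entries.
rng = np.random.default_rng(1)
X0 = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 40))
mask = rng.random(X0.shape) < 0.5
X_obs = np.where(mask, X0, np.nan)  # unobserved entries are NaN and are ignored by als_complete

A, B = als_complete(X_obs, mask, d=2)
rel_err = np.linalg.norm(A @ B.T - X0) / np.linalg.norm(X0)
print("relative error on all entries:", rel_err)
```

Each inner solve involves only a $d \times d$ system, which is what makes alternating minimization attractive for large matrices; the price is a nonconvex problem, so unlike the convex formulations, convergence to the true $X_0$ from a random start is not guaranteed in general.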

