## Dimensionality Reduction and Data Visualization

Although the random projection matrix can also be drawn from a Gaussian distribution, the distribution above, known as the Achlioptas distribution [2], is much cheaper to generate. It requires only a uniform random number generator, and because two-thirds of the matrix entries are zero, the projection needs significantly less computation.
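The following minimal Python sketch makes the sparsity argument concrete. It assumes the form commonly stated for the Achlioptas distribution: each entry is $+\sqrt{3}$ with probability 1/6, $0$ with probability 2/3, and $-\sqrt{3}$ with probability 1/6. It also assumes the projection is scaled by $1/\sqrt{k}$. The function names `achlioptas_matrix` and `sparse_project` are hypothetical. Plain lists are used instead of a numerical library so the example stays self-contained.

```python
import math
import random


def achlioptas_matrix(d, k, seed=None):
    """Sample a d x k matrix from the (assumed) Achlioptas distribution:
    +sqrt(3) w.p. 1/6, 0 w.p. 2/3, -sqrt(3) w.p. 1/6."""
    rng = random.Random(seed)
    s = math.sqrt(3.0)
    matrix = []
    for _ in range(d):
        row = []
        for _ in range(k):
            u = rng.random()  # a single uniform draw decides each entry
            if u < 1 / 6:
                row.append(s)
            elif u < 2 / 6:
                row.append(-s)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def sparse_project(point, matrix):
    """Map a d-dimensional point to k dimensions as (1/sqrt(k)) * x^T R.
    Zero entries of R contribute nothing, so their multiplications are skipped."""
    k = len(matrix[0])
    out = [0.0] * k
    for x_i, row in zip(point, matrix):
        for j, r_ij in enumerate(row):
            if r_ij != 0.0:
                out[j] += x_i * r_ij
    scale = 1.0 / math.sqrt(k)
    return [scale * v for v in out]


R = achlioptas_matrix(d=1000, k=50, seed=0)
zero_fraction = sum(row.count(0.0) for row in R) / (1000 * 50)
print(f"fraction of zero entries: {zero_fraction:.3f}")  # close to 2/3

rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
y = sparse_project(x, R)
ratio = sum(v * v for v in y) / sum(v * v for v in x)
print(f"squared-norm ratio after projection: {ratio:.3f}")  # 1 in expectation
```

Each entry needs only one uniform draw, and the zero entries are skipped during the projection. Because the entries have unit variance, the $1/\sqrt{k}$ factor keeps squared lengths unchanged in expectation.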

With $\mathrm{PCA}$, the projected data are obtained by computing the covariance matrix and decomposing it (an eigendecomposition, or equivalently a singular value decomposition of the centered data), which is computationally intensive for large data. Random projection, in contrast, is simple to compute: it only requires multiplying the data by a matrix that can be generated with a simple procedure, avoiding the expensive steps involved in PCA. An important property of random projection is that it approximately preserves both the lengths of the original data points and the pairwise distances between them when they are mapped to a lower-dimensional space. The distortion stays within a factor of $(1 \pm \varepsilon)$, where the tolerance $\varepsilon$ can be chosen freely (a smaller $\varepsilon$ requires a larger target dimension). The Johnson-Lindenstrauss lemma [3], which concerns Euclidean distance-preserving embeddings, serves as the theoretical basis for random projections.
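One common quantitative form of the Johnson-Lindenstrauss lemma runs as follows. For $0 < \varepsilon < 1$ and any $n$ points in $\mathbb{R}^d$, there is a map $f$ into $\mathbb{R}^k$ with $(1-\varepsilon)\lVert u-v\rVert^2 \le \lVert f(u)-f(v)\rVert^2 \le (1+\varepsilon)\lVert u-v\rVert^2$ for every pair of points, provided $k \ge 4 \ln n \,/\, (\varepsilon^2/2 - \varepsilon^3/3)$. This is the bound from Dasgupta and Gupta's elementary proof. The sketch below simply evaluates that bound; the name `jl_min_dim` is hypothetical, and other proofs give different constants.

```python
import math


def jl_min_dim(n_points, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3), the target
    dimension from Dasgupta and Gupta's proof of the Johnson-Lindenstrauss lemma."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return math.ceil(4 * math.log(n_points) / denominator)


for n in (1_000, 1_000_000):
    for eps in (0.5, 0.1):
        print(f"n={n:>9}  eps={eps}  k >= {jl_min_dim(n, eps)}")
```

The original dimensionality $d$ does not appear anywhere in the bound. For $n = 10^6$ points and $\varepsilon = 0.5$, the sketch gives $k \ge 664$ whether the data live in $10^4$ or $10^7$ dimensions. Tightening $\varepsilon$ raises $k$ roughly in proportion to $1/\varepsilon^2$.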

In $\mathrm{PCA}$, the goal is to find the directions along which the variance is maximized and to project the data onto those directions. Random projections, by contrast, do not take the original data into account at all. Instead, the data are projected onto a randomly chosen subspace, which may seem counter-intuitive. The optimality of PCA's reduction comes at a significant cost: the high computational complexity of the eigenanalysis when both the dimensionality of the data and the number of data points are very high [1]. PCA also gives no special importance to local distances. Because its objective is to minimize the overall distortion, the pairwise distances between individual points may be arbitrarily distorted. Hence there is no assurance that the distance between any two points in the original space will be close to their distance in the projected lower-dimensional space. Random projection, on the other hand, guarantees with high probability that pairwise distances between points are approximately preserved in the reduced space.
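The pairwise-distance guarantee can be checked empirically. The sketch below is a hypothetical example, not the experiment from [1]. It projects 20 random points from 600 dimensions down to $k = 150$ with a Gaussian random matrix scaled by $1/\sqrt{k}$. It then reports the smallest and largest ratio between projected and original pairwise distances. The sizes and seeds are arbitrary choices, and the helper names are illustrative.

```python
import itertools
import math
import random


def gaussian_projection(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a Gaussian random
    matrix scaled by 1/sqrt(k), so squared lengths are preserved in expectation."""
    rng = random.Random(seed)
    d = len(points[0])
    r = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(p[i] * r[i][j] for i in range(d)) for j in range(k)]
        for p in points
    ]


def pairwise_distance_ratios(original, projected):
    """Projected distance divided by original distance, for every pair of points."""
    return [
        math.dist(projected[a], projected[b]) / math.dist(original[a], original[b])
        for a, b in itertools.combinations(range(len(original)), 2)
    ]


rng = random.Random(42)
points = [[rng.uniform(-1.0, 1.0) for _ in range(600)] for _ in range(20)]  # 20 points, d = 600
ratios = pairwise_distance_ratios(points, gaussian_projection(points, k=150, seed=1))
print(f"{len(ratios)} pairs, ratios between {min(ratios):.3f} and {max(ratios):.3f}")
```

Every ratio should land near 1. Rerunning with a smaller `k` widens the spread, which is exactly the trade-off that the Johnson-Lindenstrauss bound quantifies.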
Comparison between $\mathrm{PCA}$ and random projections:
a. Accuracy: Although PCA is optimal in the sense of retaining the maximum variance, random projections give results comparable to those of $\mathrm{PCA}$ [1, 4].
b. Computational complexity: As the dimensionality of the data increases, performing PCA becomes computationally expensive. The computational complexity of random projections, on the other hand, remains significantly lower even for data with very high dimensionality.
c. Distance preservation: PCA aims to minimize the overall distortion, not to preserve the pairwise distances between points. Random projection, in contrast, guarantees with high probability that the distance between any two points is approximately preserved when they are projected into a lower-dimensional space.
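A rough cost model illustrates point (b). The sketch below counts only leading-order operations and ignores constant factors, memory traffic, and the choice of SVD routine. It assumes PCA is computed by forming the $d \times d$ covariance matrix (about $nd^2$ operations) and eigendecomposing it (about $d^3$). The random projection cost is taken as one $n \times d$ by $d \times k$ multiplication. The problem sizes and function names are hypothetical.

```python
def pca_cost(n, d):
    """Leading-order operation count for PCA via the covariance matrix:
    about n*d^2 to form the d x d covariance and about d^3 to eigendecompose it."""
    return n * d ** 2 + d ** 3


def random_projection_cost(n, d, k, density=1.0):
    """Leading-order operation count for multiplying an n x d data matrix by a
    d x k projection matrix; density is the fraction of nonzero entries
    (1.0 for a dense Gaussian matrix, about 1/3 for the Achlioptas distribution)."""
    return n * d * k * density


# Hypothetical problem size: 10,000 points in 10,000 dimensions, projected to k = 500.
n, d, k = 10_000, 10_000, 500
print("PCA / dense projection:", pca_cost(n, d) / random_projection_cost(n, d, k))
print("PCA / sparse projection:", pca_cost(n, d) / random_projection_cost(n, d, k, density=1 / 3))
```

With these sizes the estimates differ by a factor of 40 for a dense matrix and about 120 for the sparse one. Treat these figures as an indication of the trend rather than of actual run times.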

• First, unlike traditional methods, random projections do not suffer from the curse of dimensionality. The target dimension needed to preserve distances depends on the number of points and the tolerated distortion, not on the original dimensionality of the data.
• As discussed earlier in Section 8.1, when the dimensionality of the data is very high, random projections are computationally efficient and have lower time complexity than PCA, which involves computation-intensive steps. Generating a random projection matrix is simple, and computing the projections requires only a matrix multiplication. In PCA, by contrast, computing the covariance matrix and performing the singular value decomposition become increasingly expensive as the dimensionality grows. Random projections avoid this cost, and this reduction in computation is their main advantage over PCA.
• Another advantage of random projections is that they guarantee, with high probability, that pairwise distances are approximately preserved in the low-dimensional space. Random projections can also be used to cluster high-dimensional data.
• A drawback, however, is that different random projections can lead to different clustering outcomes, which makes the technique highly unstable for clustering.
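The clustering instability can be observed directly. The sketch below is a hypothetical illustration. The text does not name a clustering algorithm, so it assumes plain k-means (Lloyd's algorithm, initialized with the first three points). It measures agreement between two clusterings with the Rand index: the fraction of point pairs that both clusterings place together or both place apart. The data (three overlapping Gaussian blobs in 50 dimensions), the target dimension `k=2`, and the seeds are arbitrary choices.

```python
import math
import random


def project(points, k, seed):
    """Gaussian random projection to k dimensions, scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    d = len(points[0])
    r = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(p[i] * r[i][j] for i in range(d)) for j in range(k)]
            for p in points]


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points, n_clusters, iters=30):
    """Plain Lloyd's k-means, initialized with the first n_clusters points."""
    centroids = [list(p) for p in points[:n_clusters]]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points]


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = [(i, j) for i in range(len(a)) for j in range(i + 1, len(a))]
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical data: 60 points in 50 dimensions from 3 overlapping blobs,
# interleaved so the first three points come from different blobs.
rng = random.Random(7)
centers = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(3)]
points = [[x + rng.gauss(0.0, 1.0) for x in centers[i % 3]] for i in range(60)]

reference = kmeans(project(points, k=2, seed=0), n_clusters=3)
for seed in range(1, 5):
    labels = kmeans(project(points, k=2, seed=seed), n_clusters=3)
    print(f"seed {seed}: Rand index vs. seed 0 = {rand_index(reference, labels):.3f}")
```

A Rand index of 1 means two clusterings are identical up to relabeling. Values below 1 indicate that the outcome depends on which random matrix was drawn. The very small `k` is deliberate, since a larger target dimension distorts distances less.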


