
Data Visualization | EXAMPLES AND TUTORIAL

Real-world datasets such as images and text are often very high dimensional. For simplicity, we use the MNIST handwritten digits dataset, a collection of grayscale images of single handwritten digits between 0 and 9. Its training set contains 60,000 images of size $28 \times 28$ pixels, so after flattening it has 60,000 data samples with a dimensionality of 784. To demonstrate dimensionality reduction on this dataset, we use Isomap to project the data onto a low-dimensional feature space. This example maps the data from 784 features to a two-dimensional feature space and visualizes the results.

We import the MNIST handwritten digits dataset from the tensorflow library and use the sklearn.manifold module from the scikit-learn library for dimensionality reduction. After applying Isomap to the dataset, we plot the results with the matplotlib library to visualize the low-dimensional representation of the data.

Each image is of size $28 \times 28$ pixels and is flattened into a vector of size 784. Hence, mnist.train.images is an n-dimensional array (tensor) of shape $[55000, 784]$, whereas mnist.train.labels has shape $[55000, 10]$, since the 10 class labels from 0 to 9 are one-hot encoded. The training split holds 55,000 rather than 60,000 images, presumably because the TensorFlow loader sets aside 5,000 images for validation by default.
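
The workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original tutorial code: to stay self-contained it uses scikit-learn's small built-in digits dataset ($8 \times 8$ images, 64 features) as a stand-in for MNIST, and the helper names `embed_digits` and `plot_embedding`, the subset size, and the neighbor count are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap


def embed_digits(n_samples=500, n_neighbors=10):
    """Project flattened digit images to two dimensions with Isomap."""
    digits = load_digits()            # stand-in for MNIST (assumption)
    X = digits.data[:n_samples]       # shape (n_samples, 64): flattened 8x8 images
    y = digits.target[:n_samples]     # integer class labels 0..9
    isomap = Isomap(n_neighbors=n_neighbors, n_components=2)
    embedding = isomap.fit_transform(X)   # shape (n_samples, 2)
    return embedding, y


def plot_embedding(embedding, labels, path="isomap_digits.png"):
    """Save a scatter plot of the 2-D embedding, colored by digit label."""
    import matplotlib
    matplotlib.use("Agg")             # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    points = ax.scatter(embedding[:, 0], embedding[:, 1],
                        c=labels, cmap="tab10", s=8)
    fig.colorbar(points, ax=ax, label="digit")
    ax.set_xlabel("Isomap component 1")
    ax.set_ylabel("Isomap component 2")
    fig.savefig(path)
    plt.close(fig)


embedding, labels = embed_digits()
print(embedding.shape)
```

Calling `plot_embedding(embedding, labels)` writes the scatter plot to disk. Isomap builds a neighborhood graph and computes all pairwise shortest paths, so running it on a subset keeps the example fast; the same calls apply to the 784-dimensional MNIST vectors, at a higher cost.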

Data Visualization | EXPLANATION AND WORKING

In application areas such as computer vision and pattern recognition, a dataset often has both a large number of data points and very high dimensionality. In such cases PCA might not be feasible, because handling the entire data matrix becomes computationally expensive [1]. Random projection is a powerful alternative: it creates compact representations of high dimensional data while approximately preserving the distances between data points. The method projects the data onto a random subspace that is chosen independently of the input data, using a projection matrix whose entries are randomly sampled. Because no statistics of the data need to be computed, it is substantially cheaper than methods like PCA when projecting from a very high dimension to a lower dimensional space, while remaining accurate in the sense that pairwise distances in the lower dimensional space stay close to the original ones.
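
The distance-preservation claim can be checked empirically. The sketch below is an illustration on synthetic Gaussian data (an assumption, not a dataset from the text) using scikit-learn's `GaussianRandomProjection`; the target dimension comes from `johnson_lindenstrauss_min_dim`, which returns a conservative bound for a chosen distortion `eps`. The values of `eps`, the data size, and the seeds are arbitrary choices.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))      # synthetic data: n = 100, d = 5000

# Conservative number of dimensions for roughly 30% distortion
# of squared pairwise distances.
k = int(johnson_lindenstrauss_min_dim(n_samples=X.shape[0], eps=0.3))

projector = GaussianRandomProjection(n_components=k, random_state=0)
Y = projector.fit_transform(X)        # shape (100, k)

# Compare every pairwise distance before and after projection.
d_high = pairwise_distances(X)
d_low = pairwise_distances(Y)
off_diagonal = ~np.eye(X.shape[0], dtype=bool)
ratios = d_low[off_diagonal] / d_high[off_diagonal]

print(k, Y.shape)
print(ratios.min(), ratios.max())
```

Ratios close to 1 indicate that distances survive the projection. Note that the projection matrix is drawn without looking at `X`; `fit` only reads the input dimension.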

Let $X \in \mathbb{R}^{n \times d}$ be the matrix of $n$ data points in a high dimensional space of dimension $d$. We choose a randomly sampled $d \times k$ projection matrix, $W$, with $k \ll d$, and define the projection of $X$ onto the lower dimensional space of dimension $k$ to be
$$Y=X W$$
where $Y \in \mathbb{R}^{n \times k}$ gives the $k$-dimensional approximations of the $n$ data points.

Here $W$ is a $d \times k$ matrix with entries $w_{i j}$ sampled independently at random using distributions such as the Gaussian distribution. The projection matrix $W$ can also be sampled from various other distributions as follows:
$$w_{i j}=\left\{\begin{array}{ll} +1, & p=1 / 2 \\ -1, & p=1 / 2 \end{array}\right.$$
and
$$w_{i j}=\sqrt{3}\left\{\begin{array}{ll} +1, & p=1 / 6 \\ 0, & p=2 / 3 \\ -1, & p=1 / 6 \end{array}\right.$$
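
A minimal NumPy sketch of $Y = XW$ with the three choices of $W$ mentioned above (Gaussian, $\pm 1$, and the sparse three-valued distribution) is given below. The function name `projection_matrix`, the `kind` labels, and the synthetic data are hypothetical. The $1/\sqrt{k}$ scaling is also an assumption that the text does not state: all three distributions have unit variance, and dividing by $\sqrt{k}$ keeps projected distances on the same scale as the originals.

```python
import numpy as np


def projection_matrix(d, k, kind="gaussian", seed=0):
    """Sample a d x k random projection matrix with unit-variance entries."""
    rng = np.random.default_rng(seed)
    if kind == "gaussian":
        W = rng.standard_normal((d, k))
    elif kind == "sign":
        # +1 or -1, each with probability 1/2
        W = rng.choice([1.0, -1.0], size=(d, k))
    elif kind == "sparse":
        # sqrt(3) * (+1, 0, -1) with probabilities 1/6, 2/3, 1/6
        W = np.sqrt(3.0) * rng.choice([1.0, 0.0, -1.0], size=(d, k),
                                      p=[1 / 6, 2 / 3, 1 / 6])
    else:
        raise ValueError(f"unknown kind: {kind}")
    # Scaling by 1/sqrt(k) is an assumption added here (not in the text).
    return W / np.sqrt(k)


d, k = 2000, 400
X = np.random.default_rng(1).standard_normal((50, d))   # synthetic data

results = {}
for kind in ("gaussian", "sign", "sparse"):
    W = projection_matrix(d, k, kind)
    Y = X @ W                                            # shape (50, k)
    # Ratio of one pairwise distance after and before projection.
    ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
    results[kind] = (Y.shape, ratio)
    print(kind, Y.shape, round(float(ratio), 3))
```

In the sparse variant about two thirds of the entries of $W$ are zero, so the product $XW$ can skip most multiplications, which is the practical appeal of that distribution.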


