EXPLANATION AND WORKING

Let us first understand the concept of geodesic distance, since it is essential to nonlinear dimensionality reduction. As we discussed in Chapter 6, Multidimensional Scaling (MDS) tries to preserve Euclidean distances in the lower-dimensional space. In some cases, however, preserving Euclidean distances during dimensionality reduction does not give the desired result. On a nonlinear manifold, the Euclidean metric works only where the neighborhood structure can be approximated as a linear structure.

Suppose the neighborhood structure contains holes. In such cases, Euclidean distances can be highly misleading. If we instead measure the distance between points by traveling along the manifold, we get a better approximation of how near or far any two points on the manifold are [1]. Let us understand this concept with a simplified example. Assume our data lies on a circular manifold in two-dimensional space, as shown in Figure 7.1.

Why is the geodesic distance a better fit than the Euclidean distance on a nonlinear manifold?

As seen in Figure 7.1, the two-dimensional data is reduced to one dimension using both Euclidean distances and approximate geodesic distances. In the 1D mapping based on the Euclidean metric, the two points that are very distant on the manifold (here, points $\mathrm{a}$ and $\mathrm{b}$) are mapped poorly. As stated earlier, the Euclidean metric gives satisfactory results only for points whose neighborhood can be approximated as a linear structure (here, points $\mathrm{c}$ and $\mathrm{d}$). The 1D mapping based on geodesic distances, on the other hand, correctly keeps close points as neighbors and places far-apart points at a distance.
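Figure 7.1 is not reproduced here, so the following is a hypothetical stand-in rather than the figure's data. It places 30 points on an open unit-circle arc spanning angles 0 to 1.8π, leaving a small gap between the ends a and b. It then ranks b among a's neighbors under each metric. The sketch assumes that distance along the arc is simply the difference in angle, which holds on a unit circle as long as paths cannot cross the gap.

```python
import math

# Hypothetical open arc: 30 points on the unit circle at angles 0 .. 1.8*pi,
# leaving a gap of 0.2*pi between the first point (a) and the last point (b).
n = 30
thetas = [1.8 * math.pi * i / (n - 1) for i in range(n)]
points = [(math.cos(t), math.sin(t)) for t in thetas]

def euclidean(i, j):
    """Straight-line distance between points i and j in the 2D plane."""
    return math.dist(points[i], points[j])

def geodesic(i, j):
    """Distance along the arc; on a unit circle this equals the difference in angle."""
    return abs(thetas[i] - thetas[j])

def rank_of(i, j, metric):
    """1-based rank of point j among all other points, sorted by distance from point i."""
    order = sorted((m for m in range(n) if m != i), key=lambda m: metric(i, m))
    return order.index(j) + 1

a, b = 0, n - 1
print("rank of b among a's neighbors (Euclidean):", rank_of(a, b, euclidean))
print("rank of b among a's neighbors (geodesic): ", rank_of(a, b, geodesic))
```

With this spacing, b ranks 4th among a's neighbors by Euclidean distance but last (29th) by geodesic distance. This is the kind of distortion that makes a Euclidean-based 1D mapping place a and b poorly.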
The geodesic distance between two points can be approximated by the graph distance between them, that is, the length of the shortest path through a graph that connects nearby points. As the discussion above shows, the Euclidean metric does a relatively poor job of approximating distances on a nonlinear manifold, whereas the geodesic metric gives satisfactory results. Hence, when approximating distances between points on a nonlinear manifold, the geodesic metric is the better fit. Isomap uses geodesic distances to solve the problem of dimensionality reduction.
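The claim that graph distance approximates geodesic distance can be checked numerically. The following is a minimal sketch under stated assumptions: a hypothetical open arc (angles 0 to 1.9π) sampled at 200 points, a symmetric k-nearest-neighbor graph with k = 4 and edges weighted by Euclidean length, and Dijkstra's algorithm built on the standard-library `heapq` module. The helper names `knn_graph` and `dijkstra` are illustrative and do not come from any library.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as an adjacency dict {i: {j: edge length}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # All other points, ordered by straight-line distance from point i
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            d = math.dist(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph

def dijkstra(graph, source):
    """Shortest-path (graph) distance from source to every vertex."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated queue entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical data: 200 evenly spaced points on an open unit-circle arc (0 .. 1.9*pi)
n = 200
thetas = [1.9 * math.pi * i / (n - 1) for i in range(n)]
points = [(math.cos(t), math.sin(t)) for t in thetas]

graph = knn_graph(points, k=4)
a, b = 0, n - 1  # the two ends of the arc
print("Euclidean distance a-b:", round(math.dist(points[a], points[b]), 3))
print("graph distance a-b:    ", round(dijkstra(graph, a)[b], 3))
print("true arc length a-b:   ", round(1.9 * math.pi, 3))
```

The graph distance between the two ends lands very close to the true arc length 1.9π ≈ 5.969, while the straight-line distance across the gap is only about 0.31. Straight edges cut slightly inside the curve, so the graph distance slightly underestimates the geodesic; denser sampling narrows that difference.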

ISOMAP ALGORITHM

1. For every point in the original dataset, find its k nearest neighbors (with respect to the actual distance in the high-dimensional space).
2. Construct the k-nearest-neighbor graph $\mathrm{G}=(\mathrm{V}, \mathrm{E})$. Every point of the dataset is a vertex in the graph (hence, there are n vertices in total), and every vertex is connected to each of its k nearest neighbors by an edge weighted by the distance between the two points.
3. Compute the distances between all pairs of points, using shortest-path distances in the graph as the (approximate geodesic) metric, and store them in a matrix (say A).
4. Find points in the lower-dimensional space such that the pairwise distances between them are approximately the same as the distances between the corresponding points on the graph.
The algorithm's output is the low-dimensional representation of all the points computed in the final step.
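Steps 1 to 3 can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation, and it rests on several assumptions. The data are a hypothetical helix in 3D (two turns, pitch 0.2), and k = 4 is an arbitrary choice. Edges are weighted by Euclidean length, and the all-pairs shortest paths use Floyd-Warshall. `knn_weights` and `floyd_warshall` are hypothetical helper names.

```python
import math

def knn_weights(points, k):
    """Steps 1-2: build the kNN graph as an n x n weight matrix.
    Entry [i][j] is the Euclidean length of edge (i, j), or math.inf if there is no edge."""
    n = len(points)
    W = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.0
        # All other points, ordered by distance from point i in the input space
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            W[i][j] = W[j][i] = math.dist(points[i], points[j])  # undirected edge
    return W

def floyd_warshall(W):
    """Step 3: all-pairs shortest-path distances (approximate geodesics) in O(n^3)."""
    n = len(W)
    A = [row[:] for row in W]
    for m in range(n):          # allow paths through intermediate vertex m
        row_m = A[m]
        for i in range(n):
            a_im = A[i][m]
            if a_im == math.inf:
                continue        # no path from i to m yet
            row_i = A[i]
            for j in range(n):
                if a_im + row_m[j] < row_i[j]:
                    row_i[j] = a_im + row_m[j]
    return A

# Hypothetical data: n samples along two turns of a helix in 3D
n = 100
ts = [4 * math.pi * i / (n - 1) for i in range(n)]
points = [(math.cos(t), math.sin(t), 0.2 * t) for t in ts]

A = floyd_warshall(knn_weights(points, k=4))
print("straight-line distance between the ends:", round(math.dist(points[0], points[-1]), 3))
print("graph (geodesic) distance between the ends:", round(A[0][n - 1], 3))
```

Along this helix the two ends are only 0.2 × 4π ≈ 2.51 apart in a straight line, yet the entry A[0][n - 1] comes out close to the true geodesic √1.04 × 4π ≈ 12.8. Floyd-Warshall runs in O(n³) time, which is fine for a toy example. For larger, sparse kNN graphs, running Dijkstra's algorithm from every vertex is typically cheaper.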

The idea is that if we have enough points on the high-dimensional manifold, that is, if they are packed tightly, then, because the manifold is locally Euclidean, the k nearest neighbors in the original high-dimensional space are also the k nearest neighbors on the manifold. We then build the nearest-neighbor graph on the premise that shortest paths on the graph approximate geodesic distances along the manifold.
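The following sketch illustrates why dense sampling matters. It again uses hypothetical data: a two-turn helix with pitch 0.2 and k = 4. It flags kNN edges whose length along the helix is more than 1.5 times their straight-line length. Such edges jump between turns, so shortest paths through them no longer follow the manifold. The threshold 1.5 and the helper names `helix` and `shortcut_edges` are arbitrary choices for illustration.

```python
import math

def helix(n, turns=2, pitch=0.2):
    """Hypothetical 1D manifold in 3D: n evenly spaced samples along a helix.
    Returns the curve parameters t and the 3D points."""
    t_max = 2 * math.pi * turns
    ts = [t_max * i / (n - 1) for i in range(n)]
    return ts, [(math.cos(t), math.sin(t), pitch * t) for t in ts]

def shortcut_edges(ts, points, k, pitch=0.2, ratio=1.5):
    """kNN edges whose length along the helix exceeds `ratio` times their
    straight-line length, i.e. edges that jump between turns."""
    speed = math.sqrt(1 + pitch ** 2)  # arc length per unit of t on this helix
    n = len(points)
    flagged = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            along_curve = speed * abs(ts[i] - ts[j])
            if along_curve > ratio * math.dist(points[i], points[j]):
                flagged.add((min(i, j), max(i, j)))
    return flagged

for n in (100, 12):
    ts, pts = helix(n)
    print(f"n = {n:3d}: {len(shortcut_edges(ts, pts, k=4))} shortcut edge(s)")
```

With 100 samples no edge is flagged, because each point's nearest neighbors lie on the same turn. With only 12 samples, the points roughly one turn away are closer than the next-but-one point on the same turn, so some kNN edges short-circuit the manifold.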

How do we compute the shortest paths between all pairs of points on a graph? We can either use the Floyd-Warshall algorithm or run Dijkstra's algorithm from every vertex; either way, we obtain the pairwise distances between all points on the manifold. Finally, we need to find points in the low-dimensional space such that the Euclidean distances between them are approximately equal to these shortest-path distances on the manifold.
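The text does not name the method for this last step. Isomap conventionally uses classical MDS, the technique discussed in Chapter 6. The sketch below assumes classical MDS with a one-dimensional target. To stay within the standard library, it finds the leading eigenvector by power iteration. The function name `classical_mds_1d` and its parameters are hypothetical.

```python
import math
import random

def classical_mds_1d(D, iters=200, seed=0):
    """Place n points on a line so that |y[i] - y[j]| approximates D[i][j].

    Classical MDS: double-centre the squared distances into B = -1/2 * J D^2 J,
    then scale B's leading eigenvector by the square root of its eigenvalue.
    The eigenvector is found by power iteration, which returns the
    largest-magnitude eigenvalue (assumed here to be the largest positive one).
    """
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in D2]   # equals the column means for symmetric D
    grand_mean = sum(row_mean) / n
    B = [[-0.5 * (D2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T B v gives the eigenvalue for the unit vector v
    lam = sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in v]

# Hypothetical input: exact along-arc distances between 15 points on an open arc
thetas = [1.8 * math.pi * i / 14 for i in range(15)]
D = [[abs(p - q) for q in thetas] for p in thetas]
y = classical_mds_1d(D)
print([round(c, 2) for c in y])
```

Here the input is exactly a one-dimensional distance (the difference in angle). The recovered coordinates therefore reproduce D up to floating-point error and equal the angles up to a shift and a possible sign flip. With a real geodesic matrix A the match is only approximate. Embedding into more than one dimension would take the top few eigenvectors instead of one.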


