2.2. Manifold learning

Look for the bare necessities
The simple bare necessities
Forget about your worries and your strife
I mean the bare necessities
Old Mother Nature’s recipes
That bring the bare necessities of life

– Baloo’s song [The Jungle Book]
../_images/sphx_glr_plot_compare_methods_001.png

manifold_img3 manifold_img4 manifold_img5 manifold_img6

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

2.2.1. Introduction

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.

digits_img projected_img

To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an “interesting” linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.

PCA_img LDA_img

Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.

The manifold learning implementations available in scikit-learn are summarized below

2.2.2. Isomap

One of the earliest approaches to manifold learning is the Isomap algorithm, short for Isometric Mapping. Isomap can be viewed as an extension of Multi-dimensional Scaling (MDS) or Kernel PCA. Isomap seeks a lower-dimensional embedding which maintains geodesic distances between all points. Isomap can be performed with the object Isomap.

../_images/sphx_glr_plot_lle_digits_005.png

Complexity Click for more details

The Isomap algorithm comprises three stages:

  1. Nearest neighbor search. Isomap uses BallTree for efficient neighbor search. The cost is approximately \(O[D \log(k) N \log(N)]\), for \(k\) nearest neighbors of \(N\) points in \(D\) dimensions.

  2. Shortest-path graph search. The most efficient known algorithms for this are Dijkstra’s Algorithm, which is approximately \(O[N^2(k + \log(N))]\), or the Floyd-Warshall algorithm, which is \(O[N^3]\). The algorithm can be selected by the user with the path_method keyword of Isomap. If unspecified, the code attempts to choose the best algorithm for the input data.

  3. Partial eigenvalue decomposition. The embedding is encoded in the eigenvectors corresponding to the \(d\) largest eigenvalues of the \(N \times N\) isomap kernel. For a dense solver, the cost is approximately \(O[d N^2]\). This cost can often be improved using the ARPACK solver. The eigensolver can be specified by the user with the eigen_solver keyword of Isomap. If unspecified, the code attempts to choose the best algorithm for the input data.

The overall complexity of Isomap is \(O[D \log(k) N \log(N)] + O[N^2(k + \log(N))] + O[d N^2]\).

  • \(N\) : number of training data points

  • \(D\) : input dimension

  • \(k\) : number of nearest neighbors

  • \(d\) : output dimension