Clustering vs Dimensionality Reduction
Clustering and Dimensionality Reduction both address the problem of unsupervised learning.
Clustering identifies unknown structure in the data;
Dimensionality Reduction uses structural characteristics to simplify data.
However, there is a problem called the Curse of Dimensionality: in practice, too many features lead to worse performance.
Therefore, we need Dimensionality Reduction.
It's possible to represent data with fewer dimensions, which requires us to discover the intrinsic dimensionality of the data.
One way to do dimensionality reduction is to perform lower-dimensional projections.
In this way, we transform the dataset to have fewer features; in the new feature space, each new feature combines some of the original features via a linear or nonlinear function.
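For example, here is a minimal NumPy sketch of a linear projection. The projection matrix W is random, purely to show the mechanics; PCA (below) chooses it in a principled way.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))   # 100 samples, 5 original features
    W = rng.normal(size=(5, 2))     # columns define 2 new features as linear combinations
    Z = X @ W                       # projected data: 100 samples, 2 features
    print(Z.shape)                  # (100, 2)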
PCA: Principal Component Analysis
Find a sequence of linear combinations of the features that have maximal variance and are mutually uncorrelated.
PCA: 1st PC
The 1st PC of X is the unit vector that maximizes the sample variance compared to all other unit vectors:
v1 = argmax_{||v||=1} (Xv)^T(Xv)
1st PC score: Xv1
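A minimal NumPy sketch of this definition, relying on the fact that the maximizer of (Xv)^T(Xv) over unit vectors is the top eigenvector of X^T X:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X = X - X.mean(axis=0)            # zero-center so (Xv)^T(Xv) measures variance

    # np.linalg.eigh sorts eigenvalues ascending, so the last column
    # is the eigenvector with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    v1 = eigvecs[:, -1]
    score1 = X @ v1                   # 1st PC score: Xv1
    print(np.linalg.norm(v1))         # ~1.0, a unit vector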
PCA: Next PC
Idea: Successively find orthogonal directions of highest variance
Why orthogonal?
1) we want to minimize redundancy (checked numerically below)
2) we want to look at variance in different directions
3) computation is easier
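A quick self-contained check of that orthogonality, assuming the eigenvector route described next:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X = X - X.mean(axis=0)

    # Eigenvectors of the symmetric matrix X^T X are mutually orthogonal,
    # so stacking them as columns of V gives V^T V = I (up to float error).
    _, V = np.linalg.eigh(X.T @ X)
    print(np.allclose(V.T @ V, np.eye(3)))   # True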
Basic PCA Algorithm
1) start with a zero-centered m×n data matrix X
2) compute the covariance matrix
3) find the eigenvectors of the covariance matrix
4) PCs: the k eigenvectors with the highest eigenvalues (see the sketch below)
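A minimal sketch of these four steps in NumPy (the function name basic_pca is illustrative, not from the notes):

    import numpy as np

    def basic_pca(X, k):
        """Follow the four steps above; return top-k PCs and the PC scores."""
        X = X - X.mean(axis=0)                 # 1) zero-centered m x n matrix
        C = (X.T @ X) / (X.shape[0] - 1)       # 2) covariance matrix (n x n)
        eigvals, eigvecs = np.linalg.eigh(C)   # 3) eigenvectors of covariance
        order = np.argsort(eigvals)[::-1]      # 4) sort eigenvalues, descending
        V = eigvecs[:, order[:k]]              #    keep k top eigenvectors as PCs
        return V, X @ V

    rng = np.random.default_rng(0)
    V, scores = basic_pca(rng.normal(size=(200, 5)), k=2)
    print(V.shape, scores.shape)               # (5, 2) (200, 2)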
PCA: In practice
Forming the covariance matrix can require a lot of memory, especially when the number of features is large (the covariance matrix is n×n).
Typical approach: use singular value decomposition (SVD)
SVD: any matrix X can be factored as X = U S V^T (the singular value decomposition), where U and V have orthonormal columns and S is diagonal.
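A sketch of the SVD route, which never forms the covariance matrix explicitly:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)

    # For zero-centered X = U S V^T, the columns of V are the PCs and
    # S**2 / (m - 1) gives the eigenvalues of the covariance matrix.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:2].T                # top-2 PCs (same as the eigen route, up to sign)
    scores = X @ V
    print(scores.shape)         # (200, 2)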
PCA: Applications
~Visualizing high-dimensional data (example below)
~Finding hidden relationships
~Compressing information
~Avoiding redundancy
~Detecting outliers/denoising
~Reducing model complexity
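An example of the visualization use case, assuming scikit-learn and matplotlib are available (any PCA implementation would do): project the 64-dimensional digits dataset down to 2 dimensions and plot it.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    X, y = load_digits(return_X_y=True)
    Z = PCA(n_components=2).fit_transform(X)   # 64 pixel features -> 2 features

    plt.scatter(Z[:, 0], Z[:, 1], c=y, s=8, cmap="tab10")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()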
The difference between PCA and LDA
PCA: reduce dimensionality while preserving as much of the variance in high-dimensional space as possible
LDA: reduce dimensionality while preserving as much of the class discriminatory information as possible
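A short sketch contrasting the two objectives on labeled data, assuming scikit-learn's PCA and LinearDiscriminantAnalysis:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)
    Z_pca = PCA(n_components=2).fit_transform(X)        # unsupervised: ignores labels y
    Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # uses y
    print(Z_pca.shape, Z_lda.shape)                     # (150, 2) (150, 2)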