Clustering vs Dimensionality Reduction
Clustering and Dimensionality Reduction both address the problem of unsupervised learning.
Clustering identifies unknown structure in the data;
Dimensionality Reduction uses structural characteristics to simplify data.
However, there is a problem called the Curse of Dimensionality: in practice, too many features lead to worse performance.
Therefore, we need Dimensionality Reduction.
It's possible to represent data with fewer dimensions, which requires us to discover the intrinsic dimensionality of the data.
One way to do dimensionality reduction is to perform lower-dimensional projections.
In this way, we transform the dataset to have fewer features; in the new feature space, each new feature combines some of the original features via a linear or nonlinear function.
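For example, here is a minimal NumPy sketch of a linear projection. The projection matrix W is random, purely to show the mechanics; PCA (below) chooses it in a principled way.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))   # 100 samples, 5 original features
    W = rng.normal(size=(5, 2))     # columns define 2 new features as linear combinations
    Z = X @ W                       # projected data: 100 samples, 2 features
    print(Z.shape)                  # (100, 2)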
PCA: Principal Component Analysis
Find a sequence of linear combinations of the features that have maximal variance and are mutually uncorrelated.
PCA: 1st PC
The 1st PC of X is the unit vector that maximizes the sample variance compared to all other unit vectors:
v1 = argmax_{||v||=1} (Xv)^T(Xv)
1st PC score: Xv1
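A minimal NumPy sketch of this definition, relying on the fact that the maximizer of (Xv)^T(Xv) over unit vectors is the top eigenvector of X^T X:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X = X - X.mean(axis=0)            # zero-center so (Xv)^T(Xv) measures variance

    # np.linalg.eigh sorts eigenvalues ascending, so the last column
    # is the eigenvector with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    v1 = eigvecs[:, -1]
    score1 = X @ v1                   # 1st PC score: Xv1
    print(np.linalg.norm(v1))         # ~1.0, a unit vector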
PCA: Next PC
Idea: Successively find orthogonal directions of highest variance
Why orthogonal?
1) we want to minimize redundancy (checked numerically below)
2) we want to look at variance in different directions
3) computation is easier
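A quick self-contained check of that orthogonality, assuming the eigenvector route described next:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X = X - X.mean(axis=0)

    # Eigenvectors of the symmetric matrix X^T X are mutually orthogonal,
    # so stacking them as columns of V gives V^T V = I (up to float error).
    _, V = np.linalg.eigh(X.T @ X)
    print(np.allclose(V.T @ V, np.eye(3)))   # True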
Basic PCA Algorithm
1) start with a zero-centered m×n data matrix X
2) compute the covariance matrix
3) find the eigenvectors of the covariance matrix
4) PCs: the k eigenvectors with the highest eigenvalues (see the sketch below)
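A minimal sketch of these four steps in NumPy (the function name basic_pca is illustrative, not from the notes):

    import numpy as np

    def basic_pca(X, k):
        """Follow the four steps above; return top-k PCs and the PC scores."""
        X = X - X.mean(axis=0)                 # 1) zero-centered m x n matrix
        C = (X.T @ X) / (X.shape[0] - 1)       # 2) covariance matrix (n x n)
        eigvals, eigvecs = np.linalg.eigh(C)   # 3) eigenvectors of covariance
        order = np.argsort(eigvals)[::-1]      # 4) sort eigenvalues, descending
        V = eigvecs[:, order[:k]]              #    keep k top eigenvectors as PCs
        return V, X @ V

    rng = np.random.default_rng(0)
    V, scores = basic_pca(rng.normal(size=(200, 5)), k=2)
    print(V.shape, scores.shape)               # (5, 2) (200, 2)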
PCA: In practice
Forming the covariance matrix can require a lot of memory, especially when the number of features is large (the covariance matrix is n×n).
Typical approach: use singular value decomposition (SVD)
SVD: any matrix X can be factored as X = U S V^T (the singular value decomposition), where U and V have orthonormal columns and S is diagonal.
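A sketch of the SVD route, which never forms the covariance matrix explicitly:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)

    # For zero-centered X = U S V^T, the columns of V are the PCs and
    # S**2 / (m - 1) gives the eigenvalues of the covariance matrix.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:2].T                # top-2 PCs (same as the eigen route, up to sign)
    scores = X @ V
    print(scores.shape)         # (200, 2)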
PCA: Applications
~Visualizing high-dimensional data (example below)
~Finding hidden relationships
~Compressing information
~Avoiding redundancy
~Detecting outliers/denoising
~Reducing model complexity
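An example of the visualization use case, assuming scikit-learn and matplotlib are available (any PCA implementation would do): project the 64-dimensional digits dataset down to 2 dimensions and plot it.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    X, y = load_digits(return_X_y=True)
    Z = PCA(n_components=2).fit_transform(X)   # 64 pixel features -> 2 features

    plt.scatter(Z[:, 0], Z[:, 1], c=y, s=8, cmap="tab10")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()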
The difference between PCA and LDA
PCA: reduce dimensionality while preserving as much of the variance in high-dimensional space as possible
LDA: reduce dimensionality while preserving as much of the class discriminatory information as possible
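A short sketch contrasting the two objectives on labeled data, assuming scikit-learn's PCA and LinearDiscriminantAnalysis:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)
    Z_pca = PCA(n_components=2).fit_transform(X)        # unsupervised: ignores labels y
    Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # uses y
    print(Z_pca.shape, Z_lda.shape)                     # (150, 2) (150, 2)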