Random Forests and SVM

Bagged classifier using decision trees:

1) Each split considers only a random subset of the features
2) Each tree is grown to maximum size without pruning
3) Final predictions are obtained by aggregating over the B trees
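A minimal sketch of these three steps with scikit-learn (the synthetic data and parameter values are illustrative assumptions, not part of the original notes):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic data (assumption)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# B = 100 bootstrapped trees; max_features limits the random subset of
# features considered at each split; trees are grown deep (no pruning by default)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)

# Final prediction aggregates (majority vote) over the B trees
print(rf.predict(X[:5]))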




Out-of-Bag (OOB) samples:

For each observation, construct its random forest predictor by averaging only those trees built from bootstrap samples in which that observation does not appear

Because the OOB error is computed along the way, the random forest can be fit in one sequence, with no separate cross-validation needed

Once the OOB error stabilizes, training can be stopped

OOB samples can also be used to estimate variable importance
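A hedged sketch of monitoring the OOB error as trees are added, using scikit-learn's built-in OOB score (the data and tree counts are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Grow the forest in stages and watch the OOB error; once it stabilizes,
# adding more trees gives little benefit.
rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)
for n_trees in (25, 50, 100, 200, 400):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X, y)
    print(n_trees, "trees, OOB error:", 1 - rf.oob_score_)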

Ensembles and Multi-Learners

Goal: use multiple learners to solve parts of the same problem 
Ensembles: competing learners with multiple looks at the same problem

Support Vector Machine (SVM):

Find a large-margin separator to improve generalization
Use optimization to find a solution with few errors
Use the kernel trick to make large feature spaces computationally efficient
Choose the linear separator with the largest margin
Robust to outliers
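A small sketch of a linear, large-margin SVM in scikit-learn (the C value and data are assumptions for illustration):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two roughly separable blobs (illustrative)
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# A linear SVM chooses the separating hyperplane with the largest margin;
# C controls how strongly margin violations are penalized.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print("Support vectors per class:", clf.n_support_)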



SVM unconstrained optimization problem:

The equivalent unconstrained form is a regularization term plus a hinge loss
As C gets large, the optimizer is forced to separate the data (hard margin)
As C gets small, the data are essentially ignored and the regularizer dominates
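For reference, the standard soft-margin objective this describes (textbook form, not copied from the original notes) is:

\[
\min_{w,\,b}\;\; \frac{1}{2}\lVert w\rVert^2 \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(w^\top x_i + b)\bigr)
\]

The first term is the regularizer (it maximizes the margin); the sum is the hinge loss. Large C forces the hinge losses toward zero (must separate the data), small C lets the regularizer dominate.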

Kernels:

A kernel function implicitly defines a non-linear feature map
Think of the kernel value as a similarity measure between two inputs

Polynomial SVM:

Example: polynomial of degree 2
Common Kernels:
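As a sketch of the degree-2 example: the polynomial kernel K(x, z) = (x·z)^2 computes the same inner product as an explicit quadratic feature map without ever building those features. Below is a hedged scikit-learn example using the usual kernel choices, linear, polynomial, and Gaussian (RBF); the dataset and parameter values are illustrative assumptions:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Degree-2 polynomial kernel: K(x, z) = (gamma * x.z + coef0)^2
poly_svm = SVC(kernel="poly", degree=2, coef0=1.0)
poly_svm.fit(X, y)

# Gaussian (RBF) kernel for comparison
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(poly_svm.score(X, y), rbf_svm.score(X, y))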

Kernel SVM: overfitting
Control overfitting by setting C via cross-validation, choosing a better kernel, or varying the kernel's parameters (e.g., the width of the Gaussian)
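A minimal sketch of setting C and the Gaussian width via cross-validation with GridSearchCV (the grid values and data are assumptions):

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Search over C (regularization strength) and gamma (inverse RBF width)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)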


Purpose of Validation:
1. perform model selection
2. avoid overfitting
3. select the right parameters

Monte-Carlo Cross-Validation:
Also known as random sub-sampling
Randomly select some fraction of the data to form the training set
Assign the rest to the test set
Repeat multiple times with new random partitions
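In scikit-learn this corresponds to ShuffleSplit; a small sketch (the number of splits, test fraction, and model are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)

# 10 random partitions, each holding out 25% of the data as a test set
mc_cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=mc_cv)
print(scores.mean(), scores.std())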

Cross-validation must be applied to the entire sequence of modeling steps (any preprocessing or feature selection belongs inside the CV loop, not before it)
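For example, if scaling and feature selection are part of the model, they should sit inside the cross-validated pipeline, as in this hedged sketch (dataset and step choices are assumptions):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=50, random_state=0)

# Scaling and feature selection are refit inside every CV fold,
# so no information from the held-out fold leaks into training.
model = make_pipeline(StandardScaler(), SelectKBest(k=10), SVC())
print(cross_val_score(model, X, y, cv=5).mean())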

