Reading Deep One-class Classification
Deep One-class Classification: a two-stage framework for deep one-class classification.
1) learn self-supervised representations from one-class data
2) build one-class classifiers on learned representations
The paper covers self-supervised representation learning algorithms, their connections to existing one-class classification methods, and a state-of-the-art contrastive representation learning approach adapted for one-class classification.
A: the stochastic data augmentation process, including resize-and-crop, horizontal flip, color jittering, gray-scale, and Gaussian blur.
Feature extractor f, loss L, projection head g.
The network is structured as g∘f, where g is the projection head used to compute the proxy loss and f outputs the representations used for the downstream task.
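A minimal PyTorch-style sketch of this setup (an illustration, not the paper's code); the backbone choice, layer sizes, and augmentation parameters are assumptions.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision.models import resnet18

# A: stochastic augmentation process (resize-and-crop, flip, color jitter, gray-scale, blur)
augment = T.Compose([
    T.RandomResizedCrop(32),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=3),
    T.ToTensor(),
])

# f: feature extractor whose output is reused for the downstream one-class task
backbone = resnet18(num_classes=512)  # 512-d representation (illustrative size)

# g: projection head used only to compute the self-supervised proxy loss L
projection_head = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(inplace=True), nn.Linear(512, 128),
)

def g_of_f(x: torch.Tensor) -> torch.Tensor:
    """Compute g∘f(x); the proxy loss is applied to this output, while f(x) is kept for the downstream task."""
    return projection_head(backbone(x))
```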
One way of representation learning is to learn by discriminating augmentations applied to the data. Although not trained for this purpose, the likelihood of a learned rotation classifier has been shown to approximate a normality score well and has been used for one-class classification. A plausible explanation is outlier exposure: the classifier learns a decision boundary that separates original images from outliers simulated by image rotation.
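A sketch of this rotation-prediction idea (my reading of the recipe, not the paper's exact code): train a 4-way classifier over {0°, 90°, 180°, 270°} rotations of inlier images, then score a test image by the predicted probability of the 0° (original) class.

```python
import torch
import torch.nn.functional as F

ROTATIONS = 4  # 0, 90, 180, 270 degrees

def rotate_batch(x: torch.Tensor, k: int) -> torch.Tensor:
    """Rotate a batch of images by k * 90 degrees (spatial dims are the last two)."""
    return torch.rot90(x, k, dims=(-2, -1))

def rotation_loss(classifier, x: torch.Tensor) -> torch.Tensor:
    """Self-supervised proxy loss: predict which rotation was applied."""
    losses = []
    for k in range(ROTATIONS):
        logits = classifier(rotate_batch(x, k))
        targets = torch.full((x.size(0),), k, dtype=torch.long, device=x.device)
        losses.append(F.cross_entropy(logits, targets))
    return torch.stack(losses).mean()

@torch.no_grad()
def normality_score(classifier, x: torch.Tensor) -> torch.Tensor:
    """Higher = more normal: probability that the unrotated image is classified as 0°."""
    probs = F.softmax(classifier(x), dim=1)
    return probs[:, 0]
```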
Contrastive learning learns representations by distinguishing augmented views of an instance from other data instances. Applied directly to one-class data, it raises two issues (see the sketch after this list):
1. Class collision: the contrastive loss is minimized by pushing apart representations of negative pairs, even though in the one-class setting those pairs come from the same class.
2. Uniformity of representations: the contrastive loss spreads representations uniformly over the hypersphere; reducing this uniformity makes it easier to isolate outliers from inliers.
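To make the two issues concrete, here is a minimal SimCLR-style (NT-Xent) contrastive loss sketch; it is generic contrastive learning, not the paper's code. Every other instance in the batch is treated as a negative, which is exactly where class collision arises in the one-class setting, and minimizing the loss pushes representations toward a uniform spread on the hypersphere.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """SimCLR-style contrastive loss over a batch of M instances.

    z1, z2: [M, d] projections (g∘f) of two augmented views of the same M images.
    All other instances in the batch act as negatives, even though in the
    one-class setting they belong to the same class.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2M, d]
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
    m2 = z.size(0)
    # the positive pair for index i is the other view, at index (i + M) mod 2M
    targets = (torch.arange(m2, device=z.device) + m2 // 2) % m2
    return F.cross_entropy(sim, targets)
```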
One-Class Contrastive Learning:
1) To reduce the uniformity of representations, a moderate batch size M is used rather than a large one.
2) Distribution augmentation is proposed for one-class contrastive learning: instead of modeling the training data distribution alone, model the union of augmented training distributions, using geometric transformations such as rotation or horizontal flip as the distribution augmentations (see the sketch below).
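A sketch of how distribution augmentation could be wired into a contrastive batch (an illustration under my assumptions, not the authors' implementation): each geometrically transformed copy is treated as a separate instance, i.e. rotated views act as negatives rather than positives, so the model learns the union of augmented distributions.

```python
import torch

def distribution_augment(x: torch.Tensor) -> torch.Tensor:
    """Enlarge a batch with 0/90/180/270-degree rotated copies.

    Each copy is treated as a distinct instance in the contrastive loss
    (a negative with respect to the other rotations); every element of the
    enlarged batch is still paired with its own A-augmented view as the positive.
    """
    views = [torch.rot90(x, k, dims=(-2, -1)) for k in range(4)]
    return torch.cat(views, dim=0)   # [4M, C, H, W]
```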
To construct the classifier on top of the learned representations: for the generative approach, use nonparametric kernel density estimation (KDE) to estimate densities; for the discriminative approach, train a one-class SVM.
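A sketch of this second stage with scikit-learn, assuming `features_train` and `features_test` hold the frozen representations f(x); the function names and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import normalize

def kde_scores(features_train: np.ndarray, features_test: np.ndarray) -> np.ndarray:
    """Generative detector: higher log-density = more normal."""
    kde = KernelDensity(kernel='gaussian', bandwidth=1.0)
    kde.fit(normalize(features_train))
    return kde.score_samples(normalize(features_test))

def ocsvm_scores(features_train: np.ndarray, features_test: np.ndarray) -> np.ndarray:
    """Discriminative detector: signed distance to the one-class SVM boundary."""
    svm = OneClassSVM(kernel='rbf', gamma='scale', nu=0.1)
    svm.fit(normalize(features_train))
    return svm.decision_function(normalize(features_test))
```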
Experiments:
Evaluation uses one-class classification benchmarks, including CIFAR-10, CIFAR-100, Fashion-MNIST, and Cat-vs-Dog: images from one class are treated as inliers and images from the remaining classes as outliers.
On the CelebA eyeglasses dataset, face images with eyeglasses are treated as outliers. Lastly, in addition to semantic anomaly detection, the defect detection benchmark MVTec is also considered.
Results:
The mean and standard deviation of AUCs, averaged over classes, are reported over 5 runs. The mean over the five datasets is weighted by the number of classes in each dataset.
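A small sketch of this reporting protocol as I understand it; the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def dataset_auc(per_class_results) -> float:
    """per_class_results: list of (labels, scores) pairs, one per inlier class.
    labels are 1 for outliers and 0 for inliers; scores are anomaly scores."""
    return float(np.mean([roc_auc_score(y, s) for y, s in per_class_results]))

def weighted_mean_auc(dataset_aucs, num_classes) -> float:
    """Mean over datasets, weighted by the number of classes in each dataset."""
    return float(np.average(dataset_aucs, weights=num_classes))
```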
Paper source: https://openreview.net/pdf?id=HCSgyPUfeDj