VDVAE: a hierarchical VAE that generates samples quickly and outperforms the PixelCNN in log-likelihood on all the natural image benchmarks. In theory, VAEs can actually represent autoregressive models. A hierarchical VAE can learn to first generate global features at low resolution, then fill in local details in parallel at higher resolutions. Many types of generative models have flourished in recent years, including likelihood-based generative models, which cover autoregressive models, VAEs, and invertible flows. Their objective, the negative log-likelihood, is equivalent (up to a constant) to the KL divergence between the data distribution and the model distribution. Contributions: 1) provide theoretical justification for why greater depth could improve VAE performance; 2) introduce an architecture capable of scaling past 70 layers; 3) verify that depth, independent of model capacity, improves log-likelihood and allows VAEs to outperform the PixelCNN on all benchmarks; 4) use fewer parameters and generate samples...
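The coarse-to-fine idea above can be illustrated with a toy sketch (this is not the VDVAE model itself; the resolutions, nearest-neighbor upsampling, and noise scale are illustrative assumptions): sample global structure once at the lowest resolution, then at each higher resolution upsample and add local detail in parallel.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_coarse_to_fine(resolutions=(4, 8, 16), channels=1):
    """Toy top-down sampling: draw global structure at the lowest
    resolution, then refine in parallel at each higher resolution.
    Illustrative only, not the actual VDVAE sampler."""
    # Global latent sampled at the lowest resolution
    x = rng.normal(size=(channels, resolutions[0], resolutions[0]))
    for res in resolutions[1:]:
        # Nearest-neighbor upsample the coarse image to the next resolution
        factor = res // x.shape[-1]
        x = x.repeat(factor, axis=-2).repeat(factor, axis=-1)
        # Add local detail at this scale; every pixel's refinement is
        # drawn in parallel, conditioned on the coarse image
        x = x + 0.5 * rng.normal(size=x.shape)
    return x

img = sample_coarse_to_fine()
print(img.shape)  # (1, 16, 16)
```

In a real hierarchical VAE the "add detail" step is a learned conditional distribution rather than fixed Gaussian noise, but the parallelism per resolution level is what lets sampling be fast compared to pixel-by-pixel autoregressive generation.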