Generative Adversarial Networks (GAN)

Improved Human Pose Estimation through Adversarial Data Augmentation

IHPEADA

Deep models are usually trained in a two-phase paradigm, where data collection and network training are separated. This may not be efficient, since data collection is blind to network training. Why not jointly optimize the two? We propose adversarial data augmentation. The key idea is to develop a generator (e.g., an augmentation network) that competes against a discriminator (e.g., a target network) by generating hard examples online. The generator explores weaknesses of the discriminator, while the discriminator learns from hard augmentations to achieve better performance. Moreover, a reward/penalty strategy is designed to stabilize the joint training and avoid problematic convergence behaviors. We investigate human pose estimation to validate our idea. Extensive ablation studies, as well as comparisons on benchmarks, show that our method significantly improves the state of the art without additional data effort.
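The online competition described above can be sketched as a toy training loop. Everything below is an illustrative stand-in, not the paper's networks: the "discriminator" is a linear regressor, the "generator" is a single scalar augmentation parameter, and the reward/penalty signal is a crude finite-difference ascent on the target loss, clipped to keep augmentations plausible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "target network" (discriminator): a linear regressor trained by SGD.
w = np.zeros(2)

def target_loss(x, y, w):
    return float(np.mean((x @ w - y) ** 2))

def augment(x, theta):
    # Hypothetical augmentation: scale the input by exp(theta).
    return x * np.exp(theta)

# Toy data from y = x0 + 2*x1.
x = rng.normal(size=(256, 2))
y = x @ np.array([1.0, 2.0])

theta = 0.0                      # generator's single augmentation parameter
lr_w, lr_theta = 0.1, 0.05

for _ in range(200):
    xa = augment(x, theta)

    # Discriminator update: learn from the hard, augmented batch.
    grad_w = 2 * xa.T @ (xa @ w - y) / len(y)
    w -= lr_w * grad_w

    # Generator update: reward augmentations that raise the target loss
    # (finite-difference gradient ascent), penalize leaving the plausible
    # range by clipping -- a crude stand-in for the reward/penalty strategy.
    loss_plus = target_loss(augment(x, theta + 1e-3), y, w)
    loss_minus = target_loss(augment(x, theta - 1e-3), y, w)
    grad_theta = (loss_plus - loss_minus) / 2e-3
    theta += lr_theta * float(np.clip(grad_theta, -1.0, 1.0))
    theta = float(np.clip(theta, -0.5, 0.5))

final_loss = target_loss(augment(x, theta), y, w)
```

Because the target network keeps adapting to the generator's hardest augmentations, its loss on the augmented data still converges, which is the intended joint-training behavior.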

 

Publications

  • Improved Human Pose Estimation through Adversarial Data Augmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

 

 

 

 

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

stack gan v2

Although Generative Adversarial Networks (GANs) have shown remarkable success in various tasks, they still face challenges in generating high-quality images. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aimed at generating high-resolution photo-realistic images. First, we propose a two-stage generative adversarial network architecture, StackGAN-v1, for text-to-image synthesis. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs and generates high-resolution images with photo-realistic details. Second, an advanced multi-stage generative adversarial network architecture, StackGAN-v2, is proposed for both conditional and unconditional generative tasks. Our StackGAN-v2 consists of multiple generators and discriminators in a tree-like structure; images at multiple scales corresponding to the same scene are generated from different branches of the tree. StackGAN-v2 shows more stable training behavior than StackGAN-v1 by jointly approximating multiple distributions. Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.
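The tree-like, multi-scale structure can be sketched in a few lines: one latent feature map is refined along the trunk, and each branch emits an image at its own scale for its own discriminator. The `upsample2x` and `to_image` functions below are placeholders for learned branch modules, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(feat):
    # Nearest-neighbour upsampling as a stand-in for a learned branch.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def to_image(feat):
    # Hypothetical 1x1 "conv" collapsing features to a 3-channel image
    # in [-1, 1], as GAN generators typically emit.
    return np.tanh(feat[..., :3])

# One latent, several branches of the tree: each branch refines the
# previous scale and emits an image for a scale-specific discriminator.
z = rng.normal(size=(4, 4, 16))           # toy 4x4 feature map
feats, images = z, []
for _ in range(3):                         # 8x8 -> 16x16 -> 32x32 here
    feats = upsample2x(feats)
    images.append(to_image(feats))

print([im.shape for im in images])  # → [(8, 8, 3), (16, 16, 3), (32, 32, 3)]
```

All branch images depict the same scene because they are computed from the same latent trunk; jointly training a discriminator per scale is what approximates multiple distributions at once.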

 

Publications

  • Han Zhang*, Tao Xu*, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas: StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. 
    arXiv:1710.10916, 2017

 

 

 

 

Improving GANs Using Optimal Transport

gan OT

We present Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution. This metric, which we call mini-batch energy distance, combines optimal transport in primal form with an energy distance defined in an adversarially learned feature space, resulting in a highly discriminative distance function with unbiased mini-batch gradients. Experimentally we show OT-GAN to be highly stable when trained with large mini-batches, and we present state-of-the-art results on several popular benchmark problems for image generation.
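The mini-batch energy distance combines transport distances between pairs of independent mini-batches: with X, X' two data batches and Y, Y' two generator batches, it takes the form D² = 2·W(X, Y) − W(X, X') − W(Y, Y'). The sketch below computes this with entropy-regularized (Sinkhorn) transport over a cosine cost; the regularization strength, iteration count, and raw features in place of an adversarially learned feature space are all simplifying assumptions.

```python
import numpy as np

def sinkhorn_distance(a, b, eps=0.1, iters=100):
    # Entropy-regularized OT between two equal-size mini-batches,
    # with a cosine cost on L2-normalized features (a sketch).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                     # cosine cost matrix
    K = np.exp(-cost / eps)                  # Gibbs kernel
    n = len(a)
    r = c = np.ones(n) / n                   # uniform batch weights
    u = np.ones(n) / n
    for _ in range(iters):                   # Sinkhorn iterations
        v = c / (K.T @ u)
        u = r / (K @ v)
    plan = np.diag(u) @ K @ np.diag(v)       # approximate transport plan
    return float(np.sum(plan * cost))

def minibatch_energy_distance(x, xp, y, yp):
    # D^2 = 2 W(X, Y) - W(X, X') - W(Y, Y'), with x, xp two independent
    # data batches and y, yp two independent generator batches.
    return (2 * sinkhorn_distance(x, y)
            - sinkhorn_distance(x, xp)
            - sinkhorn_distance(y, yp))
```

Subtracting the within-distribution terms is what gives the distance its (approximately) unbiased mini-batch gradients: the entropic bias present in each Sinkhorn term largely cancels.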

 

Publications

  • Tim Salimans*, Han Zhang*, Alec Radford, Dimitris Metaxas: Improving GANs Using Optimal Transport
    ICLR, 2018.

 

 

 

 

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

stack gan

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details through the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-art methods on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.
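Conditioning Augmentation replaces a fixed text embedding with a sample from a Gaussian whose mean and diagonal variance are predicted from that embedding, regularized by a KL divergence to the standard normal. A minimal sketch, with the linear maps `W_mu` and `W_logvar` as hypothetical stand-ins for learned layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioning_augmentation(text_embedding, W_mu, W_logvar):
    # Map the text embedding to a Gaussian N(mu, diag(sigma^2)) and
    # sample via the reparameterization trick.
    mu = text_embedding @ W_mu
    logvar = text_embedding @ W_logvar
    eps = rng.normal(size=mu.shape)
    c = mu + np.exp(0.5 * logvar) * eps
    # KL( N(mu, sigma^2) || N(0, I) ) regularizer, encouraging a smooth
    # latent conditioning manifold around the embedding.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return c, kl

# Hypothetical shapes: a batch of 2 embeddings of size 5 mapped to a
# 3-dimensional conditioning variable.
emb = np.ones((2, 5))
W_mu, W_logvar = np.zeros((5, 3)), np.zeros((5, 3))
c, kl = conditioning_augmentation(emb, W_mu, W_logvar)
```

Sampling (rather than copying) the conditioning variable yields different images for the same sentence, which is where the extra diversity comes from; the KL term keeps the sampled manifold smooth.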

 

Publications

  • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, and Dimitris Metaxas: StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
    ICCV, 2017. Oral Presentation