Introduction to Generative Adversarial Networks (GANs)

Anushka Sandesara · Published in Analytics Vidhya · 5 min read · Sep 6, 2020


Ever wondered how you would look if you were a different gender? Or if you could use a digit or letter that never existed? Or if you were too lazy to go out and shop for clothes and could instead try on new styles at home without ever wearing them? Sounds fascinating, right? With the help of deep learning, this is not only possible but relatively easy. The neural networks that make it possible are called generative adversarial networks (GANs).

GANs are a robust class of neural networks used fundamentally for unsupervised learning. A GAN is composed of two neural network models that compete with each other and, through that competition, learn to model the variations in the dataset.

EVOLUTION OF GANs

The history of GANs is brief but eventful, and they are among the most versatile neural networks in use. In a 2014 research paper entitled “Generative Adversarial Networks,” Ian Goodfellow (former Google Brain research scientist and director of machine learning at Apple’s Special Projects Group) and his colleagues presented the first working implementation of GANs. Goodfellow has said that he was motivated by noise-contrastive estimation, which uses a loss function similar to that of GANs (both use the same kind of performance measure to estimate outcomes). Earlier, Juergen Schmidhuber (research co-director of the Dalle Molle Institute for Artificial Intelligence) had proposed predictability minimization, which embraces what’s known as a minimax decision rule, where the possible loss for a worst-case scenario is minimized as much as possible. This is the principle upon which GANs are built.

GAN Model Architecture

A GAN comprises two sub-models:

  1. Generator: a model that produces plausible examples from the problem domain. It takes a fixed-length vector as input and produces a sample in the domain; this input vector is drawn at random from a Gaussian distribution. After training, points in this multidimensional latent space correspond to points in the problem domain, forming a compressed representation of the data distribution.
  2. Discriminator: a model that classifies examples as real or fake.
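Under the hood, the two sub-models are simply functions mapping vectors to vectors. The following is a minimal NumPy sketch of that shape of computation; the layer sizes, activations, and random weights here are illustrative assumptions, not details from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w1, w2):
    # Map a fixed-length noise vector z to a sample in the data domain.
    h = np.tanh(z @ w1)        # hidden layer
    return np.tanh(h @ w2)     # fake sample

def discriminator(x, w1, w2):
    # Map a sample to a probability that it is real (1) rather than fake (0).
    h = np.tanh(x @ w1)
    logit = h @ w2
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid

# Illustrative dimensions: 8-d latent vector, 16 hidden units, 4-d "data".
latent_dim, hidden, data_dim = 8, 16, 4
g_w1 = rng.normal(0, 0.1, (latent_dim, hidden))
g_w2 = rng.normal(0, 0.1, (hidden, data_dim))
d_w1 = rng.normal(0, 0.1, (data_dim, hidden))
d_w2 = rng.normal(0, 0.1, (hidden, 1))

z = rng.standard_normal(latent_dim)   # drawn from a Gaussian, as described
fake = generator(z, g_w1, g_w2)
p_real = discriminator(fake, d_w1, d_w2)
print(fake.shape, p_real[0])
```

In a real GAN both networks would be deep (often convolutional) models whose weights are updated by gradient descent; this sketch only shows the input/output contract of each sub-model.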
WORKING OF GANs


The initial step is to define the desired end output and collect an initial training dataset according to those parameters. Random noise vectors are then sampled and passed into the generator until it achieves reasonable accuracy in producing outputs. The generated images are then fed into the discriminator along with data points from the original set. The task of the discriminator is to distinguish between the two and return a probability that represents each image’s authenticity:

1: corresponds to real

0: corresponds to fake
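These target probabilities correspond to the binary cross-entropy form of the GAN objective: the discriminator pushes its output toward 1 on real images and 0 on generated ones, while the generator tries to push the discriminator's output on its samples toward 1. A small sketch of the two losses (the probability values below are made-up examples, not measured outputs):

```python
import math

def discriminator_loss(p_real, p_fake):
    # Binary cross-entropy form of the GAN objective:
    # the discriminator pushes D(x) toward 1 and D(G(z)) toward 0.
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    # Non-saturating generator loss: push D(G(z)) toward 1.
    return -math.log(p_fake)

# A confident discriminator (D(x)=0.99, D(G(z))=0.01) has near-zero loss;
# one that is completely fooled (both outputs 0.5) does not.
print(discriminator_loss(0.99, 0.01))
print(discriminator_loss(0.5, 0.5))
print(generator_loss(0.5))
```

Training alternates between the two: one or more discriminator updates on a mixed real/fake batch, then a generator update through the frozen discriminator.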

Types of GANs

  1. Vanilla GAN: the simplest GAN variant, in which the generator and discriminator are plain multilayer perceptrons. The goal is to optimize the minimax objective using stochastic gradient descent.
  2. Conditional GAN (CGAN): both the generator and the discriminator receive supplementary conditioning input. The system is augmented with a vector of features or labels that controls and guides the generator regarding what to produce, and the discriminator is evaluated on how well fake images correspond to their input labels or features. A disadvantage of CGAN is that it is not fully unsupervised: it requires labels to function properly.
  3. Laplacian Pyramid GAN (LAPGAN): combines the CGAN model with a Laplacian pyramid representation. The pyramid is a linear, invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual. The central idea of this model is that generation can be divided into successive refinements. This approach is widely preferred because it produces high-quality images: the image is first downsampled at each layer of the pyramid and then upscaled again at each layer in a backward pass, where the image acquires noise generated by a CGAN at each level.
  4. Deep Convolutional GAN (DCGAN): mainly uses convolutional layers without max pooling. It eliminates fully connected layers, uses transposed convolutions for upsampling, and applies batch normalization everywhere except the output layer of the generator and the input layer of the discriminator. It is more stable and effective to train, up to the point where added generator complexity no longer improves the quality of the images produced.
  5. Super-Resolution GAN (SRGAN): applies a deep network in combination with an adversarial network to produce higher-resolution images. The fundamental motive behind this approach is to recover fine textures when an image is upscaled, without compromising quality. During training, a high-resolution (HR) image is downsampled to a low-resolution (LR) image, and the generator upsamples the LR image to a super-resolution (SR) one. The discriminator is used to distinguish the HR images from the generated ones, and the GAN loss is backpropagated to train both the discriminator and the generator.
  6. CycleGAN: unlike the other models, this model does not require a dataset of paired images, which makes it possible to build translation models for problems where paired training data does not exist. It uses a collection of images from each domain and extracts the underlying style of each collection. The model consists of two generators: one generates images in the first domain, and the other generates images in the second domain. Trained together, the two generators learn to reconstruct the original image after a round trip between domains, a property known as cycle consistency.
  7. Progressively Growing GAN (ProGAN): synthesizes high-resolution images by gradually growing the discriminator and generator networks during training, with new blocks added to both models as training progresses. This incremental approach lets the model first discover the large-scale structure of the image distribution and then shift attention to finer details, instead of having to learn all scales simultaneously.
  8. StyleGAN: begins with the ProGAN network architecture and reuses many of its hyperparameters while adding control over the “style” of the output. Stochastic variation is introduced via noise added at each point in the generator; the noise is applied to entire feature maps, which lets the model interpret style more effectively. The generator no longer takes a single point from the latent space as input; instead, it uses two new sources of randomness to generate a synthetic image. It also introduces a new evaluation metric called perceptual path length.
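Of these variants, CGAN's conditioning mechanism is the easiest to illustrate: the class label is typically one-hot encoded and concatenated with the noise vector before it enters the generator. A minimal sketch of that preprocessing step, with arbitrary illustrative sizes (an 8-d noise vector and 10 classes):

```python
import numpy as np

def conditioned_input(z, label, num_classes):
    # CGAN-style conditioning: append a one-hot class label to the
    # noise vector so the generator knows which class to synthesize.
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

rng = np.random.default_rng(1)
z = rng.standard_normal(8)              # Gaussian noise vector
g_in = conditioned_input(z, label=3, num_classes=10)
print(g_in.shape)                       # noise (8) + one-hot label (10) -> (18,)
```

The discriminator receives the same label alongside the image, so it can penalize samples that look realistic but do not match the requested class.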

Comparative Summary

