Artificial Intelligence Suite - Generative Adversarial Network (GAN) Using Neural Network, Deep Learning and Reinforcement Learning

Among the many technologies in the Artificial Intelligence suite, the Generative Adversarial Network (GAN), which draws on neural networks, deep learning and reinforcement learning, is capturing the interest of researchers and technologists alike with its remarkable breakthroughs.

Of all the advancements that science and technology have brought to mankind, the world reckons artificial intelligence (AI) to be the ultimate. AI is growing at such an accelerated pace that new breakthroughs have become routine. One AI term that has gained prominence in recent years is the Generative Adversarial Network (GAN). In this article, I will take a deep dive to explain what a GAN encompasses and where it fits in the AI family.

For more technology insights, follow me @Asamanyakm

A Generative Adversarial Network (GAN) is a powerful class of neural networks used for unsupervised learning. The idea was introduced by Ian J. Goodfellow and his colleagues in 2014. A GAN is made up of two competing neural network models that learn to analyse, capture and reproduce the variations within a dataset, each pushing the other to improve the quality of its results.

What led to the development of GANs?

If you work on artificial intelligence technologies, you may have noticed that most mainstream neural networks can easily be fooled into misclassifying objects by adding a small amount of carefully chosen noise to the original data. You may be astonished to find that the model is more confident in its wrong prediction than it was in its correct one. One reason for such adverse output is that most machine learning models learn from a limited amount of data, which is a major constraint because it makes them prone to errors when inputs deviate even slightly from what they have seen.

In addition, the mapping between the input and the output of many of these models is almost linear. Although the boundaries separating the various classes may look highly non-linear, in reality they are composed of many linear pieces, so even a minute change to a point in the feature space can push it across a boundary and lead to misclassification.
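The effect of near-linearity can be illustrated with a hypothetical linear classifier over 1,000 features. Nudging every feature by a tiny amount in the direction that lowers the score (the idea behind the fast gradient sign method described by Goodfellow and colleagues) adds up across many dimensions and flips the prediction. The weights, input and step size below are made-up values chosen only for illustration, not taken from any real model.

```python
def score(weights, x, bias=0.0):
    """Linear classifier score; a positive score means class 1."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v):
    return (v > 0) - (v < 0)


def perturb_against(weights, x, eps):
    """Shift each feature by eps against the sign of its weight.

    For a linear score the gradient with respect to x is the weight vector
    itself, so this is a fast-gradient-sign step that lowers the score.
    """
    return [xi - eps * sign(w) for w, xi in zip(weights, x)]


n = 1000
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(n)]  # hypothetical model
x = [0.002 * sign(w) for w in weights]                     # correctly classified input
x_adv = perturb_against(weights, x, eps=0.004)

print(score(weights, x))      # about +1.0 -> class 1
print(score(weights, x_adv))  # about -1.0 -> class 0
print(max(abs(a - b) for a, b in zip(x, x_adv)))  # each feature moved by only 0.004
```

Each individual change is tiny, but with 1,000 features the score shifts by eps times the sum of the absolute weights, which is enough to cross the decision boundary.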

How do GANs function?

Generative Adversarial Networks (GANs) can be broken down into three parts:

Generative: A generative model is trained that describes how the data is generated, in terms of a probability distribution.
Adversarial: The model is trained in an adversarial setting.
Networks: Deep neural networks are used as the models being trained.

In general, a GAN is composed of a Generator and a Discriminator, both of which are neural networks. The Generator produces fake samples of data (e.g., images or audio) and tries to fool the Discriminator, while the Discriminator tries to distinguish real samples from fake ones. The two compete with each other throughout the training phase. These steps are repeated many times, and with each iteration both the Generator and the Discriminator get better at their respective jobs. The functioning can be visualized by the diagram given below:

To understand how GANs work, imagine an ordinary painter trying to copy paintings by the great masters. Initially, he has no idea what a masterpiece should look like, but he happens to have a partner with a visual memory of every masterpiece ever painted. This gifted partner must decide whether each painting his comrade creates matches the features of works by the real masters or is an obvious forgery.

This is the basic idea of how a GAN operates, except that, being AI systems, both the forger and his partner can work at high speed, producing and inspecting thousands of forgeries per second. In the process, both "learn" from the outcome to improve their future performance: as the partner gets better at spotting forgeries, the painter must get better at creating them.
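For readers who want the formal version: in the original 2014 paper by Goodfellow and colleagues, this contest is written as a two-player minimax game over a value function V. Here G is the Generator, D is the Discriminator (its output is the probability that its input is real), p_data is the distribution of the training data and p_z is the distribution of the random input noise z.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator pushes V up by scoring real samples near 1 and generated ones near 0, while the Generator pushes it down. For a fixed Generator whose samples follow a distribution p_g, the best possible Discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), which equals 1/2 everywhere once the Generator reproduces the data distribution exactly. In practice the Generator is often trained to maximize log D(G(z)) instead, because that gives stronger gradients early in training, when the Discriminator easily rejects its samples.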

How are GANs trained?

Training a GAN consists of two parts:

The Discriminator is trained while the Generator is idle. In this phase, the Generator is only forward-propagated to produce fake samples; no back-propagation is done through it. The Discriminator is trained on real data for N epochs and tested on whether it correctly predicts them as real. In the same phase, it is also trained on the fake data produced by the Generator and tested on whether it correctly predicts them as fake.
The Generator is trained while the Discriminator is idle. Once the Discriminator has been trained on the Generator's output, its predictions are used to train the Generator, improving it from its previous state so that it can better fool the Discriminator. This process is repeated for a few epochs, after which the generated data is checked manually to see whether it looks genuine. If it does, training is stopped; otherwise, it is allowed to continue for a few more epochs.
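The two alternating phases can be sketched in plain Python on a deliberately tiny, hypothetical problem: the "real data" are numbers drawn from a normal distribution with mean 4, the Generator is G(z) = a*z + b applied to random noise z, and the Discriminator is a single logistic unit D(x) = sigmoid(w*x + c). Each step makes one Discriminator update with the Generator frozen, then one Generator update with the Discriminator frozen (rather than several epochs each), and instead of a manual check it simply runs a fixed number of steps. Gradients are written out by hand, and the Generator uses the common "non-saturating" loss, maximizing log D(G(z)). The model shapes, learning rate and step count are assumptions for illustration, not values from any real system.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=5000, batch=32, lr=0.05, real_mean=4.0, real_std=1.25, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # hypothetical Generator: G(z) = a*z + b
    w, c = 0.0, 0.0  # hypothetical Discriminator: D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        # Phase 1: train the Discriminator. The Generator only runs forward
        # to produce fakes; its parameters a and b are not updated here.
        real = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        gw = gc = 0.0
        for x in real:  # gradient of log D(x): push real samples toward 1
            g = 1.0 - sigmoid(w * x + c)
            gw += g * x
            gc += g
        for x in fake:  # gradient of log(1 - D(x)): push fakes toward 0
            g = -sigmoid(w * x + c)
            gw += g * x
            gc += g
        w += lr * gw / batch  # gradient ascent on the Discriminator objective
        c += lr * gc / batch

        # Phase 2: train the Generator against the now-frozen Discriminator
        # by gradient ascent on log D(G(z)).
        ga = gb = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            g = (1.0 - sigmoid(w * (a * z + b) + c)) * w  # d log D / d G(z)
            ga += g * z
            gb += g
        a += lr * ga / batch
        b += lr * gb / batch

    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generated mean ~ {b:.2f} (real mean is 4.0)")
```

Because a single logistic unit can only tell the two distributions apart by where they sit on the number line, this toy pulls the Generator's mean b toward 4 but cannot teach it the right spread a; that limitation is one reason real GANs use deep networks on both sides.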

In recent years, GANs have generated a lot of enthusiasm in AI development because of their ability to create “new” information that follows the rules established by existing information. One example might be writing user guides: by training a GAN on millions of user guides, it could one day be possible to build a system that examines any tool, device or software and then writes a user guide explaining how to use it.

To recap, the network that creates fake data is called the generative network, and its job is to learn the properties of the training data. It then attempts to replicate the training data by producing “candidate” data that follow the same rules. The network tasked with deciding whether a given sample is false (artificially generated) or real (training data) is called the discriminative network. Because the discriminative network competes against the generative network, the entire system is described as "adversarial".

What are the different types of GANs?

GANs are increasingly making their way into research labs and are actively discussed among AI enthusiasts. Many types of GANs have been implemented so far; a few of the most important ones in active use are described below:

Vanilla GAN is the simplest type of GAN. In this type, the Generator and the Discriminator are simple multilayer perceptrons, and the algorithm is straightforward: the Vanilla GAN optimizes its mathematical objective using stochastic gradient descent.
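To make the vanilla architecture concrete, here is a forward pass through a hypothetical vanilla GAN in which both networks are small multilayer perceptrons: the Generator maps a 4-number noise vector to a 2-number "sample", and the Discriminator maps a sample to the probability that it is real. The layer sizes, the tanh and sigmoid activations and the random initialization are illustrative assumptions; training would then adjust these weights with stochastic gradient descent.

```python
import math
import random


def init_mlp(sizes, rng):
    """Random weights and zero biases for fully connected layers of the given sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x, out_act):
    """Compute W x + b per layer, with tanh on hidden layers and out_act on the last one."""
    for i, (weights, biases) in enumerate(layers):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        act = out_act if i == len(layers) - 1 else math.tanh
        x = [act(v) for v in pre]
    return x


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


rng = random.Random(0)
generator = init_mlp([4, 8, 2], rng)      # noise (4) -> hidden (8) -> fake sample (2)
discriminator = init_mlp([2, 8, 1], rng)  # sample (2) -> hidden (8) -> P(real) (1)

z = [rng.gauss(0.0, 1.0) for _ in range(4)]
fake = mlp_forward(generator, z, out_act=lambda v: v)        # linear output layer
p_real = mlp_forward(discriminator, fake, out_act=sigmoid)[0]
print(fake, p_real)
```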

Conditional GAN (CGAN) can be described as a deep learning method in which conditional parameters are put into place. In this GAN, an additional parameter (such as a class label) is supplied to the Generator so that it generates the corresponding kind of data. Labels are also fed into the Discriminator to help it distinguish real data from artificially generated data.
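One simple way to supply the condition, assumed here for illustration, is to append a one-hot encoding of the label to each network's input: the Generator receives noise plus the label it should produce, and the Discriminator receives a sample plus the label it is claimed to belong to. The ten classes and the two-number sample below are hypothetical.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a list with a single 1.0."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def with_condition(values, label, num_classes):
    """Append the one-hot label to a network's input vector."""
    return list(values) + one_hot(label, num_classes)


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]             # noise for the Generator
g_in = with_condition(z, label=3, num_classes=10)        # Generator input: "make a 3"
sample = [0.2, -0.7]                                     # hypothetical sample to judge
d_in = with_condition(sample, label=3, num_classes=10)   # Discriminator: "is this a real 3?"
print(len(g_in), len(d_in))  # 14 12
print(g_in[4:])              # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Because both networks see the same label, the Discriminator can reject a sample that looks realistic but does not match its label, which forces the Generator to respect the condition.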

Deep Convolutional GAN (DCGAN) is one of the most popular and most successful implementations of GAN. It uses ConvNets in place of multilayer perceptrons. Convolutional Neural Networks (ConvNets or CNNs) are a category of neural networks that have proven very effective in areas such as image recognition and classification; besides powering vision in robots and self-driving cars, they have been successful at identifying faces, objects and traffic signs. In a DCGAN, the ConvNets are implemented without max pooling, which is replaced by convolutional stride, and the hidden layers are not fully connected.
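The idea of replacing max pooling with convolutional stride can be seen on a tiny example: a 2x2 kernel moved two pixels at a time over a 4x4 image halves its resolution just as 2x2 max pooling would, but the kernel weights that decide how each patch is summarized are learnable. The image and kernel values below are arbitrary, and real DCGAN layers also use padding and many channels, which this sketch leaves out.

```python
def conv2d_strided(image, kernel, stride):
    """'Valid' 2-D convolution (cross-correlation, as in most deep learning libraries) with a stride."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(kernel[i][j] * image[r * stride + i][c * stride + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(image, size):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    return [
        [max(image[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(image[0]), size)]
        for r in range(0, len(image), size)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[0.25, 0.25], [0.25, 0.25]]  # learnable in practice; an average here

print(conv2d_strided(image, kernel, stride=2))  # [[3.5, 5.5], [11.5, 13.5]]
print(max_pool(image, 2))                       # [[6, 8], [14, 16]]
```

Both operations turn the 4x4 input into a 2x2 output; the strided convolution simply lets training choose the down-sampling rule instead of fixing it to "take the maximum".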

The Laplacian Pyramid GAN (LAPGAN) is built on the Laplacian pyramid, a linear invertible image representation consisting of a set of band-pass images spaced an octave apart, plus a low-frequency residual. This approach uses multiple Generator and Discriminator networks, one for each level of the Laplacian pyramid, and is primarily used to produce very high-quality images. The image is first down-sampled at each level of the pyramid and then up-scaled again level by level in a backward pass, acquiring detail from a Conditional GAN at each level until it reaches its original size.
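The "linear invertible" property of a Laplacian pyramid can be checked with a simplified one-dimensional version: each level halves the resolution by averaging neighbouring pairs (real image pyramids blur with a Gaussian filter first), stores the detail lost in that step as a band-pass residual, and keeps a low-frequency residual at the top. Adding the details back while up-sampling recovers the input exactly. In LAPGAN, a conditional GAN generates those detail bands instead of reading them from a real image; that part is not shown in this sketch.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs (a crude low-pass filter)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each value."""
    return [v for v in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Return (band-pass details from fine to coarse, low-frequency residual)."""
    details = []
    for _ in range(levels):
        low = downsample(signal)
        details.append([s - u for s, u in zip(signal, upsample(low))])
        signal = low
    return details, signal


def reconstruct(details, low):
    """Invert the pyramid: up-sample and add back each detail band, coarse to fine."""
    for band in reversed(details):
        low = [u + d for u, d in zip(upsample(low), band)]
    return low


signal = [3.0, 5.0, 4.0, 8.0, 6.0, 6.0, 1.0, 3.0]  # length divisible by 2**levels
details, low = build_pyramid(signal, levels=3)
print(low)                                   # [4.5], the overall average
print(reconstruct(details, low) == signal)   # True
```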

Super Resolution GAN (SRGAN), as the name suggests, is a way of designing a GAN in which a deep neural network is combined with an adversarial network to produce higher-resolution images. This type of GAN is particularly useful for up-scaling native low-resolution images, enhancing their details while minimizing errors in the process.

To watch a GAN in action, simply visit the website This Person Does Not Exist. The network powering the website has learned to produce ultra-realistic images of human faces that do not exist outside the computer program. At first, you might assume that the program builds faces by assembling pieces from a database of facial features such as eyes, ears, mouths and hair, but this isn't the case. The "input" to the generative network is only a string of random numbers; only the discriminative network ever sees the training data. The generative network improves its output entirely on the basis of the discriminative network's feedback. Because that feedback amounts to a judgement of whether each generated output looks like the training data (passed back to the generator as gradients during training), it takes innumerable attempts before the generator starts to produce output acceptably close to the desired result: in this case, a realistic-looking image of a person who does not exist.

The data used to train an adversarial network does not need to be labelled, because the discriminative network judges the generative network's output based entirely on features of the training data itself. This means GANs have applications in unsupervised learning as well as in supervised learning (where the data is labelled) and reinforcement learning.

Another useful feature of GANs is that they can efficiently create training datasets for other AI applications. The latest AI techniques, deep learning in particular, depend on access to large amounts of training data. GANs can generate datasets that follow all the rules of "natural" datasets, which can then be used to train deep learning models. This capability is being put to use in healthcare and life sciences for medical images, which are expensive and time-consuming to collect for real because they require both patient consent and medical expertise to label.

GANs are gaining popularity for creating images, videos, text and even music. The GAN is clearly one of the most interesting concepts to have emerged from the AI field in recent years, and we can expect to see many exciting new applications based on it in the days to come.

For more technology insights, follow me @Asamanyakm