Blog entry by: Arindam Bhattacharya
I learned a lot about current trends in machine learning at the Vision, Language and Artificial Intelligence workshop, 2016. Here I try to summarise what I gathered from the experience.
Generative Adversarial Network [GAN]
- Data is precious. And depending on what you work on, [labelled] data is scarce. It was only a matter of time, after the success of supervised deep learning methods, before the focus shifted towards semi-/unsupervised learning. GAN provides such a framework. And it works spectacularly once trained (although it is notoriously hard to train).
- GAN is a framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. When G and D are both implemented as neural networks, the whole system can be trained using backpropagation [source].
- The fields of Vision and Language have huge amounts of unlabelled data. With GAN, this data was used on a variety of tasks, such as video prediction, image generation and super-resolution (yeah, not much “Language” there).
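The minimax game described above has a concrete value function, V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], which D maximises and G minimises. A minimal numerical sketch (the function name is mine, not from any library) checks the equilibrium claim: when D outputs 1/2 everywhere, V = 2·log(1/2) = −2 log 2.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    d_real: D's outputs on real samples; d_fake: D's outputs on generated ones.
    D tries to maximise this quantity, G tries to minimise it."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# At the unique equilibrium, D cannot tell real from fake and outputs 1/2
# on both, so V = log(1/2) + log(1/2) = -2 log 2 ≈ -1.386.
d_half = np.full(100, 0.5)
print(gan_value(d_half, d_half))  # ≈ -1.3863
```

In practice G is not trained by literally minimising this quantity (gradients saturate early on); the original paper already suggests instead maximising log D(G(z)), which is part of why training is so finicky.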
Again, Supervision is a Bottleneck
- An architecture with feedback connections learning top-down representations was proposed. Top-down learning allows the model to learn representations based on context. For example, an object is more likely to be a bottle if it is placed on a table.
- Such an architecture has the ability to self-supervise its learning. Interesting connections were drawn to feedback connections in the human brain.
- A self-supervised learning agent, for example, may learn a better representation of an object by touching and pushing it, and using the resulting feedback.
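The bottle-on-a-table example above is essentially a contextual prior. A toy Bayes-rule sketch (all numbers made up purely for illustration) shows how context can flip the most likely label:

```python
# Made-up numbers, purely for illustration: without context the model
# prefers "lamp" (prior 0.8), but "sits on a table" is assumed far more
# likely for a bottle, so conditioning on context flips the decision.
priors = {"bottle": 0.2, "lamp": 0.8}        # P(object)
context_lik = {"bottle": 0.9, "lamp": 0.1}   # P(on table | object), assumed

evidence = sum(priors[o] * context_lik[o] for o in priors)
posterior = {o: priors[o] * context_lik[o] / evidence for o in priors}
print(posterior)  # bottle ≈ 0.69, lamp ≈ 0.31
```

A feedback architecture does something loosely analogous: the top-down pass supplies the contextual evidence that reweights the bottom-up hypothesis.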
Variational Auto-encoders vs GANs
More deep stuff
- Preliminary studies on utilizing models of intuitive physics to better forecast the effect of actions on objects.
- A tutorial on sequence-to-sequence modelling was presented, along with its application to lip-reading.
How people think
- People are really good at finding/defining problems. With deep learning being the dominant approach, the main novelty lay in the problems researchers chose and the tweaks they applied, rather than in innovations in algorithms or architectures. Some hot new applications include Visual Question Answering and Visual Dialog.
- Defining new problems requires new data. Many presenters spent more than half their time explaining how they collected data. Various challenges present themselves here, ranging from time and funds to the reliability of the people involved. Hence the focus on unsupervised approaches.
- When it comes to giving captivating presentations, industry researchers are better than academics [of course this is biased by the small sample size, but the difference was stark].