Things I Learned at VLAI 2016

Blog entry by: Arindam Bhattacharya

I learned many things about current trends in machine learning at the Vision, Language and Artificial Intelligence (VLAI) 2016 workshop. Here I try to summarise what I gathered from the experience.

Generative Adversarial Network [GAN]

  • Data is precious. And depending on what you work on, [labelled] data is scarce. It was only a matter of time, after the success of supervised deep learning methods, that the focus would shift towards semi/unsupervised learning. GAN provides such a framework. And it works spectacularly once trained (although it is notoriously hard to train).
  • GAN is a framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. When G and D are both implemented as neural networks, the whole system can be trained using backpropagation [source].
  • The fields of Vision and Language have huge amounts of unlabelled data. With GANs, these have been used on a variety of tasks, such as video prediction, image generation and super-resolution (yeah, not much “Language” there).
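
The minimax game quoted above can be made concrete with a small sketch of the value function V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]. The toy Gaussian data and hand-picked discriminators below are my own assumptions for illustration, not anything presented at the workshop; the point is just to show the unique solution where D outputs 1/2 everywhere and V settles at −log 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_function(d, real, fake):
    """GAN value V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    return np.mean(np.log(d(real))) + np.mean(np.log(1.0 - d(fake)))

# Toy 1-D samples (an assumption for illustration): the generator has
# already recovered the data distribution, so both sets look alike.
real = rng.normal(0.0, 1.0, size=10_000)
fake = rng.normal(0.0, 1.0, size=10_000)

# At the unique solution D equals 1/2 everywhere, so
# V = log(1/2) + log(1/2) = -log 4, regardless of the samples.
d_optimal = lambda x: np.full_like(x, 0.5)
v_eq = value_function(d_optimal, real, fake)

# Away from equilibrium, a discriminator that separates real from fake
# achieves a higher value -- the quantity D ascends and G descends.
real_apart = rng.normal(2.0, 1.0, size=10_000)
fake_apart = rng.normal(-2.0, 1.0, size=10_000)
d_sep = lambda x: 1.0 / (1.0 + np.exp(-x))  # sigmoid discriminator
v_sep = value_function(d_sep, real_apart, fake_apart)
```

Here v_eq comes out to exactly −log 4, while v_sep is larger; training pushes the two players toward that −log 4 saddle point.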

Again, Supervision is a Bottleneck

  • An architecture with feedback connections that learns top-down representations was proposed. Top-down learning allows the model to learn representations based on context. For example, an object is more likely to be a bottle if it is placed on a table.
  • Such an architecture has the ability to self-supervise its learning. Interesting connections were made with feedback connections in the human brain.
  • A self-supervised learning agent, for example, may learn a better representation of an object by touching and pushing it, and using the resulting feedback.
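
The bottle-on-a-table intuition is essentially a Bayesian update: context reweights the prior belief about the object. A minimal sketch, with made-up probabilities purely for illustration (nothing here comes from the proposed architecture itself):

```python
# Context as a Bayesian update: seeing the object on a table raises the
# posterior probability that it is a bottle. All numbers are hypothetical.
p_bottle = 0.1                    # prior P(bottle)
p_table_given_bottle = 0.7        # likelihood P(on table | bottle)
p_table_given_other = 0.2         # likelihood P(on table | not bottle)

evidence = (p_table_given_bottle * p_bottle
            + p_table_given_other * (1 - p_bottle))
posterior = p_table_given_bottle * p_bottle / evidence  # P(bottle | on table)
```

The posterior (0.28) is well above the prior (0.1); a top-down pathway lets a network implement this kind of reweighting with learned features instead of hand-set probabilities.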

Variational Auto-encoders vs GANs

  • Some favor the nice probabilistic formulation of VAEs [source], which allows one to carry forward the theory of graphical models.
  • In general though, in the field of Vision, GANs are preferred, as they are better at generating visual features.
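
One concrete payoff of that probabilistic formulation: the regulariser in the VAE objective (the ELBO, E_q[log p(x|z)] − KL(q(z|x) || p(z))) has a closed form when the encoder posterior and the prior are diagonal Gaussians. A minimal sketch of that KL term (my own illustration, not tied to any talk):

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), the VAE regulariser.

    Closed form: 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# An encoder posterior that matches the prior pays no KL cost...
kl_zero = gaussian_kl(np.zeros(3), np.zeros(3))
# ...while any deviation from it is penalised.
kl_shift = gaussian_kl(np.ones(3), np.zeros(3))
```

GANs sidestep this explicit density term entirely, which is arguably why their samples look sharper but their training is harder to reason about.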

More deep stuff

  • Preliminary studies were presented on utilizing models of intuitive physics to better forecast the effects of actions on objects.
  • A tutorial on sequence to sequence modelling was presented, along with its application for lip-reading.

How people think

  • People are really good at finding/defining problems. With deep learning being the dominant approach, the main novelty lay in the problems researchers chose and the tweaks they applied, rather than in algorithmic or architectural innovation. Some hot new applications include Visual Question Answering and Visual Dialog.
  • Defining new problems requires new data. Many speakers spent more than half their time explaining how they collect data. Various challenges present themselves here, ranging from time and funding to the reliability of the people involved. Hence the focus on unsupervised approaches.
  • When it comes to giving captivating presentations, industry researchers are better than academics [of course this is biased by the small sample size, but the difference was stark].
