Introduction and Motivation

Building image generative models that capture the data generation process

This document describes a project completed as part of Altdeep's Causal Modeling in Machine Learning Workshop, taught by Robert Osazuwa Ness.

The full code of this work can be found at

Motivation

The following figure describes Scott McCloud's "Picture Plane", which first appeared in his 1994 book Understanding Comics: The Invisible Art.

The image describes two axes along which a real-world image can be simplified. Along one axis, the image is reduced to abstract geometric shapes. Along the bottom axis, it is reduced to symbols that are meaningful to humans. As we move along that axis, a face becomes more symbolic: it preserves only the elements that are meaningful to humans, such as clear markers of gender and emotional expression. Ultimately, it crosses the line into the written word, which is purely symbolic and non-pictorial. In supervised machine learning, we'd call this a label.

This figure is interesting because it suggests a clear division of labor between the modeler and deep learning. Deep learning is good at composing geometric primitives into realistic images, while humans are good at conceptualizing and representing what is meaningful about a realistic image.

In this tutorial, we use a variational autoencoder architecture to build a causal computer vision model. We model the cause-effect relationships of the system explicitly and rely on the decoder to handle the geometric abstractions.
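To make this division of labor concrete, here is a minimal toy sketch of the generative direction only. It is not the project's code. The factor names (`shape`, `scale`), the causal edge from shape to scale, the value ranges, and the hand-written `toy_decoder` (a stand-in for a trained neural decoder) are all hypothetical, chosen purely for illustration.

```python
import random

# Hypothetical symbolic factors and a hypothetical causal edge: shape -> scale.
SHAPES = ["square", "circle"]
SCALE_RANGE = {"square": (0.5, 1.0), "circle": (0.2, 0.6)}


def sample_factors(rng, do=None):
    """Sample human-meaningful factors from an explicit causal model.

    `do` maps a factor name to a forced value (an intervention). Forcing
    `shape` also changes the distribution of its effect, `scale`; forcing
    `scale` leaves its cause, `shape`, untouched.
    """
    do = do or {}
    shape = rng.choice(SHAPES)
    if "shape" in do:
        shape = do["shape"]
    low, high = SCALE_RANGE[shape]  # scale depends on shape
    scale = rng.uniform(low, high)
    if "scale" in do:
        scale = do["scale"]
    return {"shape": shape, "scale": scale}


def toy_decoder(factors, z, size=16):
    """Stand-in for a neural decoder: turn factors plus latent noise z
    into a binary image (a list of rows of 0/1 pixels)."""
    half = factors["scale"] * size / 2
    cx = size / 2 + z[0]  # latent noise shifts the position of the shape
    cy = size / 2 + z[1]
    image = []
    for row in range(size):
        line = []
        for col in range(size):
            dx = col + 0.5 - cx
            dy = row + 0.5 - cy
            if factors["shape"] == "square":
                inside = abs(dx) <= half and abs(dy) <= half
            else:
                inside = dx * dx + dy * dy <= half * half
            line.append(1 if inside else 0)
        image.append(line)
    return image


def generate(seed, do=None, size=16):
    """Sample factors causally, then let the decoder handle the geometry."""
    rng = random.Random(seed)
    factors = sample_factors(rng, do)
    z = (rng.gauss(0, 1), rng.gauss(0, 1))
    return factors, toy_decoder(factors, z, size)
```

Calling `generate(0)` draws one image from the model, while `generate(0, do={"shape": "circle"})` forces the shape and lets the change propagate to the scale. In a real model, the rendering function would be replaced by a decoder network learned from data, and the causal structure would be the one the modeler believes governs the dataset.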
