Programming the Directed Causal Graphical Model
What is a DAG? How is a causal model represented as a DAG? How do we implement it and run inference in pyro?
As mentioned in previous chapters, the relationships between the different entities in the image are described via a DAG, and probabilistic queries can be answered by running inference algorithms on it. Below, we take an example and show how to answer a few causal queries in pyro before we present the DAG for our use case.
A survey was conducted and the following data was captured. A DAG indicating the probabilistic relationships between these variables is drawn above.
| Node-Abbr | Node-Expanded | Values | Explanation |
| --- | --- | --- | --- |
| A | Age | Adult, Old, Young | Age of the individual, discretized |
| S | Sex | Male, Female | Sex of the individual |
| E | Education | High School, University | Highest level of education |
| O | Occupation | Employee, Self Employed | Type of work |
| R | Residence | Small, Big | Type of city that the individual resides in |
| T | Travel | Car, Train, Other | Mode of travel |
Conditional independencies are encoded in the structure of the DAG. Age and Sex are variables that can't be affected by any other variable, at least from the survey's point of view. Education is affected by Age and Sex. Education influences what your occupation is and in which city you live. The mode of travel is affected by Occupation and Residence. The joint probability is obtained by factorizing along the DAG, as shown below.
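Reading the parent sets directly off the DAG, the joint distribution factorizes as

$$
P(A, S, E, O, R, T) = P(A)\,P(S)\,P(E \mid A, S)\,P(O \mid E)\,P(R \mid E)\,P(T \mid O, R)
$$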
Initial conditional probabilities can be computed from data, or a prior can be assumed.
Let's say these probabilities were computed from data; they are easy to interpret. For the variable A (Age), the probability of being an adult is 0.36, of being old is 0.16, and of being young is 0.48. The variable R is conditioned on the value of E: if Education is High School, the 0th row is accessed and the probability of residing in a big city is 0.72, whereas if Education is University, the probability of residing in a big city is 0.94.
These probabilities reflect the current state of the relationships between these entities, as estimated from the data.
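To make this concrete, here is one way the CPTs could be encoded as tensors. The entries for A and for R given E are the values quoted above; every other number is a placeholder, since the remaining tables are not reproduced in this text.

```python
import torch

# P(A): Adult, Old, Young -- values quoted above
A_probs = torch.tensor([0.36, 0.16, 0.48])

# P(S): Male, Female -- placeholder values
S_probs = torch.tensor([0.55, 0.45])

# P(E | A, S): indexed [A][S], columns = High School, University -- placeholders
E_probs = torch.tensor([[[0.64, 0.36], [0.84, 0.16]],
                        [[0.72, 0.28], [0.89, 0.11]],
                        [[0.70, 0.30], [0.75, 0.25]]])

# P(O | E): columns = Employee, Self Employed -- placeholders
O_probs = torch.tensor([[0.96, 0.04],
                        [0.92, 0.08]])

# P(R | E): columns = Small, Big
# row 0 (High School): P(Big) = 0.72; row 1 (University): P(Big) = 0.94
R_probs = torch.tensor([[0.28, 0.72],
                        [0.06, 0.94]])

# P(T | O, R): indexed [O][R], columns = Car, Train, Other -- placeholders
T_probs = torch.tensor([[[0.48, 0.42, 0.10], [0.58, 0.24, 0.18]],
                        [[0.56, 0.36, 0.08], [0.70, 0.21, 0.09]]])
```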
All probabilistic programs are built up by composing primitive stochastic functions and deterministic computation. In pyro, such a model is defined as an ordinary Python function.
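Here is a minimal sketch of such a model, built on the CPT tensors above (the value orderings and tensor names are our assumptions):

```python
import pyro
import pyro.distributions as dist

def model():
    # Root nodes: Age and Sex have no parents
    A = pyro.sample("A", dist.Categorical(probs=A_probs))
    S = pyro.sample("S", dist.Categorical(probs=S_probs))
    # Education depends on Age and Sex
    E = pyro.sample("E", dist.Categorical(probs=E_probs[A][S]))
    # Occupation and Residence each depend on Education
    O = pyro.sample("O", dist.Categorical(probs=O_probs[E]))
    R = pyro.sample("R", dist.Categorical(probs=R_probs[E]))
    # Mode of travel depends on Occupation and Residence
    T = pyro.sample("T", dist.Categorical(probs=T_probs[O][R]))
    return T
```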
We encode the DAG in the above function. The variable names that pyro tracks are the ones given inside the sample statements. During inference, a program transformation takes place and this function gets called many times; we therefore obtain a trace, i.e. the samples of all the pyro variables, and those probabilities change depending on the evidence that is provided.
If there are any learnable parameters, we need another stochastic function, called a guide, to help learn them. Inference algorithms in pyro, such as stochastic variational inference, use guide functions as approximate posterior distributions. A guide must satisfy two criteria to be a valid approximation of the model: first, all the unobserved sample statements that appear in the model must appear in the guide; second, the guide has the same signature as the model, i.e. it takes the same arguments. For this mock example there are no learnable parameters and hence no guide is needed, but the causal image generation model does require one, since its neural network has learnable weights.
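To make the two criteria concrete, here is a hypothetical mean-field guide for this model (not needed for the mock example; all parameter names are illustrative): it mirrors every unobserved sample site and takes the same, empty, argument list as the model.

```python
import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide():
    # One learnable probability vector per unobserved sample site
    posts = {
        "A": pyro.param("A_post", torch.ones(3) / 3, constraint=constraints.simplex),
        "S": pyro.param("S_post", torch.ones(2) / 2, constraint=constraints.simplex),
        "E": pyro.param("E_post", torch.ones(2) / 2, constraint=constraints.simplex),
        "O": pyro.param("O_post", torch.ones(2) / 2, constraint=constraints.simplex),
        "R": pyro.param("R_post", torch.ones(2) / 2, constraint=constraints.simplex),
        "T": pyro.param("T_post", torch.ones(3) / 3, constraint=constraints.simplex),
    }
    # Same sample-site names and same signature as the model
    for name, p in posts.items():
        pyro.sample(name, dist.Categorical(probs=p))
```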
Let's take an example: you observe a person with a university degree. What is your prediction of this person's means of travel? To answer a query like this, we use a condition statement and condition on the evidence; here, the evidence is that the person has a university degree (E = Uni).
We give the value E = Uni as tensor(1), since University is indexed at position 1, and then run an inference algorithm, importance sampling in this case.
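A minimal sketch of this query, assuming the model function above (the sample counts are arbitrary choices):

```python
import torch
import pyro
from pyro.infer import Importance, EmpiricalMarginal

# Condition on the evidence: E = University, encoded as index 1
conditioned_model = pyro.condition(model, data={"E": torch.tensor(1)})

# Approximate the posterior with importance sampling
posterior = Importance(conditioned_model, num_samples=5000).run()
marginal_T = EmpiricalMarginal(posterior, sites="T")

# Estimate P(T | E = Uni) over Car, Train, Other
samples = torch.stack([marginal_T() for _ in range(5000)])
print(torch.bincount(samples, minlength=3).float() / len(samples))
```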
You can run HMC (Hamiltonian Monte Carlo) for a more accurate posterior computation.
When we condition on Education = Uni, we restrict ourselves to the sub-population that has a university education; no other outcome for E is possible.
Let's see how the interventional distribution differs for the same query. Under an intervention, the influence of the parents of the intervened node is cut off, so Age and Sex no longer affect Education.
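A minimal sketch of the same query as an intervention, using pyro's do handler (same names as above):

```python
import torch
import pyro
from pyro.infer import Importance, EmpiricalMarginal

# Intervene: set E = University, severing the A -> E and S -> E edges
intervened_model = pyro.do(model, data={"E": torch.tensor(1)})

posterior = Importance(intervened_model, num_samples=5000).run()
marginal_T = EmpiricalMarginal(posterior, sites="T")
samples = torch.stack([marginal_T() for _ in range(5000)])
print(torch.bincount(samples, minlength=3).float() / len(samples))
```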
The interventional distribution is slightly different from the conditional distribution.
For more details and examples, please refer to the following GitHub repo.