| 1 | +Estimating Average Causal Effects |
| 2 | +================================= |
| 3 | + |
| 4 | +One of the most common causal questions is how much a certain target quantity differs under two different |
| 5 | +interventions/treatments. This is known as the average treatment effect (ATE) or, more generally, the average causal |
| 6 | +effect (ACE). The simplest form is the comparison of two treatments, i.e. what is the difference in the target quantity |
| 7 | +on average given treatment A vs. treatment B? For instance, do patients treated with a certain medicine (:math:`T:=1`) recover |
| 8 | +faster than patients who were not treated at all (:math:`T:=0`)? The ACE API allows estimating such differences in a |
| 9 | +target node, i.e. it estimates the quantity :math:`\mathbb{E}[Y | \text{do}(T:=A)] - \mathbb{E}[Y | \text{do}(T:=B)]`. |
| 10 | + |
| 11 | +How to use it |
| 12 | +^^^^^^^^^^^^^^ |
| 13 | + |
| 14 | +Let's generate some data in which the treatment has an obvious impact. |
| 15 | + |
| 16 | +>>> import networkx as nx, numpy as np, pandas as pd |
| 17 | +>>> import dowhy.gcm as gcm |
| 18 | +>>> X0 = np.random.normal(0, 0.2, 1000) |
| 19 | +>>> T = (X0 > 0).astype(float) |
| 20 | +>>> X1 = np.random.normal(0, 0.2, 1000) + 1.5 * T |
| 21 | +>>> Y = X1 + np.random.normal(0, 0.1, 1000) |
| 22 | +>>> data = pd.DataFrame(dict(T=T, X0=X0, X1=X1, Y=Y)) |
| 23 | + |
| 24 | +Here, :math:`T` is binary and, through :math:`X1`, adds 1.5 to :math:`Y` if it is 1 and nothing otherwise. As usual, let's model the |
| 25 | +cause-effect relationships and fit the model to the data: |
| 26 | + |
| 27 | +>>> causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([('X0', 'T'), ('T', 'X1'), ('X1', 'Y')])) |
| 28 | +>>> gcm.auto.assign_causal_mechanisms(causal_model, data) |
| 29 | +>>> gcm.fit(causal_model, data) |
| 30 | + |
| 31 | +Now we are ready to answer the question: "What is the causal effect of setting :math:`T:=1` vs :math:`T:=0`?" |
| 32 | + |
| 33 | +>>> gcm.average_causal_effect(causal_model, |
| 34 | +...                           'Y', |
| 35 | +...                           interventions_alternative={'T': lambda x: 1}, |
| 36 | +...                           interventions_reference={'T': lambda x: 0}, |
| 37 | +...                           num_samples_to_draw=1000) |
| 38 | +1.5025054682995396 |
| 39 | + |
| 40 | +The average effect is ~1.5, which matches our data generation process. Since the method expects a dictionary |
| 41 | +of interventions, we can also intervene on multiple nodes and/or specify more complex interventions. |
| 42 | + |
| 43 | +**Note** that although it can seem difficult to correctly specify the causal graph in practice, it often suffices to |
| 44 | +specify a graph with the correct causal order. That is, as long as there are no anticausal relationships, adding |
| 45 | +too many edges from upstream nodes to a downstream node still yields reasonable results when estimating causal |
| 46 | +effects. In the example above, we get approximately the same result if we add the edges :math:`X0 \rightarrow Y` and |
| 47 | +:math:`T \rightarrow Y`: |
| 48 | + |
| 49 | +>>> causal_model.graph.add_edge('X0', 'Y') |
| 50 | +>>> causal_model.graph.add_edge('T', 'Y') |
| 51 | +>>> gcm.auto.assign_causal_mechanisms(causal_model, data, override_models=True) |
| 52 | +>>> gcm.fit(causal_model, data) |
| 53 | +>>> gcm.average_causal_effect(causal_model, |
| 54 | +...                           'Y', |
| 55 | +...                           interventions_alternative={'T': lambda x: 1}, |
| 56 | +...                           interventions_reference={'T': lambda x: 0}, |
| 57 | +...                           num_samples_to_draw=1000) |
| 58 | +1.509062353057525 |
| 59 | + |
| 60 | +To further account for potential interactions between root nodes that were not modeled, we can also pass in |
| 61 | +observational data instead of generating new samples: |
| 62 | + |
| 63 | +>>> gcm.average_causal_effect(causal_model, |
| 64 | +...                           'Y', |
| 65 | +...                           interventions_alternative={'T': lambda x: 1}, |
| 66 | +...                           interventions_reference={'T': lambda x: 0}, |
| 67 | +...                           observed_data=data) |
| 68 | +1.4990885925844586 |
| 69 | + |
| 70 | +Understanding the method |
| 71 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 72 | + |
| 73 | +Estimating the average causal effect is straightforward, since it only requires comparing the two expectations |
| 74 | +of a target node based on samples from their respective interventional distributions. That is, we can boil down the ACE |
| 75 | +estimation to the following steps: |
| 76 | + |
| 77 | +1. Draw samples from the interventional distribution of :math:`Y` under treatment A. |
| 78 | +2. Draw samples from the interventional distribution of :math:`Y` under treatment B. |
| 79 | +3. Compute their respective means. |
| 80 | +4. Take the difference of the means, i.e. :math:`\mathbb{E}[Y | \text{do}(T:=A)] - \mathbb{E}[Y | \text{do}(T:=B)]`, |
| 81 | +   where we do not need to restrict the type of interventions or variables we want to intervene on. |
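The four steps above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not how ``gcm.average_causal_effect`` is implemented: the structural equations are copied by hand from the data generation example instead of being fitted from data, and the names ``sample_y`` and ``average_causal_effect_sketch`` are hypothetical helpers introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_y(num_samples, treatment=None):
    """Sample Y from the toy model X0 -> T -> X1 -> Y.

    If `treatment` is given, T is fixed to that value for every sample,
    which mimics the intervention do(T := treatment). Otherwise T follows
    its observational mechanism T = 1[X0 > 0].
    """
    x0 = rng.normal(0, 0.2, num_samples)
    if treatment is None:
        t = (x0 > 0).astype(float)
    else:
        # Intervention: cut the edge X0 -> T and set T directly.
        t = np.full(num_samples, float(treatment))
    x1 = rng.normal(0, 0.2, num_samples) + 1.5 * t
    return x1 + rng.normal(0, 0.1, num_samples)


def average_causal_effect_sketch(num_samples=10_000):
    y_alternative = sample_y(num_samples, treatment=1)  # step 1: do(T := 1)
    y_reference = sample_y(num_samples, treatment=0)    # step 2: do(T := 0)
    # Steps 3 and 4: compute both means and take their difference.
    return y_alternative.mean() - y_reference.mean()


ace = average_causal_effect_sketch()
print(ace)
```

With these hand-written mechanisms the estimate should land close to 1.5, the value built into the data generation process. In the library, the same comparison is made using samples drawn from the fitted causal model.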