Commit 3ab0e2a

[New refutation] Add OverRule for learning Boolean rules to describe support/overlap (#791)
* feat: Add scaffolding for overrule, including basic test
* feat: Update dependencies for overrule
* feat: Add the full set of overrule code
* style: Black styling on overrule code
  style: Black formatting on beam_search
  style: Black formatting for ruleset
  style: Black format utils
  style: Black formatting on load_process_data_BCS
* fix: Change np.matmul to np.dot
* fix: Update to appropriate matmul notation for latest CVXPY
* feat: Remove unnecessary overrule code
* feat: Minimum working example for OverRule
* test: Minimum viable test for OverRule
  test: Fix test to work with new interface
* feat: Print rules with option to recompute metrics
* feat: Improve printing of results for refutation
* feat: Pass in additional arguments to OverRule
* docs: Add docstrings to ruleset.py
  docs: Docstrings for assess_overlap.py
* feat: Update logger
* docs: Add additional docstrings
* fix: Path bug
* feat: Add notebook with a toy example to demonstrate OverRule
* feat: Add back using LP coeff by default
* docs: Typing and docstrings
  docs: Consistent module docstrings
  docs: Add docstrings to overrule/utils.py
  docs: Add docstrings to overrule/BCS/beam_search.py
  docs: Add docstrings and typing to load_process_data_BCS
  docs: Add docstrings and type hints to BCS/overlap_boolean_rule.py
* docs: Fix and rename notebook for demonstrating overrule
* feat: Use default_rng instead of setting a global seed in sample_Unif
* fix: Use rng in place of numpy.random
* fix: Replace list with numpy array to fix type error
* docs: Fix type hint on ref_range
* feat: Add option to only fit overlap or support rules
* docs: Flesh out example notebook with parameters
* feat: Add thresh_override as an argument for configuration
* feat: Functional API for overrule has defaults
* docs: Add API reference
* ci: Fix support rule test
* chore: Update poetry.lock for cvxpy dependency
* feat: Remove `XGBClassifier` as default classifier
  To avoid dependency on `xgboost`, replace `XGBClassifier` as the default propensity score model with `RandomForestClassifier` from `sklearn`
* fix: Fix logic so that when verbose=True, silent=False
* feat: Remove seaborn dependency from overrule notebook
* feat: Add option to pass random seed to support estimation
* docs: Update notebook to use random seed on support estimation
* fix: Typo
* feat: Add PSID dataset (observational controls for Lalonde)
* fix: Prevent fitting overlap rules if all samples in overlap region
  One of the assertions in OverlapBooleanRule will trip if all samples are in the overlap region. This commit adds a more informative error if the assertion gets tripped, and raises a more informative warning upstream if all samples are in the overlap region
* feat: Add function that can be used with target_units
  `refute.filter_dataframe(df)` can be used to filter a dataframe to units that are in the overlap/support region.
* docs: Clarify notebook intro
* feat: Clarify how to read rules in refuter output
* fix: Return a copy when filtering
* docs: Update notebook with Lalonde example
* docs: Add return to docstring
* docs: Add citation for pricing problem
* feat: Change progressbar error to warning
* chore: Update lockfile

Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
1 parent c0de390 commit 3ab0e2a
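
The commit message describes learned Boolean rules and a `refute.filter_dataframe(df)` helper that returns a copy of the units inside the overlap/support region. As a rough illustration of that idea only, the sketch below evaluates a rule set written as an OR of AND clauses over plain dictionaries. The tuple encoding, the column names, and the functions `satisfies` and `filter_rows` are hypothetical and are not DoWhy's API or its internal rule representation.

```python
import operator
from typing import Any, Dict, List, Tuple

# Hypothetical encoding (an assumption, not DoWhy's format):
# a literal is (column, op, value), a clause is an AND of literals,
# and a rule set is an OR of clauses.
Literal = Tuple[str, str, float]
Clause = List[Literal]

_OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}


def satisfies(row: Dict[str, Any], rules: List[Clause]) -> bool:
    """Return True if the row satisfies at least one clause of the rule set."""
    return any(all(_OPS[op](row[col], val) for col, op, val in clause) for clause in rules)


def filter_rows(rows: List[Dict[str, Any]], rules: List[Clause]) -> List[Dict[str, Any]]:
    """Return copies of the rows covered by the rule set; the input is not modified."""
    return [dict(row) for row in rows if satisfies(row, rules)]


# Illustrative rule set: (age > 25 AND age <= 60) OR (educ > 12)
rules = [
    [("age", ">", 25), ("age", "<=", 60)],
    [("educ", ">", 12)],
]
rows = [
    {"age": 30, "educ": 10},  # covered by the first clause
    {"age": 70, "educ": 16},  # covered by the second clause
    {"age": 20, "educ": 8},   # covered by no clause
]
kept = filter_rows(rows, rules)
```

Returning copies mirrors the "Return a copy when filtering" fix: callers can modify the filtered rows without touching the original data.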

15 files changed

+2892
-17
lines changed

docs/source/dowhy.causal_refuters.rst

+9
@@ -12,6 +12,15 @@ dowhy.causal\_refuters.add\_unobserved\_common\_cause module
     :undoc-members:
     :show-inheritance:
 
+dowhy.causal\_refuters.assess_overlap module
+-----------------------------------------------------------
+
+.. automodule:: dowhy.causal_refuters.assess_overlap
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
 dowhy.causal\_refuters.bootstrap\_refuter module
 ------------------------------------------------
 

docs/source/example_notebooks/dowhy_refuter_assess_overlap.ipynb

+880
Large diffs are not rendered by default.
+126
@@ -0,0 +1,126 @@
+import logging
+import warnings
+from typing import List, Optional
+
+from dowhy.causal_refuter import CausalRefuter
+from dowhy.causal_refuters.assess_overlap_overrule import OverlapConfig, OverruleAnalyzer, SupportConfig
+
+logger = logging.getLogger(__name__)
+
+
+class AssessOverlap(CausalRefuter):
+    """Assess Overlap
+
+    This class implements the OverRule algorithm for assessing support and overlap via Boolean Rulesets, from [1].
+
+    [1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of
+    Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International
+    Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138
+    """
+
+    def __init__(self, *args, **kwargs):
+        """
+        Initialize the parameters required for the refuter.
+
+        Arguments are passed through to the `refute_estimate` method. See dowhy.causal_refuters.assess_overlap_overrule
+        for the definition of the `SupportConfig` and `OverlapConfig` dataclasses that define optimization
+        hyperparameters.
+
+        .. warning::
+            This method is only compatible with estimators that use backdoor adjustment, and will attempt to acquire
+            the set of backdoor variables via `self._target_estimand.get_backdoor_variables()`.
+
+        :param: cat_feats: List[str]: List of categorical features, all others will be discretized
+        :param: support_config: SupportConfig: DataClass with configuration options for learning support rules
+        :param: overlap_config: OverlapConfig: DataClass with configuration options for learning overlap rules
+        :param: overlap_eps: float: Defines the range of propensity scores for a point to be considered in the overlap
+            region, with the range defined as `(overlap_eps, 1 - overlap_eps)`, defaults to 0.1
+        :param: overrule_verbose: bool: Enable verbose logging of optimization output, defaults to False
+        :param: support_only: bool: Only fit rules to describe the support region (do not fit overlap rules), defaults to False
+        :param: overlap_only: bool: Only fit rules to describe the overlap region (do not fit support rules), defaults to False
+        """
+        super().__init__(*args, **kwargs)
+        # TODO: Check that the target estimand has backdoor variables?
+        self._backdoor_vars = self._target_estimand.get_backdoor_variables()
+        self._cat_feats = kwargs.pop("cat_feats", [])
+        self._support_config = kwargs.pop("support_config", None)
+        self._overlap_config = kwargs.pop("overlap_config", None)
+        self._overlap_eps = kwargs.pop("overlap_eps", 0.1)
+        if self._overlap_eps < 0 or self._overlap_eps > 1:
+            raise ValueError(f"Value of `overlap_eps` must be in [0, 1], got {self._overlap_eps}")
+        self._support_only = kwargs.pop("support_only", False)
+        self._overlap_only = kwargs.pop("overlap_only", False)
+        self._overrule_verbose = kwargs.pop("overrule_verbose", False)
+
+    def refute_estimate(self, show_progress_bar=False):
+        """
+        Learn overlap and support rules.
+
+        :param show_progress_bar: Not implemented; if set to True, a warning is issued and no progress bar is shown,
+            defaults to False
+        :type show_progress_bar: bool
+        :returns: object of class OverruleAnalyzer
+        """
+        if show_progress_bar:
+            warnings.warn("No progress bar is available for OverRule")
+
+        return assess_support_and_overlap_overrule(
+            data=self._data,
+            backdoor_vars=self._backdoor_vars,
+            treatment_name=self._treatment_name,
+            cat_feats=self._cat_feats,
+            overlap_config=self._overlap_config,
+            support_config=self._support_config,
+            overlap_eps=self._overlap_eps,
+            support_only=self._support_only,
+            overlap_only=self._overlap_only,
+            verbose=self._overrule_verbose,
+        )
+
+
+def assess_support_and_overlap_overrule(
+    data,
+    backdoor_vars: List[str],
+    treatment_name: str,
+    cat_feats: List[str] = [],
+    overlap_config: Optional[OverlapConfig] = None,
+    support_config: Optional[SupportConfig] = None,
+    overlap_eps: float = 0.1,
+    support_only: bool = False,
+    overlap_only: bool = False,
+    verbose: bool = False,
+):
+    """
+    Learn support and overlap rules using OverRule.
+
+    :param data: Data containing backdoor variables and treatment name
+    :param backdoor_vars: List of backdoor variables. Support and overlap rules will only be learned with respect to
+        these variables
+    :type backdoor_vars: List[str]
+    :param treatment_name: Treatment name
+    :type treatment_name: str
+    :param cat_feats: Categorical features
+    :type cat_feats: List[str]
+    :param overlap_config: Configuration for learning overlap rules
+    :type overlap_config: OverlapConfig
+    :param support_config: Configuration for learning support rules
+    :type support_config: SupportConfig
+    :param: overlap_eps: float: Defines the range of propensity scores for a point to be considered in the overlap
+        region, with the range defined as `(overlap_eps, 1 - overlap_eps)`, defaults to 0.1
+    :param: support_only: bool: Only fit the support region
+    :param: overlap_only: bool: Only fit the overlap region
+    :param: verbose: bool: Enable verbose logging of optimization output, defaults to False
+    """
+    analyzer = OverruleAnalyzer(
+        backdoor_vars=backdoor_vars,
+        treatment_name=treatment_name,
+        cat_feats=cat_feats,
+        overlap_config=overlap_config,
+        support_config=support_config,
+        overlap_eps=overlap_eps,
+        support_only=support_only,
+        overlap_only=overlap_only,
+        verbose=verbose,
+    )
+    analyzer.fit(data)
+    return analyzer
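
The docstrings above define the overlap region as the propensity scores in `(overlap_eps, 1 - overlap_eps)`. A minimal sketch of that membership test follows. The helper `in_overlap_region` is hypothetical and not part of DoWhy, and reading the parentheses as an open interval is an assumption. The range check copies the one in `AssessOverlap.__init__`.

```python
def in_overlap_region(propensity: float, overlap_eps: float = 0.1) -> bool:
    """Hypothetical helper: True if the score lies strictly inside (eps, 1 - eps)."""
    # Same validation as the refuter's constructor in the diff above.
    if overlap_eps < 0 or overlap_eps > 1:
        raise ValueError(f"Value of `overlap_eps` must be in [0, 1], got {overlap_eps}")
    # Open interval is an assumption based on the docstring's notation.
    return overlap_eps < propensity < 1 - overlap_eps


scores = [0.05, 0.1, 0.5, 0.93]
flags = [in_overlap_region(p) for p in scores]
```

Under this reading, any `overlap_eps` of 0.5 or more passes the `[0, 1]` validation but leaves the interval empty, so no unit can fall in the overlap region.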
