Causal Machine Learning Methods for Differential Variance Inference
Authors: Philippe Boileau, Hani Zaki, Mireille Schnitzer
What’s cmldiffvar?
cmldiffvar implements causal machine learning methods for differential variance inference. These methods rely on semiparametric efficiency theory and flexible machine learning methods — namely, Super Learner ensembles — to avoid the need for convenience assumptions about data-generating processes (van der Laan and Rose 2011; van der Laan, Polley, and Hubbard 2007). Hypothesis tests about differential variance can uncover heterogeneous treatment effects, even when the effect modifiers are excluded from the data. Details on the methodology are provided in Boileau et al. (In preparation).
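The link between differential variance and effect heterogeneity can be illustrated with a small base R simulation. This is only a sketch under assumed data-generating processes: the variable names, effect sizes, and sample size below are hypothetical and are not taken from the package or its data. It compares the standard deviations of simulated potential outcomes under a constant treatment effect and under an effect that depends on a modifier that would not be recorded in the data.

```r
# Illustrative simulation only; every name and number here is an assumption.
set.seed(1)
n <- 100000

confounder <- rnorm(n)
modifier <- rbinom(n, 1, 0.5)  # effect modifier, assumed absent from the data
noise <- rnorm(n)

# potential outcome under control
y0 <- confounder + noise

# potential outcome under treatment with a constant (homogeneous) effect
y1_homogeneous <- y0 + 1

# potential outcome under treatment with an effect that varies with the modifier
y1_heterogeneous <- y0 + 1 + 2 * modifier

# difference of the potential outcomes' standard deviations
diff_sd_homogeneous <- sd(y1_homogeneous) - sd(y0)
diff_sd_heterogeneous <- sd(y1_heterogeneous) - sd(y0)

print(c(homogeneous = diff_sd_homogeneous,
        heterogeneous = diff_sd_heterogeneous))
```

In this toy setup, a constant shift leaves the spread of the outcome unchanged, so the first difference is zero up to floating-point error. The modifier-dependent effect inflates the spread of the treated potential outcome, so the second difference is positive even though the modifier never enters the comparison.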
Installation
The development version of the package may be installed from GitHub using remotes:
remotes::install_github("PhilBoileau/cmldiffvar")
Example
We estimate the absolute differential variance, defined as the difference of the potential outcomes’ standard deviations, on a random sample of the toy_population_tbl data included with the cmldiffvar package. This dataset represents an observational study in which the treatment variable is binary, the outcome is continuous, and a single confounder was measured. The true absolute differential variance in this population is . Because the absolute differential variance is non-zero, the treatment effect is heterogeneous.
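In symbols, the estimand described above might be written as follows. The notation is an assumption made for illustration: $Y(1)$ and $Y(0)$ denote the potential outcomes under treatment and control, $\psi$ denotes the absolute differential variance, and the ordering (treated minus control) is assumed rather than taken from the package documentation.

```latex
\psi = \sqrt{\operatorname{Var}\{Y(1)\}} - \sqrt{\operatorname{Var}\{Y(0)\}}
```

Under a constant additive treatment effect, $Y(1) = Y(0) + c$ for some constant $c$, so $\operatorname{Var}\{Y(1)\} = \operatorname{Var}\{Y(0)\}$ and $\psi = 0$. A non-zero $\psi$ therefore rules out a constant effect.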
We use a targeted maximum likelihood estimator (van der Laan and Rubin 2006; van der Laan and Rose 2011, 2018), the cmldiffvar() function’s default estimator, to infer the differential variance of this population. The function outputs a point estimate and a confidence interval by default. A p-value corresponding to a test of whether the differential variance is significantly different from zero is also provided. This is equivalent to testing whether the treatment effect is homogeneous.
# load the required packages
library(cmldiffvar)
library(dplyr)
library(SuperLearner)
# set the seed for reproducibility
set.seed(510)
# random sample from population data
sample_tbl <- slice_sample(toy_population_tbl, n = 250)
# estimate absolute differential variance
dif_var_result_tbl <- sample_tbl |>
  cmldiffvar(
    propensity_score_adj_var_names = "confounder",
    cond_exp_outcome_adj_var_names = "confounder",
    treatment_var_name = "treatment",
    outcome_var_name = "outcome"
  )

| estimand | estimate | se | ci_low | ci_high | p_value |
|---|---|---|---|---|---|
| absolute differential variance | 2.11 | 0.29 | 1.53 | 2.69 | 0.00 |
The absolute differential variance point estimate is near the ground truth. Additionally, the test correctly rejects the null hypothesis of a homogeneous treatment effect at the significance level.
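As a rough check on how the columns of the results table relate to one another, the sketch below recomputes an interval and a p-value from the reported estimate and standard error. It assumes a Wald-type construction at the 95% level, which the reported numbers are consistent with but which is not confirmed here as the package's internal method. The variable names are hypothetical.

```r
# Assumption: Wald-type inference based on asymptotic normality.
# This is not confirmed to be what cmldiffvar() does internally.
estimate <- 2.11  # point estimate reported in the table
se <- 0.29        # standard error reported in the table (rounded)

# two-sided 95% interval
z_crit <- qnorm(0.975)
ci_low <- estimate - z_crit * se
ci_high <- estimate + z_crit * se

# two-sided p-value for the null hypothesis of zero differential variance
z_stat <- estimate / se
p_value <- 2 * pnorm(-abs(z_stat))

print(round(c(ci_low = ci_low, ci_high = ci_high), 2))
print(p_value)
```

Because the inputs are the rounded values from the table, the recomputed bounds can differ from the reported ones in the last digit.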
Contributions
Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.
Citation
Please cite the following paper when using the cmldiffvar R software package.
@unpublished{boileau2025,
author = {Philippe A Boileau and Hani Zaki and Gabriele Lileikyte and Niklas
Nielsen and Patrick R Lawler and Mireille E Schnitzer},
title = {Assumption-Lean Differential Variance Inference for Heterogeneous
Treatment Effect Detection},
year = {In preparation}
}
Licence
© 2025 Philippe Boileau
The contents of this repository are distributed under the MIT license. See file LICENSE.md for details.