Causal Machine Learning for Differential Variance
cmldiffvar() infers the differential variance between two
treatments using causal machine learning methods.
Usage
cmldiffvar(
  data_tbl,
  estimand_type = "absolute",
  estimator_type = "tmle",
  confidence_level = 0.95,
  propensity_score_adj_var_names,
  cond_exp_outcome_adj_var_names,
  treatment_var_name,
  propensity_score_var_name = NULL,
  outcome_var_name,
  propensity_score_library = c("SL.glm", "SL.earth", "SL.ranger"),
  cond_exp_outcome_library = c("SL.glm", "SL.earth", "SL.ranger"),
  cond_exp_sq_outcome_library = c("SL.glm.interaction", "SL.ranger"),
  num_nuisance_sl_folds = 5,
  cross_fit = FALSE,
  num_cross_fit_folds = 5,
  parallel = FALSE
)
Arguments
- data_tbl
A data.frame or tibble.
- estimand_type
A character indicating whether to estimate the absolute ("absolute") or relative ("relative") differential variance.
- estimator_type
A character indicating whether to use a one-step ("one-step") or a targeted maximum likelihood estimator ("tmle") in the cross-fitting procedure.
- confidence_level
A numeric between $0.1$ and $0.99$ providing the confidence level used to compute confidence intervals. Defaults to 0.95.
- propensity_score_adj_var_names
A character vector providing the column names of the adjustment set variables for propensity score estimation stored in data_tbl. Ignored if propensity_score_var_name is not NULL.
- cond_exp_outcome_adj_var_names
A character vector providing the column names of the adjustment set variables for conditional expected (squared) outcome estimation stored in data_tbl.
- treatment_var_name
A character providing the column name of the treatment assignment indicator stored in data_tbl.
- propensity_score_var_name
An optional character providing the column name of the known propensity scores stored in data_tbl. Defaults to NULL. See the Details section for more information.
- outcome_var_name
A character providing the column name of the outcome variable stored in data_tbl.
- propensity_score_library
A character vector of candidate learners used by the SuperLearner estimator of the propensity score. Defaults to c("SL.glm", "SL.earth", "SL.ranger").
- cond_exp_outcome_library
A character vector of candidate learners used by the SuperLearner estimator of the expected outcome conditional on confounders and treatment assignment. Defaults to c("SL.glm", "SL.earth", "SL.ranger").
- cond_exp_sq_outcome_library
A character vector of candidate learners used by the SuperLearner estimator of the expected squared outcome conditional on confounders and treatment assignment. Defaults to c("SL.glm.interaction", "SL.ranger").
- num_nuisance_sl_folds
A numeric indicating the number of folds to use in cross-validated SuperLearner estimators. Defaults to 5.
- cross_fit
A logical flag determining whether cross-fitted estimators are used. Defaults to FALSE.
- num_cross_fit_folds
A numeric setting the number of folds to use by the cross-fitting procedures, if used. Defaults to 5.
- parallel
A logical flag indicating whether the cross-fitting procedure, if used, should be parallelized. Defaults to FALSE.
Value
A one-row tibble containing the following columns:
- estimand: The scale of the differential variance estimand
- estimate: The differential variance estimate
- se: The estimator's standard error
- ci_low: The lower bound of the Wald-type confidence interval
- ci_high: The upper bound of the Wald-type confidence interval
- p_value: The p-value of a two-sided test using the z-score of the differential variance estimate
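A call might look like the following sketch. It assumes that cmldiffvar() is exported by an installed package of the same name (the package name is not stated on this page) and that the SuperLearner wrappers listed under Usage are available. The simulated data and the column names w1, w2, a, and y are purely illustrative.

```r
# Simulate a small parallel-group data set (all names are illustrative).
set.seed(123)
n <- 500
sim_tbl <- data.frame(
  w1 = rnorm(n),           # continuous confounder
  w2 = rbinom(n, 1, 0.5)   # binary confounder
)
sim_tbl$a <- rbinom(n, 1, plogis(0.3 * sim_tbl$w1))  # binary treatment
# Outcome noise is larger in the treated arm, so the variances differ.
sim_tbl$y <- sim_tbl$w1 + sim_tbl$a + rnorm(n, sd = 1 + sim_tbl$a)

# Assumption: cmldiffvar() lives in a package called "cmldiffvar".
if (requireNamespace("cmldiffvar", quietly = TRUE)) {
  result <- cmldiffvar::cmldiffvar(
    data_tbl = sim_tbl,
    estimand_type = "absolute",
    estimator_type = "tmle",
    confidence_level = 0.95,
    propensity_score_adj_var_names = c("w1", "w2"),
    cond_exp_outcome_adj_var_names = c("w1", "w2"),
    treatment_var_name = "a",
    outcome_var_name = "y",
    cross_fit = TRUE,          # set explicitly rather than relying on the default
    num_cross_fit_folds = 5
  )
  # Per the Value section: a one-row tibble with the columns
  # estimand, estimate, se, ci_low, ci_high, and p_value.
  print(result)
}
```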
Details
cmldiffvar() assumes that the data in data_tbl are generated
according to a parallel group study design with a binary treatment
variable, a continuous outcome, and continuous or binary treatment-outcome
confounders.
Under standard causal identifiability conditions — namely, consistency,
positivity, and full exchangeability — cmldiffvar() performs inference
on the differential variance of the potential outcomes. Differential
variance is defined on the absolute scale as the difference in potential
outcome variances of two treatments. On the relative scale,
differential variance is defined as the ratio of the potential outcome
variances. The scale of the differential variance estimated by
cmldiffvar() is specified by the estimand_type parameter.
These differential variance estimands rely on three nuisance parameters: the propensity score, the expected outcome conditional on confounders and treatment assignment, and the expected squared outcome conditional on confounders and treatment assignment.
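To make the role of the two outcome regressions concrete, the following identity sketches how a potential outcome variance can be written in terms of them under the identifiability conditions above. The notation is an assumption introduced here, not taken from the package: \(W\) denotes the confounders, \(A\) the treatment indicator, \(Y\) the outcome, \(Y(a)\) the potential outcome under treatment \(a\), and \(\bar{Q}_k(a, W) = E[Y^k \mid A = a, W]\) for \(k = 1, 2\).

\[
\operatorname{Var}\{Y(a)\}
  = E\big[\bar{Q}_2(a, W)\big] - \Big(E\big[\bar{Q}_1(a, W)\big]\Big)^2 .
\]

The differential variance estimands contrast this quantity between \(a = 1\) and \(a = 0\). The propensity score \(g(W) = P(A = 1 \mid W)\) does not appear in the identity itself; it enters through the bias-correction step of the one-step and targeted maximum likelihood estimators.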
cmldiffvar() implements cross-fitted one-step and targeted maximum
likelihood estimators of the differential variance estimands. These
cross-fitted estimators are not used by default; they can be enabled by
setting cross_fit = TRUE. The number of folds used in
these cross-fitting procedures is set by num_cross_fit_folds. The
differential variance estimands' nuisance parameters are flexibly estimated
with SuperLearner estimators.
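The general idea behind a one-step style estimator can be sketched in base R. This is not the implementation used by cmldiffvar(): it replaces the SuperLearner estimators with correctly specified parametric regressions, uses neither cross-fitting nor TMLE targeting, and combines augmented inverse probability weighted (AIPW) estimates of the first two potential outcome moments. All object names are hypothetical.

```r
# Simulated data: Var{Y(1)} = 1 + 4 = 5 and Var{Y(0)} = 1 + 1 = 2.
set.seed(1)
n <- 5000
w <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * w))
y <- w + a + rnorm(n, sd = ifelse(a == 1, 2, 1))
dat <- data.frame(w = w, a = a, y = y, y_sq = y^2)

# Nuisance parameter estimates (parametric stand-ins for SuperLearner).
ps_fit <- glm(a ~ w, family = binomial(), data = dat)       # propensity score
g1 <- predict(ps_fit, type = "response")
q1_fit <- lm(y ~ w * a, data = dat)                         # E[Y | A, W]
q2_fit <- lm(y_sq ~ (w + I(w^2)) * a, data = dat)           # E[Y^2 | A, W]

# AIPW estimates of E[Y(arm)] and E[Y(arm)^2], combined into a variance.
aipw_variance <- function(arm) {
  newdat <- dat
  newdat$a <- arm
  q1 <- predict(q1_fit, newdata = newdat)
  q2 <- predict(q2_fit, newdata = newdat)
  g_arm <- if (arm == 1) g1 else 1 - g1
  ipw <- as.numeric(dat$a == arm) / g_arm
  psi1 <- mean(q1 + ipw * (dat$y - q1))       # first moment
  psi2 <- mean(q2 + ipw * (dat$y_sq - q2))    # second moment
  psi2 - psi1^2
}

var1 <- aipw_variance(1)
var0 <- aipw_variance(0)
abs_diff_var <- var1 - var0   # absolute scale: difference of variances
rel_diff_var <- var1 / var0   # relative scale: ratio of variances
```

In this simulation the true absolute and relative differential variances are 3 and 2.5, so the two estimates should land near those values.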
The estimators implemented in cmldiffvar() are consistent if at least
one of the following conditions is satisfied: (1) the propensity score is
consistently estimated, or (2) the expected outcome conditional on
confounders and treatment assignment and the expected squared outcome
conditional on confounders and treatment assignment are consistently
estimated.
The estimators implemented in cmldiffvar() are asymptotically linear,
meaning their asymptotic sampling distribution is normal and centered at
the true differential variance parameter, if all nuisance parameters are
consistently estimated at a rate of \(o_P(n^{-1/4})\). The confidence
intervals, standard errors, and p-values reported by cmldiffvar() are not
guaranteed to be valid if these rate conditions are not satisfied.
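Given an estimate and its standard error, Wald-type intervals and z-score p-values of the kind described in the Value section can be computed as in the sketch below. The helper wald_summary() is hypothetical, and the null value of 0 is an assumption; this page does not state which null value cmldiffvar() tests on either scale.

```r
# Hypothetical helper: Wald-type interval and two-sided z-test.
wald_summary <- function(estimate, se, confidence_level = 0.95,
                         null_value = 0) {
  z_crit <- qnorm(1 - (1 - confidence_level) / 2)  # e.g. about 1.96 for 95%
  z_score <- (estimate - null_value) / se
  data.frame(
    estimate = estimate,
    se = se,
    ci_low = estimate - z_crit * se,
    ci_high = estimate + z_crit * se,
    p_value = 2 * pnorm(-abs(z_score))
  )
}

res <- wald_summary(estimate = 3, se = 0.5)
print(res)
```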
When the data are the product of a randomized study with known propensity
scores, these propensity scores can be provided to the
propensity_score_var_name parameter. The conditions required for the
estimators to be consistent and asymptotically linear are automatically
satisfied when known propensity scores are used.
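For a 1:1 randomized trial, supplying known propensity scores might look like the sketch below. It assumes a package named cmldiffvar is installed; the data and the column names w, a, ps, and y are illustrative. The adjustment set for the propensity score is still passed because that argument has no default in the Usage signature, even though it would not be needed when known scores are supplied.

```r
# Simulated 1:1 randomized trial with a known propensity score of 0.5.
set.seed(42)
n <- 400
trial_tbl <- data.frame(w = rnorm(n), a = rbinom(n, 1, 0.5))
trial_tbl$ps <- 0.5   # known probability of receiving treatment
trial_tbl$y <- trial_tbl$w + rnorm(n, sd = 1 + 0.5 * trial_tbl$a)

# Assumption: cmldiffvar() lives in a package called "cmldiffvar".
if (requireNamespace("cmldiffvar", quietly = TRUE)) {
  trial_result <- cmldiffvar::cmldiffvar(
    data_tbl = trial_tbl,
    estimand_type = "relative",
    propensity_score_adj_var_names = "w",
    cond_exp_outcome_adj_var_names = "w",
    treatment_var_name = "a",
    propensity_score_var_name = "ps",   # column holding the known scores
    outcome_var_name = "y"
  )
  print(trial_result)
}
```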
The cross-fitted estimation procedures can be parallelized by setting
parallel = TRUE. Parallelization relies on the future
package. Instructions for setting up parallel processing are available in
the future package's vignettes.
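A possible setup is sketched below. It assumes that cmldiffvar() picks up whichever future plan is active when it is called, which this page implies but does not spell out, and that packages named future and cmldiffvar are installed. The data, the column names, and the choice of two workers are illustrative.

```r
# Illustrative data (names are hypothetical).
set.seed(7)
n <- 300
par_tbl <- data.frame(w = rnorm(n))
par_tbl$a <- rbinom(n, 1, plogis(0.4 * par_tbl$w))
par_tbl$y <- par_tbl$w + rnorm(n, sd = 1 + par_tbl$a)

if (requireNamespace("future", quietly = TRUE) &&
    requireNamespace("cmldiffvar", quietly = TRUE)) {
  # Run futures in two background R sessions; keep the previous plan.
  old_plan <- future::plan(future::multisession, workers = 2)

  par_result <- cmldiffvar::cmldiffvar(
    data_tbl = par_tbl,
    propensity_score_adj_var_names = "w",
    cond_exp_outcome_adj_var_names = "w",
    treatment_var_name = "a",
    outcome_var_name = "y",
    cross_fit = TRUE,     # parallelization applies to the cross-fitting procedure
    parallel = TRUE
  )
  print(par_result)

  # Restore the plan that was active before.
  future::plan(old_plan)
}
```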