A nonparametric framework for treatment effect modifier discovery in high dimensions


Heterogeneous treatment effects are driven by treatment effect modifiers, pre-treatment covariates that modify the effect of a treatment on an outcome. Current approaches for uncovering these variables are limited to low-dimensional data, data with weakly correlated covariates, or data generated according to parametric processes. We resolve these issues by developing a framework for defining model-agnostic treatment effect modifier variable importance parameters applicable to high-dimensional data with arbitrary correlation structure, deriving one-step, estimating equation, and targeted maximum likelihood estimators of these parameters, and establishing these estimators’ asymptotic properties. We showcase this framework by defining variable importance parameters for data-generating processes with continuous, binary, and time-to-event outcomes and binary treatments, and by deriving accompanying multiply-robust and asymptotically linear estimators. Simulation experiments demonstrate that these estimators’ asymptotic guarantees are approximately achieved at realistic sample sizes in observational and randomized studies alike. We apply this framework to gene expression data collected for a clinical trial assessing the effect of a monoclonal antibody therapy on disease-free survival in breast cancer patients. Genes predicted to have the greatest potential for treatment effect modification have previously been linked to breast cancer. An open-source R package implementing this methodology, unihtee, is made available on GitHub.