cvCovEst() identifies the optimal covariance matrix
estimator from among a set of candidate estimators.
dat - A numeric data.frame, matrix, or similar object.
estimators - A list of estimator functions to be considered in
the cross-validated estimator selection procedure.
estimator_params - A named list of arguments corresponding to
the hyperparameters of the covariance matrix estimators in estimators.
The name of each list element should match the name of an estimator passed
to estimators. Each element of estimator_params is itself a named list,
with names corresponding to a given estimator's hyperparameter(s). Each
hyperparameter may be supplied as a single numeric value or as a numeric
vector. An estimator that needs no hyperparameter need not be listed.
cv_loss - A function indicating the loss function to be used.
It defaults to the matrix-based Frobenius loss, cvMatrixFrobeniusLoss().
An observation-based version, cvFrobeniusLoss(), is also available.
Additionally, cvScaledMatrixFrobeniusLoss() is provided for situations
in which the variables in dat are measured on different scales.
cv_scheme - A character string indicating the cross-validation scheme
to be employed. There are two options: (1) V-fold cross-validation, via
"v_fold"; and (2) Monte Carlo cross-validation, via "mc".
Defaults to V-fold cross-validation.
mc_split - A numeric between 0 and 1 indicating the proportion
of observations to be included in the validation set of each Monte Carlo
cross-validation fold.
v_folds - An integer greater than or equal to 1 indicating the
number of folds to use for cross-validation. The default is 10, regardless
of the choice of cross-validation scheme.
parallel - A logical option indicating whether to run the main
cross-validation loop with future_lapply(). This
is passed directly to cross_validate().
... - Not currently used. Permits backward compatibility.
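As a rough illustration of how estimator_params expands the set of candidates, the helper below counts the estimator/hyperparameter combinations a call would compare. count_candidates() is hypothetical and not part of cvCovEst. It assumes that an estimator absent from estimator_params contributes a single candidate, and that when an estimator has several hyperparameters every combination of the supplied values is tried. The second assumption is not stated above.

```r
# Hypothetical helper, not part of cvCovEst: counts the candidate
# estimator/hyperparameter combinations implied by estimator_params.
count_candidates <- function(estimator_names, estimator_params) {
  per_estimator <- vapply(estimator_names, function(nm) {
    p <- estimator_params[[nm]]
    # Estimators without hyperparameters need not be listed: one candidate.
    if (is.null(p)) return(1L)
    # Assumption: multiple hyperparameters are crossed (all combinations).
    as.integer(prod(lengths(p)))
  }, integer(1))
  sum(per_estimator)
}

# Mirrors the example call: three estimators, three values of gamma.
count_candidates(
  c("linearShrinkLWEst", "thresholdingEst", "sampleCovEst"),
  list(thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1)))
)
```

For the example call this gives 1 + 3 + 1 = 5, the same as the number of rows in the example's risk_df.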
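The idea behind a matrix-based Frobenius loss for cv_loss can be sketched in base R. The function below is an illustrative assumption, not the code of cvMatrixFrobeniusLoss(). It compares a covariance estimate fitted on training rows with the sample covariance of the validation rows, using the squared Frobenius norm of their difference. frobenius_loss_sketch() is a hypothetical name.

```r
# Illustrative sketch only; cvMatrixFrobeniusLoss() may differ in detail.
# Squared Frobenius norm of (estimate - sample covariance of validation rows).
frobenius_loss_sketch <- function(estimate, valid) {
  diff <- estimate - stats::cov(valid)
  sum(diff^2)
}

x <- as.matrix(mtcars)
train_rows <- 1:16                      # first half of the rows for training
est <- stats::cov(x[train_rows, ])      # sample covariance as the estimate
frobenius_loss_sketch(est, x[-train_rows, ])
```

Every entry enters this sum on its original scale, so high-variance variables such as disp dominate it. That is the different-scales situation the description above associates with cvScaledMatrixFrobeniusLoss().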
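To make the two cv_scheme options concrete, here is a hypothetical base-R sketch of how validation rows could be drawn under each scheme, using mc_split and v_folds as described above. make_folds_sketch() is an illustrative name, not a cvCovEst function. The package builds its folds internally and may split rows differently, for example in how mc_split * n is rounded.

```r
# Hypothetical sketch: returns a list of validation-row indices, one per fold.
make_folds_sketch <- function(n, cv_scheme = c("v_fold", "mc"),
                              mc_split = 0.5, v_folds = 10L) {
  cv_scheme <- match.arg(cv_scheme)
  if (cv_scheme == "v_fold") {
    # V-fold: each row lands in exactly one validation set.
    ids <- sample(rep_len(seq_len(v_folds), n))
    return(split(seq_len(n), ids))
  }
  # Monte Carlo: v_folds independent random splits, each with a validation
  # set of about mc_split * n rows (the rounding here is an assumption).
  n_valid <- round(mc_split * n)
  lapply(seq_len(v_folds), function(i) sort(sample.int(n, n_valid)))
}

set.seed(1)
vf <- make_folds_sketch(nrow(mtcars), "v_fold", v_folds = 10L)
mc <- make_folds_sketch(nrow(mtcars), "mc", mc_split = 0.5, v_folds = 10L)
lengths(vf)  # V-fold sizes differ by at most one
lengths(mc)  # each Monte Carlo validation set has 16 of the 32 rows
```

Under V-fold, the ten validation sets partition the rows. The Monte Carlo validation sets are drawn independently and may overlap.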
A list of results containing the following elements:
estimate - A matrix containing the covariance matrix estimate
produced by the optimal estimator.
estimator - A character indicating the optimal
estimator and corresponding hyperparameters, if any.
risk_df - A tibble providing the
cross-validated risk estimates of each estimator.
cv_df - A tibble providing each
estimator's loss over the folds of the cross-validation procedure.
args - A named list containing arguments passed to
cvCovEst.
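The relationship between cv_df and risk_df can be illustrated with a small helper. summarise_risk_sketch() is hypothetical and not part of cvCovEst. It assumes that each candidate's cross-validated risk is the mean of its per-fold losses, and that the selected estimator is the one with the lowest risk. The toy losses below are made up purely for illustration.

```r
# Hypothetical helper: average per-fold losses in a cv_df-shaped data frame
# and sort candidates by that mean (assumed to be how cv_risk is formed).
summarise_risk_sketch <- function(cv_df) {
  risk <- aggregate(loss ~ estimator + hyperparameters, data = cv_df, FUN = mean)
  names(risk)[names(risk) == "loss"] <- "cv_risk"
  risk <- risk[order(risk$cv_risk), ]
  rownames(risk) <- NULL
  risk
}

# Made-up losses for two candidates over two folds.
toy_cv_df <- data.frame(
  estimator = rep(c("estA", "estB"), each = 2),
  hyperparameters = "hyperparameters = NA",
  loss = c(1, 2, 3, 4),
  fold = rep(1:2, times = 2)
)
summarise_risk_sketch(toy_cv_df)
```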
library(cvCovEst)
cvCovEst(
dat = mtcars,
estimators = c(
linearShrinkLWEst, thresholdingEst, sampleCovEst
),
estimator_params = list(
thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
)
)
#> $estimate
#> mpg cyl disp hp drat wt
#> mpg 36.324103 -9.1723790 -633.09721 -320.732056 2.19506351 -5.1166847
#> cyl -9.172379 3.1895161 199.66028 101.931452 -0.66836694 1.3673710
#> disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915 107.6842040
#> hp -320.732056 101.9314516 6721.15867 4700.866935 -16.45110887 44.1926613
#> drat 2.195064 -0.6683669 -47.06402 -16.451109 0.28588135 -0.3727207
#> wt -5.116685 1.3673710 107.68420 44.192661 -0.37272073 0.9573790
#> qsec 4.509149 -1.8868548 -96.05168 -86.770081 0.08714073 -0.3054816
#> vs 2.017137 -0.7298387 -44.37762 -24.987903 0.11864919 -0.2736613
#> am 1.803931 -0.4657258 -36.56401 -8.320565 0.19015121 -0.3381048
#> gear 2.135685 -0.6491935 -50.80262 -6.358871 0.27598790 -0.4210806
#> carb -5.363105 1.5201613 79.06875 83.036290 -0.07840726 0.6757903
#> qsec vs am gear carb
#> mpg 4.50914919 2.01713710 1.80393145 2.1356855 -5.36310484
#> cyl -1.88685484 -0.72983871 -0.46572581 -0.6491935 1.52016129
#> disp -96.05168145 -44.37762097 -36.56401210 -50.8026210 79.06875000
#> hp -86.77008065 -24.98790323 -8.32056452 -6.3588710 83.03629032
#> drat 0.08714073 0.11864919 0.19015121 0.2759879 -0.07840726
#> wt -0.30548161 -0.27366129 -0.33810484 -0.4210806 0.67579032
#> qsec 3.19316613 0.67056452 -0.20495968 -0.2804032 -1.89411290
#> vs 0.67056452 0.25403226 0.04233871 0.0766129 -0.46370968
#> am -0.20495968 0.04233871 0.24899194 0.2923387 0.04637097
#> gear -0.28040323 0.07661290 0.29233871 0.5443548 0.32661290
#> carb -1.89411290 -0.46370968 0.04637097 0.3266129 2.60887097
#>
#> $estimator
#> [1] "sampleCovEst, hyperparameters = NA"
#>
#> $risk_df
#> # A tibble: 5 × 3
#> estimator hyperparameters cv_risk
#> <chr> <chr> <dbl>
#> 1 sampleCovEst hyperparameters = NA 252072263.
#> 2 thresholdingEst gamma = 0.1 252072263.
#> 3 thresholdingEst gamma = 0.2 252072263.
#> 4 thresholdingEst gamma = 0.3 252072265.
#> 5 linearShrinkLWEst hyperparameters = NA 255542897.
#>
#> $cv_df
#> # A tibble: 50 × 4
#> estimator hyperparameters loss fold
#> <chr> <chr> <dbl> <int>
#> 1 linearShrinkLWEst hyperparameters = NA 40551934. 1
#> 2 thresholdingEst gamma = 0.1 46314151. 1
#> 3 thresholdingEst gamma = 0.2 46314152. 1
#> 4 thresholdingEst gamma = 0.3 46314154. 1
#> 5 sampleCovEst hyperparameters = NA 46314152. 1
#> 6 linearShrinkLWEst hyperparameters = NA 100116843. 2
#> 7 thresholdingEst gamma = 0.1 90248341. 2
#> 8 thresholdingEst gamma = 0.2 90248341. 2
#> 9 thresholdingEst gamma = 0.3 90248341. 2
#> 10 sampleCovEst hyperparameters = NA 90248341. 2
#> # … with 40 more rows
#>
#> $args
#> $args$cv_loss
#> <quosure>
#> expr: ^cvMatrixFrobeniusLoss
#> env: 0x7fa1851f8128
#>
#> $args$cv_scheme
#> [1] "v_fold"
#>
#> $args$mc_split
#> [1] 0.5
#>
#> $args$v_folds
#> [1] 10
#>
#> $args$parallel
#> [1] FALSE
#>
#>
#> attr(,"class")
#> [1] "cvCovEst"
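sampleCovEst was selected in this example, so the returned estimate can be checked against base R's sample covariance of mtcars. The snippet below is a sanity check. It assumes sampleCovEst uses the usual n - 1 denominator, and the reference values are copied from the printed $estimate.

```r
# Sample covariance of mtcars from base R (n - 1 denominator).
S <- stats::cov(mtcars)

# A few entries copied from the printed $estimate of the example.
printed <- c(mpg_mpg = 36.324103, cyl_disp = 199.6602823, disp_disp = 15360.79983)
computed <- c(S["mpg", "mpg"], S["cyl", "disp"], S["disp", "disp"])
all.equal(unname(computed), unname(printed), tolerance = 1e-6)
```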