Regularized Adjusted Plus/Minus
\[ \begin{align} Y_{i} &= \alpha_{0} + \alpha_{h_{1}(i)} + \alpha_{h_{2}(i)} + \alpha_{h_{3}(i)} + \alpha_{h_{4}(i)} + \alpha_{h_{5}(i)} \\ ~&~~~~~~~~~~- \alpha_{a_{1}(i)} - \alpha_{a_{2}(i)} - \alpha_{a_{3}(i)} - \alpha_{a_{4}(i)} - \alpha_{a_{5}(i)} + \epsilon_{i}, \end{align} \]
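The design encoded in this model can be sketched directly: each shift contributes one row of \(\mathbf{Z}\), with +1 for the five home players on the floor, -1 for the five away players, and 0 for everyone else. A minimal sketch (the player labels are made up):

```r
# One row of the signed design matrix Z for a single shift:
# +1 for the five home players, -1 for the five away players,
# 0 for everyone off the floor (player labels are hypothetical)
players <- paste0("p", 1:12)
z_row <- setNames(numeric(length(players)), players)
z_row[c("p1", "p2", "p3", "p4", "p5")] <- 1    # home lineup
z_row[c("p6", "p7", "p8", "p9", "p10")] <- -1  # away lineup
```

Stacking one such row per shift produces the full matrix \(\mathbf{Z}\) used below.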
For every \(\lambda > 0\), the ridge minimizer is \(\hat{\boldsymbol{\alpha}}(\lambda) = \left(\boldsymbol{\mathbf{Z}}^{\top}\boldsymbol{\mathbf{Z}} + \lambda I \right)^{-1}\boldsymbol{\mathbf{Z}}^{\top}\boldsymbol{\mathbf{Y}}.\)
This is almost the OLS solution: the only difference is the \(\lambda I\) term added to \(\boldsymbol{\mathbf{Z}}^{\top}\boldsymbol{\mathbf{Z}}\), which shrinks the estimates toward zero and keeps the matrix invertible even when players' minutes are highly collinear.
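As a sketch on simulated data (not the real play-by-play design matrix), the closed-form ridge solution can be computed directly; setting \(\lambda = 0\) recovers OLS:

```r
# Closed-form ridge estimate on simulated data (illustrative only)
set.seed(479)
n <- 100; p <- 5
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- Z %*% rnorm(p) + rnorm(n)

lambda <- 2
ridge_alpha <- solve(t(Z) %*% Z + lambda * diag(p), t(Z) %*% Y)
ols_alpha   <- solve(t(Z) %*% Z, t(Z) %*% Y)  # lambda = 0: OLS
# Ridge shrinks the overall size of the estimates:
sum(ridge_alpha^2) < sum(ols_alpha^2)
```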
cv.glmnet(): performs \(K\)-fold cross-validation to select the penalty \(\lambda\)
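What cv.glmnet() automates can be hand-rolled: split the observations into \(K\) folds, fit at each \(\lambda\) on the other folds, and score held-out squared error. A minimal sketch with the closed-form ridge fit and simulated data (the grid and fold count here are arbitrary):

```r
# Hand-rolled K-fold cross-validation over a lambda grid (illustrative)
set.seed(479)
n <- 100; p <- 5
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- as.vector(Z %*% rnorm(p) + rnorm(n))

K <- 5
fold_id <- sample(rep(1:K, length.out = n))
lambda_grid <- c(0.01, 0.1, 1, 10, 100)

cv_mse <- sapply(lambda_grid, function(lambda) {
  fold_err <- sapply(1:K, function(k) {
    train <- fold_id != k
    # Ridge fit on the training folds
    a_hat <- solve(t(Z[train, ]) %*% Z[train, ] + lambda * diag(p),
                   t(Z[train, ]) %*% Y[train])
    # Held-out mean squared error on fold k
    mean((Y[!train] - Z[!train, ] %*% a_hat)^2)
  })
  mean(fold_err)
})
lambda_hat <- lambda_grid[which.min(cv_mse)]
```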
standardize = FALSE
        id     rapm Name
1 1628983 3.602143 Shai Gilgeous-Alexander
2 1627827 3.472249 Dorian Finney-Smith
3 1630596 3.415232 Evan Mobley
4 1629029 3.352736 Luka Doncic
5 202699 3.183716 Tobias Harris
6 203999 2.846327 Nikola Jokic
7 1631128 2.829463 Christian Braun
8 1628384 2.813794 OG Anunoby
9 203507 2.646488 Giannis Antetokounmpo
10 1626157 2.641075 Karl-Anthony Towns
For some \(a \in [0,1]\), glmnet() and cv.glmnet() actually minimize \[
\sum_{i = 1}^{n}{(Y_{i} - \boldsymbol{\mathbf{z}}_{i}^{\top}\boldsymbol{\alpha})^{2}} + \lambda \times \sum_{j = 1}^{p}{\left[a \times \lvert\alpha_{j}\rvert + (1-a) \times \alpha_{j}^{2}\right]},
\]
where the intercept \(\alpha_{0}\) is left unpenalized.
The value of \(a\) is specified with the alpha argument:
alpha = 0: Ridge regression; penalty is \(\sum_{j}{\alpha_{j}^{2}}\)
alpha = 1: LASSO; penalty is \(\sum_{j}{\lvert \alpha_{j} \rvert}\)
\(0 <\) alpha \(< 1\): Elastic Net regression, which mixes the two penalties
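To make the role of a concrete, the penalty can be evaluated by hand for a toy coefficient vector (the numbers are illustrative; glmnet's internal parameterization also puts a factor of 1/2 on the squared term):

```r
# Elastic-net penalty from the display above, for a toy coefficient vector
penalty <- function(alpha_vec, a) {
  sum(a * abs(alpha_vec) + (1 - a) * alpha_vec^2)
}
alpha_vec <- c(-2, 0.5, 1)
penalty(alpha_vec, a = 0)    # ridge penalty: sum of squares
penalty(alpha_vec, a = 1)    # lasso penalty: sum of absolute values
penalty(alpha_vec, a = 0.5)  # elastic net: average of the two
```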
Compute \(\hat{\boldsymbol{\alpha}}(\hat{\lambda})\) using original dataset
Draw \(B\) re-samples of size \(n\)
For each re-sampled dataset, compute \(\hat{\boldsymbol{\alpha}}(\hat{\lambda})\)
Save the \(B\) bootstrap estimates of \(\boldsymbol{\alpha}\) in an array
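The steps above can be sketched end-to-end with the closed-form ridge fit on simulated data (everything here is illustrative, with lambda_hat standing in for the cross-validated penalty):

```r
# Bootstrap the ridge estimates on simulated data (illustrative only)
set.seed(479)
n <- 100; p <- 5; B <- 200
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- as.vector(Z %*% rnorm(p) + rnorm(n))
lambda_hat <- 1  # stand-in for the cross-validated penalty

boot_alpha <- matrix(NA_real_, nrow = B, ncol = p)
for (b in 1:B) {
  ix <- sample(1:n, size = n, replace = TRUE)  # re-sample of size n
  # Re-fit the ridge regression on the re-sampled rows
  boot_alpha[b, ] <- solve(t(Z[ix, ]) %*% Z[ix, ] + lambda_hat * diag(p),
                           t(Z[ix, ]) %*% Y[ix])
}
```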
First 20 sorted indices from one bootstrap re-sample; the repeats reflect sampling with replacement:
 [1]  1  2  3  4  5  6  6  9 10 11 13 14 14 15 16 17 18 18 19 20
Name Orig Bootstrap
1 Shai Gilgeous-Alexander 3.602143 1.9823702
2 Dorian Finney-Smith 3.472249 1.8454962
3 Evan Mobley 3.415232 3.9265593
4 Luka Doncic 3.352736 2.6188868
5 Tobias Harris 3.183716 2.3872891
6 Nikola Jokic 2.846327 3.1117201
7 Christian Braun 2.829463 2.6171955
8 OG Anunoby 2.813794 1.9276271
9 Giannis Antetokounmpo 2.646488 0.8409495
10 Karl-Anthony Towns 2.641075 4.2211787
B <- 500
player_names <- colnames(X_full)
boot_rapm <- matrix(nrow = B, ncol = p, dimnames = list(c(), player_names))
for(b in 1:B){
  set.seed(479 + b)
  # Draw a re-sample of size n with replacement
  boot_index <- sample(1:n, size = n, replace = TRUE)
  # Re-fit the ridge regression (alpha = 0) on the re-sampled shifts
  fit <- glmnet(x = X_full[boot_index,], y = Y[boot_index],
                lambda = cv_fit$lambda,
                alpha = 0, standardize = FALSE)
  # Extract the coefficients at the cross-validated lambda
  tmp_alpha <- fit$beta[, lambda_index]
  boot_rapm[b, names(tmp_alpha)] <- tmp_alpha
}

doncic_id <- player_table |> dplyr::filter(Name == "Luka Doncic") |> dplyr::pull(id)
davis_id <- player_table |> dplyr::filter(Name == "Anthony Davis") |> dplyr::pull(id)
boot_diff <- boot_rapm[,davis_id] - boot_rapm[,doncic_id]
ci <- quantile(boot_diff, probs = c(0.025, 0.975))
round(ci, digits = 3)
  2.5%  97.5%
-6.378 -0.484
rank() returns sample ranks of vector elements
MARGIN = 1 applies function to each row
Multiplying by -1 so that the largest RAPM receives rank 1:
boot_rank <- apply(-1*boot_rapm, MARGIN = 1, FUN = rank)
sga_id <-
player_table |> dplyr::filter(Name == "Shai Gilgeous-Alexander") |> dplyr::pull(id)
sga_ranks <- boot_rank[sga_id,]
table(sga_ranks)[1:10]
sga_ranks
1 2 3 4 5 6 7 8 9 10
58 59 46 55 36 21 19 12 17 17
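The rank table can be summarized further, e.g. as an estimated probability of finishing in the top 10. A sketch with fabricated ranks (in practice the real sga_ranks vector from above would be used):

```r
# Estimate P(rank <= 10) from a vector of bootstrap ranks
# (these ranks are simulated stand-ins, skewed toward the top)
set.seed(479)
sga_ranks <- sample(1:50, size = 500, replace = TRUE, prob = 50:1)
prob_top10 <- mean(sga_ranks <= 10)
```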