Multilevel Models
# A tibble: 10 × 5
game_id week season_type home_team away_team
<chr> <int> <chr> <chr> <chr>
1 2024_08_NYG_PIT 8 REG PIT NYG
2 2024_08_MIN_LA 8 REG LA MIN
3 2024_15_NE_ARI 15 REG ARI NE
4 2024_10_CIN_BAL 10 REG BAL CIN
5 2024_01_GB_PHI 1 REG PHI GB
6 2024_08_KC_LV 8 REG LV KC
7 2024_03_SF_LA 3 REG LA SF
8 2024_18_NYG_PHI 18 REG PHI NYG
9 2024_11_BAL_PIT 11 REG PIT BAL
10 2024_09_NO_CAR 9 REG CAR NO
# A tibble: 10 × 7
time posteam_score defteam_score side_of_field yardline_100 down ydstogo
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 10:23 0 0 NYG 13 4 11
2 02:40 28 20 LA 53 2 10
3 12:24 0 0 ARI 33 3 5
4 12:58 0 0 CIN 56 3 9
5 05:34 13 12 GB 15 NA 0
6 02:00 NA NA <NA> NA NA 0
7 15:00 7 14 LA 70 1 10
8 00:00 7 0 <NA> NA NA 0
9 13:27 9 7 PIT 58 2 7
10 04:13 12 7 CAR 15 NA 0
# A tibble: 10 × 2
play_type desc
<chr> <chr>
1 field_goal (10:23) 9-C.Boswell 31 yard field goal is GOOD, Center-46-C.Kunt…
2 run (2:40) 17-P.Nacua right end to LA 45 for -2 yards (7-B.Murphy, 4…
3 no_play (12:24) (Shotgun) 4-A.Gibson left tackle pushed ob at ARI 18 for…
4 pass (12:58) (Shotgun) 9-J.Burrow pass deep middle to 1-J.Chase to BA…
5 extra_point 4-J.Elliott extra point is GOOD, Center-49-R.Lovato, Holder-10-B…
6 no_play Timeout #2 by LV at 02:00.
7 pass (15:00) (Shotgun) 9-M.Stafford pass short middle to 88-J.Whittin…
8 <NA> END QUARTER 1
9 pass (13:27) (No Huddle) 3-R.Wilson pass short middle to 30-J.Warren …
10 extra_point 19-B.Grupe extra point is GOOD, Center-49-Z.Wood, Holder-43-M.Ha…
… fumble, complete_pass, passing_yards
… gsis_id)
Goal: estimate the average number of points eventually scored by teams in similar situations
EPA: difference between post-play and pre-play EP
nflfastR’s EP model predicts the next scoring event in the half
Vector of next-score probabilities given play features \(\boldsymbol{\mathbf{z}}\): \(\boldsymbol{\pi}(\boldsymbol{\mathbf{z}}) = (\pi_{\textrm{TD}}(\boldsymbol{\mathbf{z}}), \ldots, \pi_{\textrm{oTD}}(\boldsymbol{\mathbf{z}}))\)
Estimated w/ regression tree ensemble using XGBoost
Definition: Expected Points
Given a game state feature vector \(\boldsymbol{\mathbf{z}}\) and vector of next-score probabilities \(\boldsymbol{\pi}(\boldsymbol{\mathbf{z}}),\) the expected points \(\textrm{EP}(\boldsymbol{\mathbf{z}})\) is \[ \begin{align} \textrm{EP}(\boldsymbol{\mathbf{z}}) &= 7\times\pi_{\textrm{TD}}(\boldsymbol{\mathbf{z}}) + 3\times\pi_{\textrm{FG}}(\boldsymbol{\mathbf{z}}) + 2\times\pi_{\textrm{SAF}}(\boldsymbol{\mathbf{z}}) \\ ~&~~-2\times\pi_{\textrm{oSAF}}(\boldsymbol{\mathbf{z}}) - 3\times\pi_{\textrm{oFG}}(\boldsymbol{\mathbf{z}}) - 7\times\pi_{\textrm{oTD}}(\boldsymbol{\mathbf{z}}) \end{align} \] where the no-score outcome is worth 0 points and so does not appear.
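To make the definition concrete, here is a minimal R sketch that computes EP as a probability-weighted sum of point values and EPA as post-play EP minus pre-play EP. The outcome labels and both probability vectors are made up for illustration (they are not nflfastR output or column names), and the sketch assumes the offense keeps possession, so both EP values are from the same team's point of view.

```r
# Point values for each next-score outcome from the possession team's view.
# Labels are illustrative only; "none" (no score) is worth 0 points.
point_values <- c(TD = 7, FG = 3, SAF = 2, none = 0,
                  oSAF = -2, oFG = -3, oTD = -7)

# Expected points: sum over outcomes of (point value x probability)
expected_points <- function(probs) {
  stopifnot(all(names(point_values) %in% names(probs)),
            abs(sum(probs) - 1) < 1e-8)
  sum(point_values * probs[names(point_values)])
}

# Hypothetical next-score probabilities before and after a single play
pre_probs  <- c(TD = 0.30, FG = 0.20, SAF = 0.00, none = 0.10,
                oSAF = 0.00, oFG = 0.15, oTD = 0.25)
post_probs <- c(TD = 0.45, FG = 0.20, SAF = 0.00, none = 0.10,
                oSAF = 0.00, oFG = 0.10, oTD = 0.15)

ep_pre  <- expected_points(pre_probs)   # 0.5
ep_post <- expected_points(post_probs)  # 2.4
epa     <- ep_post - ep_pre             # EPA = post-play EP - pre-play EP = 1.9
```

With these made-up numbers the play raises the chance of a touchdown and lowers the opponent's scoring chances, so its EPA is positive.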
ep and epa: starting EP and EP added during play
# Okabe-Ito colorblind-friendly palette (for plots)
oi_colors <-
palette.colors(palette = "Okabe-Ito")
# Average EPA per play for each offense; keep the top 5 and bottom 5 teams
pbp2024 |>
dplyr::group_by(posteam) |>
dplyr::summarize(epa = mean(epa, na.rm = TRUE)) |>
dplyr::arrange(dplyr::desc(epa)) |>
dplyr::slice(c(1:5, (dplyr::n()-4):(dplyr::n())))
# A tibble: 10 × 2
posteam epa
<chr> <dbl>
1 BAL 0.143
2 BUF 0.141
3 DET 0.135
4 WAS 0.115
5 TB 0.103
6 NE -0.0643
7 NYG -0.0829
8 TEN -0.0896
9 LV -0.107
10 CLE -0.151
mean(pbp2024$epa) seems reasonable
Group by passer_player_id & compute mean(epa) for each player
Regress epa on categorical passer_player_id
Convert passer_player_id to a factor()
full_name ols
1 AJ Cole 4.333918
2 Courtland Sutton 3.745369
3 Jack Fox 3.637429
4 Justin Jefferson 3.050907
5 Stefon Diggs 2.762919
6 Miles Killebrew -2.677720
7 J.K. Scott -2.706606
8 Bryan Anger -2.843511
9 Johnny Hekker -2.856017
10 Keenan Allen -5.911163
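The two per-player approaches give the same numbers: regressing on a factor with no global intercept makes each coefficient a group mean. A minimal base-R sketch on simulated data (the passer labels, attempt counts, and EPA distribution are hypothetical) also shows why the extremes are unstable: the OLS standard error for a passer is \(\hat{\sigma}/\sqrt{n_i}\), so a passer with one or two attempts can land at the top or bottom of the table by chance.

```r
set.seed(479)

# Hypothetical passers with very different numbers of attempts;
# every pass is drawn from the same distribution (same true ability)
n_per  <- c(A = 400, B = 250, C = 3, D = 1)
passer <- factor(rep(names(n_per), times = n_per))
epa    <- rnorm(length(passer), mean = 0.1, sd = 1.5)

# Approach 1: average EPA for each passer
group_means <- tapply(epa, passer, mean)

# Approach 2: regress EPA on the categorical passer with no global intercept,
# so there is exactly one coefficient per passer
ols_fit <- lm(epa ~ 0 + passer)
ols_est <- coef(ols_fit)

# Standard errors equal sigma_hat / sqrt(n_i): tiny samples give noisy estimates
ols_se <- summary(ols_fit)$coefficients[, "Std. Error"]
```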
Level 1: observed EPA for player \(i\) normally distributed around \(\alpha_{i}\)
Level 2: Latent player abilities \(\alpha_{i}\)’s are themselves normally distributed around \(\alpha_{0}\)
\[ \begin{align} \textrm{Level 1}&: &\quad Y_{ij} &= \alpha_{i} + \epsilon_{ij}; \epsilon_{ij} \sim N(0, \sigma^{2}) \quad \textrm{for all}\ j = 1, \ldots, n_{i},\ i = 1, \ldots, I \\ \textrm{Level 2}&: &\quad \alpha_{i} &= \alpha_{0} + u_{i}; u_{i} \sim N(0, \sigma^{2}_{\alpha}) \quad \textrm{for all}\ i = 1, \ldots, I \end{align} \]
\(\alpha_{0}\): average per-pass EPA over super-population of passers
\(\sigma\) captures “within-player” variability in EPA pass-to-pass
\(\sigma_{\alpha}\) captures “between-player” variability in per-pass EPA
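If \(\alpha_{0}\), \(\sigma\), and \(\sigma_{\alpha}\) were known, the posterior mean of \(\alpha_{i}\) under this model shrinks player \(i\)’s sample mean \(\bar{y}_{i}\) toward \(\alpha_{0}\): \(\hat{\alpha}_{i} = \alpha_{0} + \lambda_{i}(\bar{y}_{i} - \alpha_{0})\) with \(\lambda_{i} = n_{i}\sigma^{2}_{\alpha}/(n_{i}\sigma^{2}_{\alpha} + \sigma^{2})\). The base-R simulation below is a sketch under that known-parameters assumption (lmer() instead estimates the variance components from the data); all parameter values are illustrative, not estimates from pbp2024.

```r
set.seed(2024)

# Illustrative "true" parameter values (assumptions, not estimates)
I           <- 50     # number of passers
alpha0      <- 0.1    # super-population mean per-pass EPA
sigma_alpha <- 0.12   # between-player SD
sigma       <- 1.5    # within-player (pass-to-pass) SD
n_i <- sample(c(5, 50, 500), I, replace = TRUE)  # passes per player

# Level 2: latent player abilities
alpha <- alpha0 + rnorm(I, mean = 0, sd = sigma_alpha)

# Level 1: observed EPA on each pass
player <- rep(seq_len(I), times = n_i)
y      <- alpha[player] + rnorm(sum(n_i), mean = 0, sd = sigma)

# No pooling: each player's sample mean (the OLS estimate)
ybar <- as.numeric(tapply(y, player, mean))

# Partial pooling: keep a fraction lambda of each player's deviation from
# alpha0; lambda moves toward 1 as the number of passes n_i grows
lambda    <- n_i * sigma_alpha^2 / (n_i * sigma_alpha^2 + sigma^2)
alpha_hat <- alpha0 + lambda * (ybar - alpha0)

# Mean squared error against the true abilities
mse_ols <- mean((ybar - alpha)^2)
mse_pp  <- mean((alpha_hat - alpha)^2)
```

With these values a player with 5 passes keeps only about 3% of their deviation from \(\alpha_{0}\), while one with 500 passes keeps roughly three quarters of it.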
(1 | passer_player_id) tells lmer() to include a random intercept for each passer
Use ranef() to extract estimates \(\hat{u}_{i}\)’s
Use coef() to get \(\hat{\alpha}_{i} = \hat{\alpha}_{0} + \hat{u}_{i}\)
coef() and ranef() return lists w/ one element per grouping variable
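A minimal sketch of how these extractors relate, on simulated data with a hypothetical grouping factor player (standing in for passer_player_id). It uses lme4’s lmer(), fixef(), and ranef() plus coef(), and does nothing if lme4 is not installed.

```r
# Runs only if the lme4 package is installed
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)

  # Hypothetical data: 30 players ("P01", ..., "P30") with 40 passes each
  sim <- data.frame(player = factor(rep(sprintf("P%02d", 1:30), each = 40)))
  u_true  <- rnorm(30, mean = 0, sd = 0.5)
  sim$epa <- 0.1 + u_true[as.integer(sim$player)] + rnorm(nrow(sim), sd = 1.5)

  # Random intercept for each player
  fit <- lme4::lmer(epa ~ 1 + (1 | player), data = sim)

  alpha0_hat <- lme4::fixef(fit)[["(Intercept)"]]         # overall intercept
  u_hat      <- lme4::ranef(fit)$player[["(Intercept)"]]  # one u_i per player
  alpha_hat  <- coef(fit)$player[["(Intercept)"]]         # alpha0_hat + u_i
}
```

Both ranef(fit)$player and coef(fit)$player are data frames with one row per player level, so the element-wise sum lines up.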
Create lmer_alpha with player name, id, and estimate
Join lmer_alpha to alphas (so we can compare with OLS estimates)
full_name lmer n
1 Lamar Jackson 0.36825428 469
2 Jared Goff 0.33315635 536
3 Josh Allen 0.27342658 482
4 Joe Burrow 0.27172057 651
5 Baker Mayfield 0.27062324 569
6 Andy Dalton 0.01971509 160
7 Drew Lock 0.01595111 180
8 Anthony Richardson 0.01197563 261
9 Spencer Rattler -0.04081740 227
10 Dorian Thompson-Robinson -0.13715503 118
# Regular-season pass plays, excluding two-point attempts and sacks
pass2024 <-
pbp2024 |>
dplyr::filter(play_type == "pass" & season_type == "REG") |>
dplyr::filter(!grepl("TWO-POINT CONVERSION ATTEMPT", desc) &
!grepl("sacked", desc)) |>
# Keep EPA, the passer ID, and candidate adjustment variables
dplyr::select(epa, passer_player_id,
air_yards,
posteam_type, shotgun, no_huddle, qb_hit,
pass_location,
desc) |>
# Treat categorical variables as factors
dplyr::mutate(posteam_type = factor(posteam_type),
pass_location = factor(pass_location))
Adjust for air_yards, shotgun, qb_hit, pass_location, posteam_type
Linear mixed model fit by REML ['lmerMod']
Formula: epa ~ 1 + (1 | passer_player_id) + air_yards + posteam_type +
shotgun + no_huddle + qb_hit + pass_location
Data: pass2024
REML criterion at convergence: 65016.9
Scaled residuals:
Min 1Q Median 3Q Max
-8.5225 -0.5478 -0.0841 0.5538 5.4631
Random effects:
Groups Name Variance Std.Dev.
passer_player_id (Intercept) 0.01344 0.1159
Residual 2.28043 1.5101
Number of obs: 17727, groups: passer_player_id, 103
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.133559 0.039314 3.397
air_yards 0.016728 0.001132 14.772
posteam_typehome -0.010065 0.022883 -0.440
shotgun -0.109698 0.032192 -3.408
no_huddle 0.021179 0.034703 0.610
qb_hit -0.470851 0.039760 -11.842
pass_locationmiddle 0.135755 0.030724 4.418
pass_locationright -0.053875 0.025769 -2.091
Correlation of Fixed Effects:
(Intr) ar_yrd pstm_t shotgn n_hddl qb_hit pss_lctnm
air_yards -0.207
postm_typhm -0.280 0.012
shotgun -0.676 -0.005 -0.011
no_huddle -0.039 -0.009 -0.024 -0.105
qb_hit -0.078 -0.067 0.008 0.005 0.011
pss_lctnmdd -0.240 -0.060 -0.013 -0.041 -0.002 -0.019
pss_lctnrgh -0.348 0.007 -0.010 0.010 -0.003 -0.019 0.441
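The variance components in the summary above imply that only a small share of pass-to-pass EPA variability is between passers. A rough base-R calculation, treating the estimated variances as known and ignoring uncertainty in the fixed effects, shows how much weight a passer’s own covariate-adjusted mean receives for different numbers of attempts; the attempt counts are illustrative.

```r
# Variance components copied from the lmer() summary above
sigma2_alpha <- 0.01344   # passer_player_id (Intercept) variance
sigma2       <- 2.28043   # residual variance

# Share of total per-pass EPA variance that is between passers
icc <- sigma2_alpha / (sigma2_alpha + sigma2)

# Approximate weight on a passer's own mean after n attempts;
# the remaining 1 - weight goes to the overall average
pooling_weight <- function(n) n * sigma2_alpha / (n * sigma2_alpha + sigma2)
n_attempts <- c(1, 10, 100, 500)
own_weight <- pooling_weight(n_attempts)
```

Under these approximations the between-passer share is about 0.6%, and the weights are roughly 0.006, 0.06, 0.37, and 0.75; a passer needs about \(\sigma^{2}/\sigma^{2}_{\alpha} \approx 170\) attempts before their own data gets half the weight.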
full_name lmer
1 Lamar Jackson 0.34106738
2 Jared Goff 0.32096595
3 Joe Burrow 0.29762396
4 Tua Tagovailoa 0.26988653
5 Josh Allen 0.26577678
6 Bailey Zappe 0.03321274
7 Andy Dalton 0.02969282
8 Anthony Richardson -0.01605454
9 Spencer Rattler -0.03511495
10 Dorian Thompson-Robinson -0.08925603
EPA is based on starting & ending game state
2 phases of passing play: ball in air + after the catch
Current analysis implicitly credits QBs for both phases
Lecture 10: divide total EPA among relevant offensive players
Develop a version of WAR