Lecture 16: Plackett-Luce Models

Overview

The NFL Draft is an annual meeting in which American football teams select newly eligible players. Prior to the NFL Draft, which is typically held in April, many fans and media analysts create mock drafts, in which they attempt to predict the order of players selected in the actual draft1. While they initially served primarily as entertainment in the off-season, there is now a whole cottage industry of draft experts and analysts who regularly publish mock drafts in the lead up to the NFL Draft. These mock drafts can be used by fans to help calibrate expectations. They also are increasingly used by teams, who attempt to lean on the “wisdom of the crowd” to glean insight about the relative rankings of players and other teams’ preferences.

All this to say: mock drafts contain a considerable amount of information, especially when it comes to comparing players. Unlike game results data, in which we observe a winner and a loser, mock draft data typically consist of a list of players ranked in some order. In this lecture, we will introduce the Plackett-Luce model, an extension of the Bradley-Terry models we introduced in Lecture 12, that allows us to derive a single, global ranking of players based on multiple observed partial rankings. After formally introducing the model (Section 2) and discussing how to fit it in R (Section 2.3), we will walk through a simple example with only 3 items (Section 2.2). Then, we fit a Plackett-Luce model to mock drafts of the first three rounds of the NFL draft (Section 3) before building a simulation of the first round (Section 4). Using this simulation, we compute the probability that certain prospects are still available at certain picks.

The Plackett-Luce Model

Setup

Suppose that \(n\) raters2 (numbered \(i = 1, \ldots, n\)) rank \(p\) total items3 (numbered \(j = 1, 2, \ldots, p\)). Each rater \(i\) reports a rank ordering of \(n_{i}\) items. For each rater \(i,\) let \(i(1), i(2), \ldots, i(n_{i})\) be the indices of the items that rater \(i\) ranked.4 Our observations take the form \[ i(1) \succ i(2) \succ \cdots \succ i(n_{i}), \] which we read to mean that rater \(i\)’s top-rated item was \(i(1),\) followed by \(i(2),\) then \(i(3),\) etc. In other words, although we will not assume that each rater fully ranks all \(p\) items, we will assume that they provide a complete ranking of their top \(n_{i}\) items.

The Plackett-Luce model assumes that each rater sequentially generates their rankings. First, they select their top-rated item \(i(1)\) out of the set of all items \(\{1, 2, \ldots, p\}.\) Then, conditionally on having selected the first item \(i(1),\) they select their second-ranked item \(i(2)\) from the set of remaining items \(\{1, 2, \ldots, p\} \setminus \{i(1)\}\)5. More generally, after selecting the top \(k\) items \(i(1), \ldots, i(k)\), rater \(i\) selects their \((k+1)\)-st ranked item from the set of currently available items \(\{1, 2, \ldots, p\} \setminus \{i(1), \ldots, i(k)\}.\)

The Plackett-Luce model further assumes that each item \(j\) has a latent “desirability” \(\lambda_{j}\) and that the probability of selecting item \(j\) from a subset \(\mathcal{S} \subset \{1, 2, \ldots, p\}\) is \[ \mathbb{P}(\textrm{select } j \textrm{ from } \mathcal{S}) = \begin{cases} \frac{e^{\lambda_{j}}}{\sum_{s \in \mathcal{S}}{e^{\lambda_{s}}}} & \textrm{if } j \in \mathcal{S} \\ 0 & \textrm{otherwise}. \end{cases} \] That is, up to a normalizing constant that depends on \(\mathcal{S},\) the log of the probability that the rater selects an item out of \(\mathcal{S}\) is just the item’s latent desirability.
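In code, this choice rule amounts to exponentiating the desirabilities of the items in \(\mathcal{S}\) and normalizing. The sketch below defines a small helper, select_prob() (our own name, with made-up desirability values), to compute these probabilities:

```r
# Probability of selecting each item of S under the Plackett-Luce choice rule.
# lambda is a named vector of latent desirabilities; S is a character vector.
select_prob <- function(lambda, S) {
  w <- exp(lambda[S])
  w / sum(w)
}

# Hypothetical desirabilities for three items
lambda <- c(A = 0, B = -0.5, C = -1)
select_prob(lambda, S = c("A", "B", "C"))
select_prob(lambda, S = c("B", "C"))  # after A is no longer available
```

Note that the probabilities always sum to 1 over the candidate set, and removing an item simply renormalizes the remaining weights.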

The Bradley-Terry model is just the specialization of the Plackett-Luce model to the setting with \(p = 2\) items.

Note: Identifiability

Like the Bradley-Terry model, the Plackett-Luce model’s latent item parameters \(\lambda_{j}\) are not identifiable: translating all of them by an additive constant (e.g., \(\lambda_{j} \rightarrow \lambda_{j} + 5\)) does not change the probability of any observed ranking. By convention, one \(\lambda_{j}\) is set equal to 0 before fitting the model. So, all remaining parameter estimates should be interpreted as desirabilities relative to that reference item.
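We can check this numerically: shifting every \(\lambda_{j}\) by the same constant leaves the selection probabilities unchanged. (The desirability values below are made up purely for illustration.)

```r
lambda  <- c(A = 0, B = -0.5, C = -1)              # hypothetical desirabilities
probs   <- exp(lambda) / sum(exp(lambda))          # selection probabilities
shifted <- exp(lambda + 5) / sum(exp(lambda + 5))  # translate all lambdas by 5
max(abs(probs - shifted))  # effectively zero: the probabilities agree
```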

Example: Plackett-Luce with 3 items

As a concrete example, suppose we have three items \(A\), \(B\), and \(C\) and two raters. Further suppose that Rater 1 ranks \(A \succ B \succ C\) and Rater 2 only returns their top-ranked item, which is \(C\). According to the Plackett-Luce model, the probability of observing Rater 1’s ranking \(A \succ B \succ C\) can be decomposed as \[ \begin{align} \mathbb{P}(A \succ B \succ C) &= \mathbb{P}(\textrm{select } A \textrm{ from } \{A, B, C\}) \\ ~ &~\times \mathbb{P}(\textrm{select } B \textrm{ from } \{B, C\}) \\ ~ & ~ \times \mathbb{P}(\textrm{select } C \textrm{ from } \{C\}) \\ ~ & = \frac{e^{\lambda_{A}}}{e^{\lambda_{A}} + e^{\lambda_{B}} + e^{\lambda_{C}}} \times \frac{e^{\lambda_{B}}}{e^{\lambda_{B}} + e^{\lambda_{C}}} \times \frac{e^{\lambda_{C}}}{e^{\lambda_{C}}} \end{align} \] Similarly, the probability of observing Rater 2’s ranking is \[ \mathbb{P}(C) = \frac{e^{\lambda_{C}}}{e^{\lambda_{A}} + e^{\lambda_{B}} + e^{\lambda_{C}}} \]
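To make the decomposition concrete, the following snippet evaluates both probabilities for made-up desirabilities (\(\lambda_{A} = 1,\) \(\lambda_{B} = 0.5,\) \(\lambda_{C} = 0\) are assumptions for illustration, not estimates):

```r
lambda <- c(A = 1, B = 0.5, C = 0)  # hypothetical desirabilities

# P(A > B > C): product of the sequential choice probabilities
p_rater1 <-
  exp(lambda["A"]) / sum(exp(lambda[c("A", "B", "C")])) *
  exp(lambda["B"]) / sum(exp(lambda[c("B", "C")])) *
  exp(lambda["C"]) / exp(lambda["C"])

# P(C ranked first): a single choice out of all three items
p_rater2 <- exp(lambda["C"]) / sum(exp(lambda))

c(p_rater1, p_rater2)
```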

Fitting Plackett-Luce Models

Of course, in reality, we never observe the latent item parameters \(\lambda_{1}, \ldots, \lambda_{p}.\) To estimate these parameters from collections of partial rankings, we will use the PlackettLuce package, which can be installed using the code

install.packages("PlackettLuce")

The package includes a function PlackettLuce() for fitting the model. This function requires the data to be formatted in a very specific way that is, unfortunately, not very easy to visualize with lots of raters and items. The package authors have provided a helper function as.rankings() that takes in a matrix of ratings and re-formats it in a way that can be passed to PlackettLuce().

To illustrate, suppose we have four items \(A\), \(B\), \(C\), and \(D\) and six raters who provide the rankings

  • Rater 1: \(A \succ B \succ D \succ C\)
  • Rater 2: \(B \succ A\)
  • Rater 3: \(A \succ D \succ C\)
  • Rater 4: \(B \succ D \succ C\)
  • Rater 5: \(A \succ B \succ D \succ C\)
  • Rater 6: \(A \succ D \succ C\)

We can encode these ratings in a matrix \(\boldsymbol{\mathbf{R}} = (r_{i,j})\) whose rows correspond to the raters and whose columns correspond to the items. If rater \(i\) selects item \(j\) as their \(k\)-th ranked item, then we set \(r_{i,j} = k\). If rater \(i\) does not rank item \(j\) at all, we leave \(r_{i,j}\) empty (i.e., assign it NA in R).

R <- matrix(
  c(1, 2, 4, 3,
    2, 1, NA, NA,
    1, NA, 3, 2,
    NA, 1, 3, 2,
    1, 2, 4, 3,
    1, NA, 3, 2),
  nrow = 6, ncol = 4, dimnames = list(c(), c("A", "B", "C", "D")),
  byrow = TRUE)
R
      A  B  C  D
[1,]  1  2  4  3
[2,]  2  1 NA NA
[3,]  1 NA  3  2
[4,] NA  1  3  2
[5,]  1  2  4  3
[6,]  1 NA  3  2

The function as.rankings() converts the matrix of rankings into a vector of partial rankings whose elements encode each rater’s rankings.

R_rank <- PlackettLuce::as.rankings(R)
R_rank
[1] "A > B > D > C" "B > A"         "A > D > C"     "B > D > C"    
[5] "A > B > D > C" "A > D > C"    

Once we have our rankings data in a format compatible with PlackettLuce(), we can estimate the underlying \(\lambda_{j}\)’s. We see that item \(A\), the first item in the rankings matrix, is selected as the reference item (so \(\hat{\lambda}_{A} = 0\)).

fit <- PlackettLuce::PlackettLuce(rankings = R_rank)
lambda_hat <- coef(fit)
lambda_hat
         A          B          C          D 
 0.0000000 -0.5006691 -4.9505260 -2.5287292 

Consensus Draft Rankings

Since 2018, Benjamin Robinson has built predictive models of the actual NFL Draft by combining mock drafts from multiple sources. Using these models, he can forecast each prospect’s potential draft position and also estimate the inherent value of each draft pick6. As part of the 2020 Carnegie Mellon Sports Analytics Conference’s Reproducible Research competition, Benjamin released some of the mock draft data he collected in the lead-up to the 2018 NFL Draft; in this section, we will work with a subset of those data. You can download a CSV file containing the data we will analyze here; it contains the results of the first three rounds of several mock drafts7. In total, there are 372 distinct mock drafts, each of which ranked up to 96 players (corresponding to the first three rounds).

Data Preparation

We begin by reading in our data and extracting a list of the unique players and the positions they play.

raw_data <- readr::read_csv(file = "three_round_mocks.csv")

unik_players <- 
  raw_data |>
  dplyr::select(name, position) |>
  unique()

Here are five random rows of raw_data:

set.seed(129)
raw_data |> dplyr::slice_sample(n = 5)
# A tibble: 5 × 7
  site                         date       name         pick position type  url  
  <chr>                        <date>     <chr>       <dbl> <chr>    <chr> <chr>
1 Pacific Takes - Follman      2018-03-19 R.J. McInt…    55 DT       Fan   http…
2 Walter Football - Alex9299   2018-02-14 Sam Darnold     1 QB       Fan   http…
3 Walter Football - Lucasmd3   2018-03-30 Mike Hughes    41 CB       Fan   http…
4 Baltimore Beatdown - Lericos 2018-02-13 Maurice Hu…    18 DT       Fan   http…
5 Fox 59 - Joe Hopkins         2018-02-26 Josh Jacks…    14 CB       Media http…

Each row of raw_data corresponds to a specific pick within a given mock draft. The mock drafts are uniquely identified by the variables site, url8, and date9.

Fitting the model

Currently, rows in raw_data correspond to combinations of mock draft and pick. We need to convert this into a ranking matrix, in which rows correspond to raters, columns correspond to items, and values record the rankings. In other words, we need to convert raw_data from long format into wide format10. We can do this pretty easily with the function tidyr::pivot_wider().

ranking_matrix <-
  raw_data |>
  dplyr::select(site, date, name, pick, url) |>
  tidyr::pivot_wider(names_from = name, values_from = pick) |>
  dplyr::select(-tidyr::all_of(c("date", "url", "site"))) |>
  dplyr::mutate_all( ~replace(., lengths(.) == 0, NA)) |>
  as.matrix()

  1. Extract only the variables needed to uniquely identify the mock draft (site, date, and url), the player/prospect (name), and the pick with which they were selected (pick).
  2. Convert from long to wide format.
  3. Drop the columns identifying the raters (i.e., mock drafts), as they are now redundant.
  4. Fill in missing values with NA’s.
  5. Convert the data table into a matrix.

We now fit the model and extract the estimated \(\hat{\lambda}_{j}\)’s.

mock_rankings <- PlackettLuce::as.rankings(x = ranking_matrix)
fit <- PlackettLuce::PlackettLuce(rankings = mock_rankings)
lambda_hat <- coef(fit)

Based on the 372 three-round mock drafts, the model estimates Sam Darnold as having the largest latent desirability, closely followed by Saquon Barkley, Josh Rosen, Bradley Chubb, and Quenton Nelson.

round(sort(lambda_hat, decreasing = TRUE), digits = 3)[1:10]
       Sam Darnold     Saquon Barkley         Josh Rosen      Bradley Chubb 
             0.000             -0.561             -0.997             -1.455 
    Quenton Nelson Minkah Fitzpatrick     Baker Mayfield   Josh Allen (WYO) 
            -1.963             -1.963             -2.277             -2.440 
      Roquan Smith       Derwin James 
            -2.627             -2.986 

Draft Simulation

We’re now in a position to simulate a draft. In our simulation, we will assume that teams select players in a manner consistent with the underlying Plackett-Luce model. That is, the first team will select a player out of the set of all players according to the model’s implied probabilities. Then, the second team will select a player out of the set of all available players.

Assumption Warning

This is a heroic assumption: in actuality, teams make selections based on a range of factors that are not (yet) captured in our Plackett-Luce model. As but one example, each team typically creates its own internal rankings of players, and different teams have different rankings not only of players but also of positions. For simplicity, our simulation assumes that all teams work from the same rankings estimated with our model. To build a more accurate or lifelike draft simulator, we would need to model team- and player-specific rankings, which is a bit beyond the scope of the course.

To illustrate these steps, let’s first simulate the first pick of the draft. According to the Plackett-Luce model, the probability that player \(j\) is selected first is \[ \frac{e^{\lambda_{j}}}{\sum_{k = 1}^{p}{e^{\lambda_{k}}}}. \] The following code computes the probability that each available player is selected first overall.

players <- names(lambda_hat)
probs <- exp(lambda_hat)/sum(exp(lambda_hat))
round(sort(probs, decreasing = TRUE), digits = 2)[1:10]
       Sam Darnold     Saquon Barkley         Josh Rosen      Bradley Chubb 
              0.26               0.15               0.09               0.06 
    Quenton Nelson Minkah Fitzpatrick     Baker Mayfield   Josh Allen (WYO) 
              0.04               0.04               0.03               0.02 
      Roquan Smith       Derwin James 
              0.02               0.01 

We can use the sample() function to select a single player according to the corresponding probabilities. At least in this simulation, Saquon Barkley is the first player selected.

set.seed(123)
sample(x = players, size = 1, prob = probs)
[1] "Saquon Barkley"

According to the Plackett-Luce model, the probability that player \(j\) is selected with the second pick, after Saquon Barkley has already been selected, is \[ \frac{e^{\lambda_{j}}}{\sum_{k = 1}^{p}{e^{\lambda_{k}} \times \mathbf{1}(\textrm{player } k \textrm{ is not Saquon Barkley})}}. \] Mathematically, this probability is rather cumbersome to write down. Luckily, it’s easier to compute in R using some clever indexing. In the code below, we create separate vectors containing the set of previously selected players and the players still available to select; exponentiate the \(\hat{\lambda}_{j}\)’s for the available players; and then normalize these values to sum to 1 across the available players.

selected_players <- c("Saquon Barkley")
available_players <- players[!players %in% selected_players]
probs <- exp(lambda_hat[available_players])/sum(exp(lambda_hat[available_players]))
sample(x = available_players, size = 1, prob = probs)

  1. Define vectors of players already selected and players available to be selected.
  2. Compute the probabilities of selection among the available players only.

[1] "Parris Campbell"

At least in this particular simulation, the second player selected is Parris Campbell (who was actually selected in the second round of the NFL draft). According to the model, based on all the other players available, the chances that Campbell is selected were quite slim:

probs["Parris Campbell"]
Parris Campbell 
    0.004122244 

We can now simulate the third pick using virtually identical code as the second pick.

selected_players <- c("Saquon Barkley", "Parris Campbell") 
available_players <- players[!players %in% selected_players] 
probs <- exp(lambda_hat[available_players])/sum(exp(lambda_hat[available_players])) 
sample(x = available_players, size = 1, prob = probs)
[1] "Josh Rosen"

Digression: The Softmax Function

Continuing this simulation for several more picks is conceptually straightforward: as the picks progress, we need to

  1. Update the running set of players who have already been selected (i.e., update selected_players).
  2. Compute \(e^{\lambda_{j}}\) for all the players who remain available.
  3. Normalize these exponentiated values across available players to obtain selection probabilities.
  4. Draw a player from the set of available players based on the selection probabilities.

Steps 2 and 3 describe how to compute the selection probabilities from the Plackett-Luce parameters of the remaining players. The mapping from the vector of these parameters to the selection probabilities is known as the softmax function.

Definition: Softmax

The softmax function \(\sigma\) maps a vector of \(n\) numbers \((x_{1}, \ldots, x_{n})\) into a vector of probabilities \[ \sigma(x_{1}, \ldots, x_{n}) = \left( \frac{e^{x_{1}}}{\sum{e^{x_{i}}}}, \ldots, \frac{e^{x_{n}}}{\sum{e^{x_{i}}}}\right) \]
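The definition can be implemented in a few lines. Subtracting the maximum entry before exponentiating leaves the output unchanged (the softmax is invariant to adding a constant to every input) but protects against overflow; a minimal sketch (the helper name softmax is our own):

```r
# A numerically stable softmax: subtracting max(x) does not change the output
# but prevents exp() from overflowing for large inputs.
softmax <- function(x) {
  z <- x - max(x)
  exp(z) / sum(exp(z))
}

softmax(c(1000, 1001, 1002))  # naive exp(1000) would overflow to Inf
```

For inputs this large, the naive formula exp(x)/sum(exp(x)) returns NaN because both numerator and denominator overflow; the shifted version returns valid probabilities.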

In R, a very fast and numerically stable implementation of the softmax function is available from the mclust package, which can be installed using

install.packages("mclust")

The function mclust::softmax() is relatively simple to use. The following code compares the third-pick probabilities computed using mclust::softmax() to our hand-computed calculations. Although the differences in this case are essentially negligible, moving forward we will use the somewhat more stable softmax implementation.

tmp_probs <- mclust::softmax(x = lambda_hat[available_players]) 
names(tmp_probs) <- available_players
range(tmp_probs - probs)

  1. The one downside of mclust::softmax() is that it doesn’t preserve element names, so we have to re-set the names manually.
  2. Compare the probabilities computed by hand (probs) and using mclust::softmax().

[1] -1.387779e-17  6.938894e-18

First Round Simulation

We’re now in a position to simulate the entire first round of the NFL draft many times. We can then estimate probabilities of various events (e.g., Saquon being selected in the top-3) by looking at the proportion of simulated drafts in which the events occur.

Like our Markov chain simulations from Lecture 14, we will use a double loop to implement our simulation. The outer loop iterates over the simulation replications while the inner loop iterates over the individual picks in each simulated draft. Before starting each replication, we create a container selected_players to record the picked players in the order in which they were picked. In the inner loop, we update the set of available players, compute the selection probabilities, and then simulate the actual pick. After the inner loop concludes, we write selected_players into the appropriate row of a large array holding our simulated picks. In this array, rows correspond to simulation iterations and columns correspond to picks.

Warning

The following code takes a few minutes to run

n_sims <- 1e5
results <- array(dim = c(n_sims, 32))

for(r in 1:n_sims){
  set.seed(479 + 3*r)
  
  selected_players <- rep(NA, times = 32)
  available_players <- players
  
  for(i in 1:32){
    available_players <- players[!players %in% selected_players]
    probs <- mclust::softmax(x = lambda_hat[available_players])
    names(probs) <- available_players
    pick <- sample(x = available_players, size = 1, prob = probs)
    selected_players[i] <- pick
  }
  results[r,] <- selected_players
}
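As an aside, R’s sample() with replace = FALSE and unequal prob weights draws items one at a time, each time with probability proportional to the weights of the not-yet-selected items, which is exactly the Plackett-Luce sequential process. The inner loop above can therefore be collapsed into a single call; a sketch (simulate_round() is our own wrapper, assuming players and lambda_hat are aligned as in the code above):

```r
# Simulate one round of picks in a single call: weighted sampling without
# replacement draws each successive pick proportionally to the remaining
# exponentiated desirabilities, matching the Plackett-Luce model.
simulate_round <- function(players, lambda_hat, n_picks = 32) {
  sample(x = players, size = n_picks, prob = exp(lambda_hat), replace = FALSE)
}
```

Because sample() renormalizes prob internally, we can pass the unnormalized weights exp(lambda_hat) directly, with no explicit softmax step.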

Taking a look at the top-5 picks, we see a fair bit of variability from simulation to simulation.

results[1:5, 1:5]
     [,1]               [,2]                 [,3]                
[1,] "Saquon Barkley"   "Vita Vea"           "Sam Darnold"       
[2,] "Sam Darnold"      "Minkah Fitzpatrick" "Saquon Barkley"    
[3,] "Josh Rosen"       "Dallas Goedert"     "Josh Allen (UK)"   
[4,] "Tremaine Edmunds" "Harold Landry"      "Quenton Nelson"    
[5,] "Sam Darnold"      "Saquon Barkley"     "Minkah Fitzpatrick"
     [,4]              [,5]            
[1,] "Joseph Noteboom" "Trey Adams"    
[2,] "Derwin James"    "Quenton Nelson"
[3,] "Andre Dillard"   "Da'Ron Payne"  
[4,] "Saquon Barkley"  "Da'Ron Payne"  
[5,] "Josh Rosen"      "Baker Mayfield"

To compute the probability that Saquon Barkley is selected in the first three picks, we first define a function that checks whether a given player’s name appears among the first several entries of a given vector:

player_picked <- function(x, player, pick){
  return(player %in% x[1:pick])
}
player_picked(x = results[1,], player = "Saquon Barkley", pick = 3)

  1. Checks if player appears in the first pick elements of x.

[1] TRUE

We can then apply this function to every row of results to get a vector of Booleans (i.e., TRUE or FALSE values) recording whether Saquon was selected in the first three picks of each simulated draft. To quickly compute the proportion of simulated drafts in which Saquon was picked within the top 3 picks, we can take the mean of these Booleans. We find that Saquon is a top-3 pick in about 41% of our simulated drafts. Note that when we apply a function to an array, additional arguments for the function can be passed to apply().

saquon_pick3 <- apply(X = results, MARGIN = 1, FUN = player_picked, player = "Saquon Barkley", pick = 3)
mean(saquon_pick3)
[1] 0.41382

Going a bit further, we can even compute quantities like the probability Saquon Barkley is available at pick 11 given that he was not picked in the top-3. We can estimate this probability with \(N/D\) where

  • \(N\) is the number of simulated drafts in which Saquon is not picked within the first 10 picks
  • \(D\) is the number of simulated drafts in which Saquon is not picked within the first 3 picks

saquon_pick10 <- apply(X = results, MARGIN = 1, FUN = player_picked, player = "Saquon Barkley", pick = 10)
N <- sum(!saquon_pick10)
D <- sum(!saquon_pick3)

cat("N = ", N, " D = ", D, "\n")
N =  9666  D =  58618 
cat("P(Saquon available at pick 11 | Saquon not taken in top-3) = ", round(100*N/D, digits = 3), "%\n")
P(Saquon available at pick 11 | Saquon not taken in top-3) =  16.49 %

Cautionary Note

The results of the simulated drafts are quite different from the actual 2018 NFL Draft. The discrepancy largely stems from the fact that the probabilistic model underpinning our simulations is rather unrealistic: it assumes, for instance, that each team rates each player identically. It also does not account for differential preferences regarding position. To make the simulation more realistic, you could consider incorporating player-specific covariates \(x_{j,1}, \ldots, x_{j,q}\), where each \(\lambda_{j}\) is a linear function of the covariates: \[ \lambda_{j} = \beta_{0} + \beta_{1}x_{j,1} + \cdots + \beta_{q}x_{j,q}. \] It is possible to use the PlackettLuce package to fit such a model with item-level covariates. See this package vignette for details and an example. For the football data, it may be useful to treat position (e.g., quarterback, offensive tackle, etc.) as a covariate.

Footnotes

  1. You can read more about the history of mock drafts here.↩︎

  2. In the context of NFL mock drafts, the raters are the individual mock drafts.↩︎

  3. In the context of NFL mock drafts, the items are players/prospects.↩︎

  4. In the context of NFL mock drafts, \(i(k)\) is the player/prospect selected by rater \(i\) with pick \(k.\)↩︎

  5. For two discrete sets \(A\) and \(B,\) \(A \setminus B\) consists of all elements of \(A\) that are not in \(B.\) For instance, if \(A = \{1, 2, 3, 4\}\) and \(B = \{2,4\},\) then \(A \setminus B = \{1,3\}.\)↩︎

  6. You can read more about Benjamin’s work on his Substack.↩︎

  7. You can download the full data set, from which ours was derived, here.↩︎

  8. The website from which Benjamin Robinson obtained the mock draft results.↩︎

  9. The date the mock draft was published.↩︎

  10. See these notes for a nice overview of long vs. wide formats.↩︎