Lecture 12: Bradley-Terry Models

Motivation: The 2025 NCAA Women’s Ice Hockey Tournament

The Wisconsin Badgers defeated the Ohio State Buckeyes women in the championship game of the 2025 NCAA Women’s Ice Hockey Tournament¹. During the 2024-25 regular season, the Badgers won 38 games, lost 1 game, and tied 2 games. The Buckeyes won 35 games, lost 4 games, and did not tie any games. In this lecture, we’ll consider the broad question “Did the better team win?” and introduce a framework for measuring latent team strength and predicting the outcome of individual matches.

To motivate our developments, let’s consider two slightly different — but certainly related — questions, which are somewhat more nuanced than “did the better team win?”:

If we could repeatedly play the championship game, how often would the Badgers win?
If the NCAA Championship were awarded to the winner in a “best-of-5” series² at a neutral site, what is the likelihood that the Badgers would win the championship?

To answer these questions, we introduce a model for estimating latent team strengths from pairwise comparisons (i.e., match results; Section 2). We build a data set containing the outcomes of every Division 1 NCAA Women’s Ice Hockey game from the 2024-25 season (Section 3). We then estimate the model parameters (Section 4) before using the model estimates to simulate a best-of-5 series (Section 4.3) between the Badgers and Buckeyes.

The Bradley-Terry Model

Suppose we observe the outcome of \(n\) games played between \(p\) teams, which we label \(1, 2, \ldots, p.\) The basic Bradley-Terry model associates each team \(j = 1, \ldots, p,\) with a latent strength \(\lambda_{j}\) and asserts that \[ \mathbb{P}(\textrm{team i beats team j}) = \frac{e^{\lambda_{i}}}{e^{\lambda_{i}} + e^{\lambda_{j}}} = \frac{1}{1 + e^{-1 \times (\lambda_{i} - \lambda_{j})}} \] In other words, under the basic Bradley-Terry model, the log-odds of team \(i\) beating team \(j\) is \(\lambda_{i} - \lambda_{j}.\)

Example: Predicted Probabilities

Consider a round-robin tournament between 4 teams (numbered 1, 2, 3, and 4) in which every teams plays every other team exactly once. Further suppose that the latent team strengths are \(\lambda_{1} = 1, \lambda_{2} = 0.5, \lambda_{3} = 0.3\) and \(\lambda_{4} = 0.\) According to the Bradley-Terry model, if team 1 plays team 2, it is expected to win with probability \[ \mathbb{P}(\textrm{team 1 beats team 2}) = \frac{e^{1}}{e^{1} + e^{0.5}} \approx 62\% \]

lambda <- c(1, 0.5, 0.3, 0)
1/(1 + exp(-1 * (lambda[1] - lambda[2])))

1: It is more numerically stable to compute the difference in strengths on the log-odds scale and apply the inverse logistic transform than to exponentiate the \(\lambda\)’s in the numerator and denominator.

[1] 0.6224593

Manually computing the probability that each team beats every other team is not too difficulty³. To compute all these probabilities in one go, we can make use of the outer() function to assemble a matrix containing all pairwise differences of the \(\lambda_{i}'\).

outer(X = lambda, Y = lambda, FUN = "-")

1: outer() applies the function specified with the FUN argument to every combination of its arguments X and Y.

     [,1] [,2] [,3] [,4]
[1,]  0.0  0.5  0.7  1.0
[2,] -0.5  0.0  0.2  0.5
[3,] -0.7 -0.2  0.0  0.3
[4,] -1.0 -0.5 -0.3  0.0

Once we have these differences, which are on the log-odds scale, we can apply the inverse logistic function to transform them to the probability scale. The result is a matrix whose \((i,j)\) element is \(\mathbb{P}(\textrm{team i beats team j})\).

probs <- 1/(1 + exp(-1 * outer(X = lambda, Y = lambda, FUN = "-")))
diag(probs) <- NA
round(probs, digits = 3)

1: The probability of a team beating itself is not well-defined so we will manually set these values to NA

      [,1]  [,2]  [,3]  [,4]
[1,]    NA 0.622 0.668 0.731
[2,] 0.378    NA 0.550 0.622
[3,] 0.332 0.450    NA 0.574
[4,] 0.269 0.378 0.426    NA

Armed with the probabilities of each team beating every other team, we can start to compute the probabilities of more elaborate events. Assuming the match outcomes are independent, the probability that team 1 wins its matches against Teams 2 and 3 but loses to Team 4 is \[ 0.622 \times 0.668 \times (1 - 0.731) \approx 11.1\% \]

round(probs[1,2] * probs[1,3] * probs[1,4], digits = 3)

[1] 0.304

We can also compute the probability that Team 2 wins exactly 2 of its 3 games \[ \begin{align} \mathbb{P}(\textrm{team 2 wins 2 games}) &= \mathbb{P}(\textrm{team 2 beats 1 \& 3 and loses to 4}) \\ &~+ \mathbb{P}(\textrm{team 2 beats 1 \& 4 and loses to 3}) \\ &~+ \mathbb{P}(\textrm{team 2 betas 3 \& 4 and loses to 1}) \end{align} \]

probs[2,1] * probs[2,3] * probs[4,2] + 
  probs[2,1] * probs[2,4] * probs[3,2] + 
  probs[1,2] * probs[2,3] * probs[2,4]

[1] 0.3971986

Example: Tournament Simulation

In our 4-team round-robin tournament, what is the probability that Team 3 finishes in the top-2 in terms of wins⁴? To compute this probability by hand, we would need to enumerate all the possible ways in which Team 3 can finish in the top-2 positions after all 6 matches. While not impossible to do in our 4-team example, such manual computation becomes challenging as we increase the number of teams.

A much more elegant solution is to estimate this probability via simulation. Specifically, we can use the estimated match outcome probabilities to simulate each of the 6 games in our tournament many, many times. We can then compute the proportion of simulations in which Team 3 finishes in the top-2 in terms of wins.

To perform this simulation, we begin by enumerating each pair of teams using the combn() function.

combn(x = 1:4, m = 2)

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    2    2    3
[2,]    2    3    4    3    4    4

The function combn() returns an array with two rows (one for each team) and columns for every game in the tournament. It will be more useful for us to convert this into a data table with rows for each game. In this table, we will label the teams player1 and player2 and also append a column containing the probability that player1 beats player2

design <- 
  combn(x = 1:4, m = 2) |> 
  t() |>
  as.data.frame() |>
  dplyr::rename(player1 = V1, player2 = V2) |>
  dplyr::rowwise() |>
  dplyr::mutate(prob = probs[player1, player2]) |>
  dplyr::ungroup()
design

1: Since combn() returns a matrix whose columns index games, we need to take its transpose
2: For the dplyr verbs to work, we need to convert our table from a matrix to a data.frame
3: By default as.data.frame will use the columns names V1, V2, etc. So, we change them here to something more informative
4: Without the rowwise(), this line would create columns for all pairwise probabilities and not the desired one.

# A tibble: 6 × 3
  player1 player2  prob
    <int>   <int> <dbl>
1       1       2 0.622
2       1       3 0.668
3       1       4 0.731
4       2       3 0.550
5       2       4 0.622
6       3       4 0.574

Since match outcomes are binary, simulating the outcomes of the 6 matches is equivalent to flipping 6 independent coins, each with its own probability of landing heads. We interpret the event that a particular coin lands heads with the event that player1 wins the match and the event of the coin landing tails as the event that player2 wins the match. We can simulate these binary events using the rbinom() function

set.seed(479)
outcomes <- rbinom(n = 6, size = 1, prob = design$prob)
outcomes

[1] 0 1 1 0 1 1

The first three entries in outcomes correspond to the games in the first three rows of design, which are played between Team 1 and Teams 2, 3, and 4. So, in this simulation Team 1 loses to Team 2 but beats Teams 3 and 4. The fourth and fifth entry in outcomes corresponds to the games between Team 2 and Teams 3 & 4. These entries are 0 and 1, respectively, meaning that in this simulation Team 2 lost to Team 3 but beat Team 4. Finally, the sixth entry corresponds to the game between Team 3 and Team 4. It is equal to 1, meaning that in this simulation Team 3 defeated Team 4.

To tabulate each teams’ wins, we create a temporary data frame containing the participants and a column recording the winner (winner). Then, we can count the number of times each team appears in the column winner using a grouped summary. In this simulation, Teams 1, 2, and 3 each won exactly 2 games.

tmp_df <-
  design |>
  dplyr::select(player1, player2) |>
  dplyr::mutate(
    outcome = outcomes,
    winner = ifelse(outcome == 1, player1, player2),
    player1 = factor(player1, levels = 1:4),
    player2 = factor(player2, levels = 1:4),
    winner = factor(winner, levels = 1:4))

wins <-
  tmp_df |>
  dplyr::group_by(winner) |>
  dplyr::summarise(Wins = dplyr::n()) |>
  dplyr::rename(Team = winner) |>
  tidyr::complete(Team, fill = list(Wins = 0))

1: If a team wins no games, they will not appear in the column winner. Using complete(), we can add rows for these teams and manually assign them 0 wins.

We can now repeat this simulation many times, each time recording the number of wins for each team. In the code block below, build an array whose rows correspond to simulation replications and whose columns record the number of wins for each team.

n_sims <- 5000
simulated_wins <- matrix(data = NA, nrow = n_sims, ncol = 4)
for(r in 1:n_sims){
  set.seed(479+r)
  outcomes <- rbinom(n = 6, size = 1, prob = design$prob)
  wins <-
    design |>
    dplyr::select(player1, player2) |>
    dplyr::mutate(
      outcome = outcomes,
      winner = ifelse(outcome == 1, player1, player2),
      winner = factor(winner, levels = unique(c(design$player1, design$player2)))) |>
  dplyr::group_by(winner) |>
  dplyr::summarise(Wins = dplyr::n()) |>
  dplyr::rename(Team = winner) |>
  tidyr::complete(Team, fill = list(Wins = 0))
  
  simulated_wins[r,] <- wins |> dplyr::pull(Wins)
}

Out of the 5000 simulations, Team 1 won all 3 of its games in 1534 simulations. Based on this, our estimated probability of Team 1 winning all 3 of its matches is \(1534/5000 \approx 30.6\%,\) which is quite close to the exact value of 30.4% (which we can obtain by multiplying probs[1,2] * probs[1,3] * probs[1,4]).

table(simulated_wins[,1])


   0    1    2    3 
 163 1105 2198 1534

To estimate the probability that Team 3 finishes in the top-2, we will first rank each team based on the number of games they won in each simulation. Like in Lecture 5, we will call rank on the negative value of wins so that teams with more wins get lower numerical ranks. In the case of ties, we will assign the minimum possible rank (using the argument ties.method="min"). As an illustration, say that Teams 1 and 2 both won 2 games and Teams 3 and 4 both won 1 game each. This convention would assign Teams 1 and 2 a rank of 1 and Teams 3 and 4 a rank of 3 (there would be no second place).

rank(-1*c(2,2,1,1), ties.method = "min")

[1] 1 1 3 3

We compute the team rankings for each simulation replication using the apply() function.

simulated_ranks <-
  t(
    apply(-1*simulated_wins, MARGIN = 1,
        FUN = rank, ties.method = "min"))

1: In this case, apply() will return a matrix with 4 rows and 5000 columns, so we need to transpose it
2: We specify MARGIN = 1 because we want to rank values within each row (i.e., simulation replication)
3: apply() allows us to pass additional arguments needed for the function we passed via the argument FUN.

We find that in 1565 of the 5000 simulations, Team 3 had the most wins (rank of 1) and in 1092 of the 5000 simulations, Team 3 had the second-most wins (rank of 2). So, the estimated probability of Team 3 finished in the top-2 in terms of wins is about 0.531.

table(simulated_ranks[,3])


   1    2    3    4 
1565 1092 1547  796

Identifability

The basic Bradley-Terry model assigns each team \(j\) a latent strength \(\lambda_{j}\) and models the log-odds of team \(i\) beating team \(j\) as the difference in strengths \(\lambda_{i} - \lambda_{j}.\) Notice that these log-odds (and hence, the probability of team \(i\) beating team \(j\)) does not change if we add the same constant (e.g., 10) to each \(\lambda_{j}.\) This means that we can only estimates the underlying latent strengths up to an additive constant. In practice, we generally fix one of the \(\lambda_{j}\)’s to be zero; usually, the team names are represented are factor variables and we simply set the latent strength for the reference level to be zero.

Scraping D1 Women’s Hockey Results

We will fit a Bradley-Terry model to estimate the latent team strengths of all D1 Women’s Ice Hockey Teams. To do this, we need to build a data table that contains, at a minimum, the identities of the home and away teams and an indicator of whether the home team won for each regular-season game. Unfortunately, there is not (yet) an existing R package that contains such a table or even provides the tools needed to scrape the data. So, we will need to acquire the data ourselves.

US College Hockey Online is an independent media organization focused on college hockey. In addition to posting articles, weekly polls & rankings, and team & individual statistics, USCHO publishes box score and play-by-play data. For instance, here is the play-by-play from the 2025 championship game. They also publish match results for each season in a tabular form⁵

This information is readily scraped using tools like rvest.

Download pre-scraped data

Unfortunately, USCHO changed their website back-end and the following code no longer works (as of October 5, 2025). The data for this lecture is available as a CSV file here. You should also download a list of team abbreviations from here.

After downloading the data, you can load the match results into R using the code and then proceed directly to Section 3.1

wd1hockey_2024_2025 <- readr::read_csv(file = "wd1hockey_2024_2025.csv")

The codeblock below defines a function that scrapes a single season’s worth of D1 Women’s Ice Hockey results.

Function for scraping

scrape_uscho <- function(season){
  
  col_names <-
    c("Day", "", "Date", "Time", 
      "Opponent", "Opp Score", "", "Home", "Score", "OT",
      "Notes", "Type", "Summary", "TV")
  
  target_url <- 
    paste0("https://www.uscho.com/scoreboard/division-i-women/",
           season, "-", season+1, 
           "/composite-schedule/")
  raw_data <-
    rvest::read_html(target_url)
  raw_elements <-
    raw_data |>
    rvest::html_elements(xpath = '//*[(@id = "ez-outstream-ez-stuck-close")] | //td') |> 
    rvest::html_text2()
  
  assign(paste0("raw_wd1hockey_", season, "_", season+1), raw_data)
  save(list = paste0("raw_wd1hockey_", season, "_", season+1),
       file = paste0("raw_wd1hockey_", season, "_", season+1, ".RData"))
  
  if(!identical(col_names, raw_elements[1:14])){
    message(paste0("Season ", season,"-",season+1, ": did not find expected column names in html elements!"))
    results <- NULL
  } 
  
  # 2022-23 does not have TV abbreviations
  
  altitude_ix <- which(grepl("Altitude", raw_elements))[1]
  
  if(altitude_ix %% 14 != 1){
    message(paste0("Season ", season, "-", season+1, ": did not find expected number of elements in table!"))
    results <- NULL
  } else{
    # First 14 elements 
    n_games <- (altitude_ix - 1)/14 - 1 
    games <- 
      data.frame(
        Day = rep(NA, times = n_games),
        Date = rep(NA, times = n_games),
        Time = rep(NA, times = n_games),
        Opponent = rep(NA, times = n_games),
        OppScore = rep(NA, times = n_games),
        Home = rep(NA, times = n_games),
        HomeScore = rep(NA, times = n_games),
        OT = rep(NA, times = n_games),
        Notes = rep(NA, times = n_games),
        Type = rep(NA, times = n_games))
    
    games[["Day"]] <- 
      raw_elements[seq(15, (n_games+1)*14, by = 14)] 
    games[["Date"]] <-
      raw_elements[seq(17, (n_games+1)*14, by = 14)]
    games[["Time"]] <-
      raw_elements[seq(18, (n_games+1)*14, by = 14)]
    games[["Opponent"]] <-
      raw_elements[seq(19, (n_games+1)*14, by = 14)]
    games[["OppScore"]] <-
      raw_elements[seq(20, (n_games+1)*14, by = 14)]
    games[["Home"]] <-
      raw_elements[seq(22, (n_games+1)*14, by = 14)]
    games[["HomeScore"]] <-
      raw_elements[seq(23, (n_games+1)*14, by = 14)]
    games[["OT"]] <-
      raw_elements[seq(24, (n_games+1)*14, by = 14)]
    games[["Notes"]] <-
      raw_elements[seq(25, (n_games+1)*14, by = 14)]
    games[["Type"]] <-
      raw_elements[seq(26, (n_games+1)*14, by = 14)]
    results <- 
      tidyr::as_tibble(games) |>
      dplyr::mutate(
        OT = ifelse(OT == "", NA_character_, OT),
        Notes = ifelse(Notes == "", NA_character_, Notes),
        HomeScore = as.integer(HomeScore),
        OppScore = as.integer(OppScore))
  }
  return(results)
}

We can use this function to obtain game-by-game results from the 2024-25 regular season

wd1hockey_2024_2025 <- scrape_uscho(season = 2024)
save(wd1hockey_2024_2025, file = "wd1hockey_2024_2025.RData")

1: Save a copy of the data table so that we can load it later without have to re-scrape it

In addition to the recording the date and time of each game, our data table wd1hockey_2024_2025 records the home and away team names (Home and Opponent); the home and away team scores (HomeScore and OppScore); whether the game went into overtime (OT = "OT") or not (OT = NA).

# A tibble: 10 × 10
   Day   Date       Time     Opponent OppScore Home  HomeScore OT    Notes Type 
   <chr> <chr>      <chr>    <chr>       <int> <chr>     <int> <chr> <chr> <chr>
 1 Sat.  1/4/2025   1:00 CT  Minneso…        4 Lind…         1 <NA>  <NA>  NC   
 2 Fri.  12/6/2024  6:00 ET  Connect…        2 New …         1 <NA>  <NA>  HE   
 3 Fri.  12/6/2024  6:00 ET  Dartmou…        3 Clar…         5 <NA>  <NA>  EC   
 4 Fri.  11/29/2024 6:00 ET  Colgate         3 Syra…         1 <NA>  <NA>  NC   
 5 Sat.  1/18/2025  3:00 ET  Renssel…        0 Dart…         0 OT    DAR … EC   
 6 Sat.  3/1/2025   4:30 ET  Maine           3 Bost…         4 <NA>  HEA … NC   
 7 Fri.  1/3/2025   7:00 ET  Yale            8 Robe…         1 <NA>  <NA>  NC   
 8 Fri.  11/1/2024  3:00 CT  Minneso…        2 Bemi…         1 <NA>  <NA>  WC   
 9 Fri.  1/10/2025  6:00 ET  St. Law…        4 Union         2 <NA>  <NA>  EC   
10 Sat.  11/16/2024 12:00 ET Vermont         0 Prov…         0 OT    VER … HE

In the following subsections, we will investigate what the columns Type and Notes record.

Determining the Winner

In order to fit a Bradley-Terry model using these data, we still need to determine whether the home team won the game or not. Intuitively, we can do this by adding a column to wd1hockey_2024_2025 that compares HomeScore to OppScore. Unfortunately, this simple strategy is not quite adequate as there are a number of games in which HomeScore = OppScore. It turns out that many of these games ended in a shootout[^shootout] and the ultimate winner is recorded in the Notes column.

wd1hockey_2024_2025 |>
  dplyr::filter(HomeScore == OppScore) |>
  dplyr::select(Opponent, OppScore, Home, HomeScore, Notes) |>
  dplyr::slice_head(n=5)

# A tibble: 5 × 5
  Opponent     OppScore Home          HomeScore Notes          
  <chr>           <int> <chr>             <int> <chr>          
1 Maine               2 New Hampshire         2 UNH wins SO 2-1
2 Assumption          1 New Hampshire         1 AC wins SO 2-1 
3 Minnesota           1 Ohio State            1 OSU wins SO 2-1
4 Rensselaer          3 RIT                   3 RIT wins SO 1-0
5 Northeastern        4 Holy Cross            4 NOE wins SO 2-1

Notes uses a three-character abbreviation instead of the team’s full name. We can look up the abbreviations by inspecting the box score for each game. For instance, by looking under the “Team” column here, we see that Wisconsin and Ohio States’ abbreviations are, respectively, "WIS" and "OSU". A table listing every team and its USCHO’s abbreviation is available here. After downloading this table and saving it in our working directory, we can load it into our R session

team_abbr <- readr::read_csv(file = "wd1hockey_teams.csv")

To see whether the home team won a match, we first check whether HomeScore and OppScore are identical. If not, then we can check whether HomeScore > OppScore. But if they are the same, we can determine the winner by parsing the text in Notes. The following code block implement a function that executes this procedure.

Function that determines if the home team won the game

get_home_winner <- 
  function(Home_abbr, HomeScore, Opp_abbr, OppScore, Notes)
{
  home_winner <- NA
  if(HomeScore > OppScore) home_winner <- 1
  else if(HomeScore < OppScore) home_winner <- 0
  else{
    if(!is.na(Notes) & grepl("SO", Notes)){
      winner <- 
        stringr::str_split(string = Notes, pattern = " wins")[[1]][1]
      if(winner == Home_abbr) home_winner <- 1
      else if(winner == Opp_abbr) home_winner <- 0
    }
  }
  return(home_winner)
}

1: Here is where we check if Notes contains the string “SO”
2: Pulls out the three-character abbreviation for the winning team

We can now apply this function to every row in our data table and append a columns Home_Winner and Opp_Winner containing an indicator of whether the home team won the game.

wd1hockey_2024_2025 <-
  wd1hockey_2024_2025 |>
  dplyr::mutate(
    HomeScore = as.integer(HomeScore),
    OppScore = as.integer(OppScore)) |>
  dplyr::left_join(y = team_abbr |> dplyr::rename(Home_abbr = abbr, Home = team),
                   by = "Home") |>
  dplyr::left_join(y = team_abbr |> dplyr::rename(Opp_abbr = abbr, Opponent = team),
                   by = "Opponent") |>
  dplyr::rowwise() |>
  dplyr::mutate(
    Home_Winner = 
      get_home_winner(Home_abbr, HomeScore, Opp_abbr, OppScore, Notes)) |>
  dplyr::ungroup() |>
  dplyr::mutate(Opp_Winner = 1-Home_Winner)

1: Add columns containing the home and away team abbreviations
2: rowwise() is like group_by() in that divides the data table by row and applies the same operation to each row.

Identifying Regular Season Games

The column Type records whether the game was an exhibition game (Type="EX"), non-conference game (Type="NC"), or a conference game. In the case of a conference game, Type records an abbreviation of the conference.

table(wd1hockey_2024_2025$Type)


 AH  EC  EX  HE  NC NEW  WC 
 60 132  14 135 246 112 112

We will fit our Bradley-Terry model using data only from the regular season, which includes conference tournaments and mid-season tournaments⁶ but not the NCAA Tournament, which is used to determine the national championship. The column Notes, in addition to recording the result of shootouts, also records whether the game was part of a tournament.

reg_season <-
  wd1hockey_2024_2025 |>
  dplyr::filter(Type != "EX" & !grepl("NCAA W Tournament", Notes))

Of the 787 games in reg_season, there are 18 that ended in ties. We will remove these from our analysis⁷

no_ties <-
  reg_season |> dplyr::filter(!is.na(Home_Winner))

Fitting Bradley-Terry Models

We will use the package BradleyTerry2 to fit Bradley-Terry models in R. You can install the package using

devtools::install_github("hturner/BradleyTerry2")

Re-formatting Our Data

The package BradleyTerry2 uses a somewhat idiosyncratic syntax. Instead of passing a table with one row per game, it instead wants each row to correspond to a home team-away team pair. It further encodes the match results using a pair of numbers, the number of home team wins and the number of away team wins. In the following code block, we first divide the data into subgroups corresponding to unique pairs of home and away teams. Then, within each of those subgroups, we count up the number of home team and away team wins.

unik_teams <- sort(unique(c(no_ties$Home, no_ties$Opponent)))


results <-
  no_ties |>
  dplyr::rename(home.team = Home, away.team = Opponent) |>
  dplyr::group_by(home.team, away.team) |>
  dplyr::summarise(
    home.win = sum(Home_Winner),
    away.win = sum(Opp_Winner), .groups = 'drop') |>
  dplyr::mutate(
    home.team = factor(home.team, levels = unik_teams),
    away.team = factor(away.team,levels = unik_teams))

1: Subdivide data table of all games by home team - away team pairs
2: Count the number of home and away team wins in each subgroup
3: Convert home and away team identities into factor variables.

To better understand the structure of results, we can take a look at a few rows corresponding to Wisconsin’s home games.

set.seed(123)
results |> dplyr::filter(home.team == "Wisconsin") |> dplyr::slice_sample(n=5)

# A tibble: 5 × 4
  home.team away.team       home.win away.win
  <fct>     <fct>              <dbl>    <dbl>
1 Wisconsin St. Cloud State        2        0
2 Wisconsin St. Thomas             2        0
3 Wisconsin Minnesota              3        0
4 Wisconsin Ohio State             1        1
5 Wisconsin Lindenwood             2        0

The data table records that Wisconsin played Ohio State at home twice, winning one game and losing one game. Similarly, Wisconsin played Minnesota three times, winning all three games.

Estimating & Extracting Team Strengths

We’re now ready to estimate the latent strength parameters using the function BTm from the BradleyTerry2 package.

fit <-
  BradleyTerry2::BTm(
    outcome = cbind(home.win, away.win),
    player1 = home.team, player2 = away.team,
    refcat = "Assumption",
    data = results)

1: The outcome argument needs to be a two-column matrix giving the number of home & away team wins for pair of home & away teams.
2: We use player1 and player2 to indicate the home and way teams
3: We manually specify the reference team for whom we set \(\lambda = 0.\)
4: BTm looks for the variables referenced in the outcome, player1, and player2 arguments within the data table passed in the data argument.

The function BTabilities() returns a matrix with columns containing the estimated value of \(\lambda\) for each team and the associated standard error. Taking a look at the first few rows, we can verify that BTm set Assumption College’s latent strength to be 0. We further see that the model estimates the next several teams as having very large \(\lambda\)’s, suggesting that they are very likely to beat Assumption.

lambda_hat <- BradleyTerry2::BTabilities(fit)
lambda_hat[1:5,]

                   ability     s.e.
Assumption        0.000000 0.000000
Bemidji State     3.861526 1.282782
Boston College    5.537068 1.230659
Boston University 5.786624 1.226730
Brown             4.798288 1.217776

Wisconsin, Ohio State, Cornell, and Minnesota all have even larger positive estimated strengths. This is not especially surprising, as these are four of the better teams in the country[^frozenfour].

lambda_hat[c("Wisconsin", "Ohio State", "Cornell", "Minnesota"),]

            ability     s.e.
Wisconsin  9.341843 1.430356
Ohio State 7.847175 1.294598
Cornell    7.242125 1.258623
Minnesota  7.346209 1.268487

Figure 1 visualizes the estimated latent strength of each team relative to Assumption. We see the vast majority of the estimates are positive and many of the 95% confidence intervals do not cross 0, indicating that most teams are favored against Assumption.

par(mar = c(3,3,2,1), mgp = c(1.8, 0.5, 0))
oi_colors <- palette.colors(palette = "Okabe-Ito")

n_teams <- nrow(lambda_hat)
y_limit <- range(c(lambda_hat[,1] - 2*lambda_hat[,2], lambda_hat[,1] + 2*lambda_hat[,2]))

ix <- order(lambda_hat[,1])
plot(1, type = "n", 
     main = "Latent strength (relative to Assumption)",
     xlim = c(0, n_teams), ylim = y_limit,
     xaxt = "n", xlab = "",
     ylab = "Estimated lambda")
abline(h = 0, col = oi_colors[2], lwd = 0.5, lty = 2)
for(i in 1:n_teams){
  lines(x = c(i,i), 
        y = lambda_hat[ix[i],"ability"] + c(-2,2) * lambda_hat[ix[i], "s.e."],
        col = oi_colors[9], lwd = 0.5)
  points(x = i, y = lambda_hat[ix[i],"ability"], pch = 16, cex = 0.5, col = oi_colors[1])
    team_name <- rownames(lambda_hat)[ix[i]]
  abbr <- 
    team_abbr |>
    dplyr::filter(team == team_name) |> dplyr::pull(abbr)
  if(i %% 2 == 0){
      text(x = i, 
       y = 0.25 + lambda_hat[ix[i], "ability"]  + 2 * lambda_hat[ix[i], "s.e."],
       labels = abbr, cex = 0.7)
  } else{
      text(x = i, 
       y = -0.25 + lambda_hat[ix[i], "ability"]  - 2 * lambda_hat[ix[i], "s.e."],
       labels = abbr, cex = 0.7)
  }
}

1: Get the number of teams and range of \(\hat{\lambda}\)’s
2: Sort the \(\hat{\lambda}\)’s in increasing order and get the indices
3: Confidence interval for \(\lambda_{j}\) formed by taking the point estimate \(\pm 2\) standard errors

Figure 1: Most teams are favored against the reference team (Assumption College)

Instead of using Assumption College as the reference, we can re-center our estimated \(\hat{\lambda}_{j}\)’s using the function update()

fit_nh <- update(fit, refcat = "New Hampshire")
lambda_hat_nh <- BradleyTerry2::BTabilities(fit_nh)
lambda_hat_nh[c("Assumption", "New Hampshire", "Wisconsin", "Ohio State"),]

                ability      s.e.
Assumption    -4.662547 1.2191462
New Hampshire  0.000000 0.0000000
Wisconsin      4.679297 0.9932217
Ohio State     3.184629 0.7887300

Visualizing these shifted \(\hat{\lambda}_{j}\)’ we find that some teams are significantly stronger than New Hampshire and others are substantially weaker.

Figure 2: Some teams are significantly stronger and some teams are significantly weaker than New Hampshire

Simulating a Best-of-5 Series

According to our model estimates, the probability that Wisconsin beats Ohio State in a single game is about 81%

1/(1 + exp(-1 * (lambda_hat["Wisconsin", "ability"] - lambda_hat["Ohio State", "ability"])))

[1] 0.8167779

We can use a simulation to estimate the probability that Wisconsin would defeat Ohio State in a best-of-5 series. We first develop code to simulate the series, recording the eventual winner and the number of games needed to be played. For this simulation, we use a while() loop in which each iteration involves simulating a single game. While running this loop, we keep a running tally of each teams’ wins (wi_wins and osu_wins) as well as the number of games played game_counter. We want to iterate through the loop until either (i) wi_wins == 3 & osu_wins < 3 (Wisconsin wins the series) or (ii) osu_wins == 3 & wi_wins < 3 (Ohio State wins the series).

wi_wins <- 0
osu_wins <- 0

wi_prob <- 
  1/(1 + exp(-1 * (lambda_hat["Wisconsin", "ability"] - lambda_hat["Ohio State", "ability"])))

game_counter <- 0
outcomes <- rep(NA, times= 5)
set.seed(481)
while( wi_wins  < 3 & osu_wins < 3 & game_counter < 5){
  game_counter <- game_counter + 1
  outcomes[game_counter] <- rbinom(n = 1, size = 1, prob = wi_prob)
  
  if(outcomes[game_counter] == 1) wi_wins <- wi_wins + 1
  else if(outcomes[game_counter] == 0) osu_wins <- osu_wins + 1

}
if(wi_wins == 3){ 
  winner <- "Wisconsin"
} else if(osu_wins == 3){
  winner <- "Ohio State"
} else{
  winner <- NA
}
 
cat("Series ended after", game_counter, " games. Winner = ", winner, "\n")
cat("Wisconsin: ", wi_wins, " Ohio State: ", osu_wins, "\n")
cat("Game results:", outcomes, "\n")

1: Initialize the count of each teams’ wins to be 0
2: P(Wisconsin beats Ohio State)
3: Initialize the number of games played to 0. This counter is incremented in every loop iteration
4: We simulate a game only if the series is not yet decided
5: Simulate outcome of the match: outcome = 1 means Wisconsin won and outcome = 0 means Ohio State won
6: Increment running tallies of each teams’ wins based on outcome

Series ended after 4  games. Winner =  Wisconsin 
Wisconsin:  3  Ohio State:  1 
Game results: 1 1 0 1 NA

In this single simulation, Wisconsin loses the first game, wins the second and third game, loses the fourh game, and wins the fifth game. To estimate the probability that Wisconsin wins a 5 game series, we can repeat this simulation many, many times. In the following code block, we create a data frame series_results whose rows corresponding to individual simulation replications and which contains two columns, one recording the winner and the other recording the number of games played to determine the winner.

n_sims <- 5000
series_results <- data.frame(Winner = rep(NA, times = n_sims), Games = rep(NA, times = n_sims))


wi_prob <- 
  1/(1 + exp(-1 * (lambda_hat["Wisconsin", "ability"] - lambda_hat["Ohio State", "ability"])))

for(r in 1:n_sims){
  wi_wins <- 0
  osu_wins <- 0
  game_counter <- 0
  winner <- NA
  
  set.seed(479+r)
  while(wi_wins < 3 & osu_wins < 3 & game_counter < 5){
    game_counter <- game_counter + 1
    outcome <- rbinom(n = 1, size = 1, prob = wi_prob)
    
    if(outcome == 1) wi_wins <- wi_wins+1
    else osu_wins <- osu_wins <- osu_wins + 1
  }
  
  series_results[r, "Games"] <- game_counter
  if(wi_wins == 3) series_results[r, "Winner"] <- "Wisconsin"
  else if(osu_wins == 3) series_results[r, "Winner"] <- "Ohio State"
}

1: Since \(\mathbb{P}(\textrm{Wisconsin beats Ohio State})\) does not change across simulation iterations, we define this outside the main loop
2: At the beginning of each simulation, we need to re-initialize the running tallies of each team’s wins and the number of games played
3: Since we’re only interested in the ultimate winner and number of games played, there is no need to save the individual match outcomes. But if you want to estimate the probability that Wisconsin wins the series but loses the first game, you’ll need to save individual match outcomes.

Across the 5000 simulated series, Wisconsin won about 95% (4768/5000). We further estimate that the series ends in 3 games with probability 55% (2767/5000), 4 games with probability 32% (1585/5000), and 5 games with probability 13% (648/5000).

table(series_results$Winner)


Ohio State  Wisconsin 
       232       4768

table(series_results$Games)


   3    4    5 
2767 1585  648

Looking Ahead

Some of the games in our dataset were played at team’s home venue (e.g., La Bahn arean here in Madison) while other games were played a neutral venue (e.g., the championship round of the Beanpot, which is typically played at T.D. Garden in Boston). Our initial Bradley-Terry model does not account for any potential home-ice advantage. In Lecture 13, we will add an term to our model that is 1 if the a game was played at one of the teams’ home and is 0 if it was played a netural site.

We will then simulate a more complex, single-elimination tournament and use the simulation to estimated probabilities of events like “Wisconsin reaches the semi-finals”. To facilitate this, we will save the data table no_ties for future use

save(no_ties, file = "wd1hockey_regseason_2024_2025.RData")

Footnotes

The full game is available to view here.↩︎
In a “best of” series, teams play each other until one has won a majority of the contests. So, in a “best-of-5” series, the first team to reach 3 wins is declared the series winner. For more, check out the Wikipedia entry.↩︎
That is, manually computing 1/(1 + exp(-1 * (lambda[i] - lambda[j]))) for every pair of (i,j).↩︎
Many competitions features a round-robin “group stage” with the top-2 finishers for each group advancing to a single-elimination tournament to decide the champion.↩︎
See here for the 2024-25 results↩︎
The Beanpot is a good example: it is a tournament played by teams from schools in the Greater Boston area.↩︎
There are extensions of the Bradley-Terry model that include ties. But the software we’ll be using in this class does not yet support fitting such extensions. So, we will exclude these games from our analysis↩︎