Getting the data

In this example analysis, we’re going to examine Super Bowl LII to gain insight into how the Philadelphia Eagles beat the New England Patriots (last year’s Super Bowl was pretty boring…).

The first step is to load the data. Although we can use the nflscrapR package to scrape it ourselves, we’re going to save time and read from the nflscrapR-data repository on GitHub, which hosts files that are already available to load and analyze. We’re going to read in the entire set of play-by-play data from the post-season for the 2017-2018 NFL season.

> library(tidyverse)
> post_pbp_2017 <- read_csv("https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/play_by_play_data/post_season/post_pbp_2017.csv")
Parsed with column specification:
cols(
  .default = col_double(),
  home_team = col_character(),
  away_team = col_character(),
  posteam = col_character(),
  posteam_type = col_character(),
  defteam = col_character(),
  side_of_field = col_character(),
  game_date = col_date(format = ""),
  game_half = col_character(),
  time = col_time(format = ""),
  yrdln = col_character(),
  desc = col_character(),
  play_type = col_character(),
  pass_length = col_character(),
  pass_location = col_character(),
  run_location = col_character(),
  run_gap = col_character(),
  field_goal_result = col_character(),
  extra_point_result = col_character(),
  two_point_conv_result = col_character(),
  timeout_team = col_character()
  # ... with 88 more columns
)
See spec(...) for full column specifications.
Warning: 6 parsing failures.
 row                                col           expected     actual                                                                                                             file
1226 lateral_rusher_player_id           1/0/T/F/TRUE/FALSE 00-0030496 'https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/play_by_play_data/post_season/post_pbp_2017.csv'
1226 lateral_rusher_player_name         1/0/T/F/TRUE/FALSE L.Bell     'https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/play_by_play_data/post_season/post_pbp_2017.csv'
1862 lateral_interception_player_id     1/0/T/F/TRUE/FALSE 00-0027762 'https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/play_by_play_data/post_season/post_pbp_2017.csv'
1862 lateral_interception_player_name   1/0/T/F/TRUE/FALSE R.Jones    'https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/play_by_play_data/post_season/post_pbp_2017.csv'
2109 lateral_kickoff_returner_player_id 1/0/T/F/TRUE/FALSE 00-0030288 'https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/play_by_play_data/post_season/post_pbp_2017.csv'
.... .................................. .................. .......... ................................................................................................................
See problems(...) for more details.
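The parsing failures above come from type guessing: "expected 1/0/T/F/TRUE/FALSE" means readr guessed that the lateral_* columns were logical, presumably because they were empty in the rows it inspected, and the few player IDs and names that failed to parse are stored as NA. That does not affect this analysis, but if you need those values, one option is to declare the column types yourself. Here is a minimal sketch on a tiny made-up CSV (the file and its two columns are stand-ins, not the real data):

```r
library(readr)

# Write a small made-up CSV whose ID column is mostly empty,
# mimicking the sparse lateral_* columns in the real file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "play_id,lateral_rusher_player_id",
  "1,",
  "2,",
  "3,00-0030496"
), path)

# Declare the column as character instead of letting readr guess its type
toy_pbp <- read_csv(
  path,
  col_types = cols(
    play_id = col_double(),
    lateral_rusher_player_id = col_character()
  )
)

toy_pbp$lateral_rusher_player_id
```

With the real file, the same idea would mean passing a col_types specification that names the affected lateral_* columns as col_character().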

This has loaded the entire play-by-play from the 2017 post-season (including the Pro Bowl!). We’re going to filter this data down to the Super Bowl only, which has the game_id of 2018020400. To make this simpler, we’re going to select only a subset of the columns to work with.
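A minimal sketch of that filter-and-select step with dplyr. The helper name keep_game and the particular columns kept are assumptions for illustration, and a few made-up rows stand in for the real play-by-play so the snippet runs on its own:

```r
library(dplyr)

# Hypothetical helper (not part of nflscrapR): keep one game's plays
# and only the requested columns
keep_game <- function(pbp, target_game, cols) {
  pbp %>%
    filter(game_id == target_game) %>%
    select(all_of(cols))
}

# Made-up rows standing in for post_pbp_2017
toy_pbp <- tibble(
  game_id      = c(1, 2018020400, 2018020400),
  posteam      = c("AAA", "PHI", "NE"),
  play_type    = c("run", "pass", "run"),
  yards_gained = c(3, 12, 4),
  extra_col    = c("x", "y", "z")
)

super_bowl_toy <- keep_game(
  toy_pbp,
  target_game = 2018020400,
  cols = c("posteam", "play_type", "yards_gained")
)
super_bowl_toy
```

On the real data you would pass post_pbp_2017 in place of toy_pbp and list whichever columns you plan to use.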

Team comparison

Now, using this dataset, we can proceed to compare the performance and decision making of the Eagles and the Patriots. First, what plays did they call?
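One way to answer that is to count plays by team and play type and convert the counts to within-team proportions. The sketch below restricts to "pass" and "run" plays, which is an assumption on my part, and uses made-up rows in place of the Super Bowl data:

```r
library(dplyr)

# Hypothetical helper: proportion of each play type called by each team
play_call_props <- function(pbp) {
  pbp %>%
    filter(play_type %in% c("pass", "run")) %>%
    count(posteam, play_type) %>%
    group_by(posteam) %>%
    mutate(prop = n / sum(n)) %>%
    ungroup()
}

# Made-up plays: PHI calls 3 passes and 1 run, NE calls 1 of each
toy_pbp <- tibble(
  posteam   = c("PHI", "PHI", "PHI", "PHI", "NE", "NE", "NE"),
  play_type = c("pass", "pass", "run", "pass", "pass", "run", "punt")
)

props <- play_call_props(toy_pbp)
props
```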

We can directly plot the proportions for each team as side-by-side bar charts.
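A sketch of such a chart with ggplot2, using geom_col() with dodged bars. The proportions here are made-up placeholders, not the actual Super Bowl numbers:

```r
library(ggplot2)

# Made-up proportions standing in for the real play-call summary
props <- data.frame(
  posteam   = c("NE", "NE", "PHI", "PHI"),
  play_type = c("pass", "run", "pass", "run"),
  prop      = c(0.5, 0.5, 0.75, 0.25)
)

# Dodged bars put the two teams side by side within each play type
play_call_plot <- ggplot(props, aes(x = play_type, y = prop, fill = posteam)) +
  geom_col(position = "dodge") +
  labs(x = "Play type", y = "Proportion of plays", fill = "Team") +
  theme_bw()

play_call_plot
```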

To make this better, we can use the actual team colors based on a dataset from the nflscrapR package. For the Patriots we’ll use their secondary color, #c60c30, and for the Eagles their primary color, #004953.
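Rather than looking the colors up programmatically, the sketch below simply hard-codes the two hex values quoted above in a named vector and hands it to scale_fill_manual(). The plotted proportions are again made-up placeholders:

```r
library(ggplot2)

# Hex codes taken from the text; names must match the posteam values
team_colors <- c(NE = "#c60c30", PHI = "#004953")

# Made-up proportions standing in for the real play-call summary
props <- data.frame(
  posteam   = c("NE", "NE", "PHI", "PHI"),
  play_type = c("pass", "run", "pass", "run"),
  prop      = c(0.5, 0.5, 0.75, 0.25)
)

colored_plot <- ggplot(props, aes(x = play_type, y = prop, fill = posteam)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = team_colors) +
  labs(x = "Play type", y = "Proportion of plays", fill = "Team") +
  theme_bw()

colored_plot
```

Using a named vector keeps each color tied to its team even if the order of the teams in the data changes.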

Let’s take a look at the performance of these plays by yards gained:
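A minimal sketch of a one-number summary, assuming the mean of yards_gained per team and play type is what we want to compare. The rows are made up for illustration:

```r
library(dplyr)

# Made-up plays standing in for the Super Bowl play-by-play
toy_pbp <- tibble(
  posteam      = c("PHI", "PHI", "PHI", "NE", "NE", "NE"),
  play_type    = c("pass", "pass", "run", "pass", "run", "run"),
  yards_gained = c(10, 20, 4, 6, 2, 0)
)

# Average yards gained for each team and play type
avg_yards <- toy_pbp %>%
  filter(play_type %in% c("pass", "run")) %>%
  group_by(posteam, play_type) %>%
  summarise(
    avg_yards = mean(yards_gained, na.rm = TRUE),
    n_plays   = n(),
    .groups   = "drop"
  )

avg_yards
```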

One-number summaries toss out a lot of information! Let’s view the entire distribution instead. One of the best ways to do this is with a beeswarm plot, which displays the actual individual points rather than smoothed summaries. We’ll display these points on top of violin plots, which provide us with the general shape of the distributions.
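Here is a sketch of that layering, assuming the ggbeeswarm package is installed for geom_beeswarm(). The yardage values are randomly generated placeholders, and splitting the teams into facets is my own layout choice:

```r
library(ggplot2)
library(ggbeeswarm)

# Randomly generated stand-in data: 20 plays per team and play type
set.seed(52)
toy_pbp <- data.frame(
  posteam      = rep(c("NE", "PHI"), each = 40),
  play_type    = rep(rep(c("pass", "run"), each = 20), times = 2),
  yards_gained = round(rnorm(80, mean = 6, sd = 8))
)

# Violins show the overall shape; beeswarm points show every individual play
yards_plot <- ggplot(toy_pbp, aes(x = play_type, y = yards_gained)) +
  geom_violin(alpha = 0.3) +
  geom_beeswarm(aes(color = posteam)) +
  facet_wrap(~ posteam) +
  labs(x = "Play type", y = "Yards gained", color = "Team") +
  theme_bw()

yards_plot
```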

Win probability impact

Not all yards are created equal! We should really be looking at the impact in terms of win probability added (WPA) instead to get a better understanding of what impacted the game.
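WPA is the change in win probability from before a play to after it, so summing it within groups shows where a team gained or lost ground. The sketch below assumes the wpa column is recorded from the perspective of the team with the ball, and groups by team, down, and play type. The values are made up:

```r
library(dplyr)

# Made-up plays with a per-play win probability added (wpa) value
toy_pbp <- tibble(
  posteam   = c("PHI", "PHI", "PHI", "NE", "NE"),
  down      = c(4, 4, 1, 1, 4),
  play_type = c("pass", "pass", "run", "pass", "pass"),
  wpa       = c(0.10, 0.05, 0.01, -0.02, -0.08)
)

# Total WPA for each team, down, and play type
wpa_summary <- toy_pbp %>%
  filter(play_type %in% c("pass", "run"), !is.na(wpa)) %>%
  group_by(posteam, down, play_type) %>%
  summarise(total_wpa = sum(wpa), n_plays = n(), .groups = "drop")

wpa_summary
```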

Now we see a big difference between the Eagles and Patriots, especially on those fourth down passing attempts… which plays are those?
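To find out, we can rank the plays by WPA and read their descriptions. A minimal sketch, where the helper name top_wpa_plays is hypothetical and the rows and descriptions are made-up placeholders:

```r
library(dplyr)

# Hypothetical helper: the n plays with the largest win probability added
top_wpa_plays <- function(pbp, n = 2) {
  pbp %>%
    filter(!is.na(wpa)) %>%
    slice_max(wpa, n = n, with_ties = FALSE) %>%
    select(posteam, qtr, down, desc, wpa)
}

# Made-up plays with placeholder descriptions
toy_pbp <- tibble(
  posteam = c("PHI", "NE", "PHI", "PHI"),
  qtr     = c(2, 3, 4, 1),
  down    = c(4, 1, 3, 2),
  desc    = c("placeholder play A", "placeholder play B",
              "placeholder play C", "placeholder play D"),
  wpa     = c(0.15, 0.03, 0.20, NA)
)

top_plays <- top_wpa_plays(toy_pbp, n = 2)
top_plays
```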

The number one play was the go-ahead TD pass which gave the Eagles the lead in the 4th quarter, while the second highest play was the famous Philly Special!

Win probability chart

Finally, we can wrap up this analysis with a win probability chart that shows the overall story of the game.
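A sketch of one way to draw it, assuming the play-by-play has home_wp, away_wp, and game_seconds_remaining columns. The five rows below are made-up placeholders rather than the actual game:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up win probabilities at five points in a 60-minute game
toy_wp <- tibble(
  game_seconds_remaining = c(3600, 2700, 1800, 900, 0),
  home_wp = c(0.55, 0.40, 0.35, 0.60, 0.05),
  away_wp = 1 - home_wp
)

# One row per team per time point, so each team gets its own line
wp_long <- toy_wp %>%
  pivot_longer(c(home_wp, away_wp), names_to = "team", values_to = "wp")

# Reverse the x axis so the game runs left to right
wp_plot <- ggplot(wp_long, aes(x = game_seconds_remaining, y = wp, color = team)) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  scale_x_reverse(breaks = seq(0, 3600, by = 900)) +
  labs(x = "Seconds remaining", y = "Win probability", color = NULL) +
  theme_bw()

wp_plot
```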

Congratulations, you now have what it takes to cover NFL games for The Athletic!