Moving from machine learning to Statistics: The case of expected points in American football
Abstract: Expected points is a value function fundamental to player evaluation and strategic in-game decision-making across sports analytics, particularly in American football. To estimate expected points, football analysts use machine learning tools, which are not equipped to handle certain challenges. They suffer from selection bias, display counter-intuitive artifacts of overfitting, do not quantify uncertainty in point estimates, and do not account for the strong dependence structure of observational football data. These issues are not unique to American football or even sports analytics; they are general problems analysts encounter across various statistical applications, particularly when using machine learning in lieu of traditional statistical models. We explore these issues in detail and devise expected points models that account for them. We also introduce a widely applicable novel methodological approach to mitigate overfitting, using a catalytic prior to smooth our machine learning models.
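As a rough illustration of two ideas in the abstract, the sketch below treats expected points as a value function, the expected points of the next score given the game state (reduced here to yards to the opponent's goal line). It then smooths an overfit-prone flexible fit with catalytic-prior-style data augmentation: synthetic rows labeled by a simple model are appended with total weight `tau`. Everything here is an assumption for illustration and not the authors' model. This includes the toy outcome probabilities (not estimated from real data), the binned-mean stand-in for a machine learning model, the linear "simple model", labeling synthetic rows with fitted values rather than draws, and the hypothetical names `outcome_probs`, `binned_mean`, and `catalytic_binned_ep`.

```python
import numpy as np

# Toy next-score outcomes for the team with the ball: own TD, own FG,
# no score, opponent FG, opponent TD (safeties omitted for brevity).
POINT_VALUES = np.array([7.0, 3.0, 0.0, -3.0, -7.0])


def outcome_probs(yards_to_goal):
    """Hypothetical outcome probabilities, NOT estimated from real data."""
    ytg = np.asarray(yards_to_goal, dtype=float)
    p_none = 0.05
    p_us = 0.95 * (0.9 - 0.7 * ytg / 100.0)  # closer to goal -> likelier we score next
    p_them = 0.95 - p_us
    return np.stack([0.6 * p_us, 0.4 * p_us, np.full_like(ytg, p_none),
                     0.4 * p_them, 0.6 * p_them], axis=-1)


def true_expected_points(yards_to_goal):
    """EP as a value function: E[next-score points | state] = sum_k p_k * v_k."""
    return outcome_probs(yards_to_goal) @ POINT_VALUES


def simulate_next_score(yards_to_goal, rng):
    """Draw one next-score outcome per play by inverse-CDF sampling."""
    probs = outcome_probs(yards_to_goal)
    u = rng.random(len(probs))
    idx = (u[:, None] > probs.cumsum(axis=1)).sum(axis=1)
    return POINT_VALUES[np.minimum(idx, len(POINT_VALUES) - 1)]


def binned_mean(x, y, w, edges):
    """Flexible stand-in for an ML model: weighted mean of y in each x bin."""
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)
    n_bins = len(edges) - 1
    num = np.bincount(idx, weights=w * y, minlength=n_bins)
    den = np.bincount(idx, weights=w, minlength=n_bins)
    return num / np.maximum(den, 1e-12)  # bins with zero weight -> 0


def catalytic_binned_ep(x, y, edges, tau, m, rng):
    """Catalytic-prior-style smoothing of the binned fit (a sketch)."""
    # 1. Fit a simple model: least-squares line of points on yards to goal.
    slope, intercept = np.polyfit(x, y, deg=1)
    # 2. Draw m synthetic states and label them with the simple model's
    #    fitted values (a simplification of drawing responses from it).
    x_syn = rng.uniform(edges[0], edges[-1], size=m)
    y_syn = intercept + slope * x_syn
    # 3. Refit the flexible model on observed + synthetic rows; the synthetic
    #    rows share total weight tau (tau / m each).
    x_all = np.concatenate([x, x_syn])
    y_all = np.concatenate([y, y_syn])
    w_all = np.concatenate([np.ones_like(x), np.full(m, tau / m)])
    return binned_mean(x_all, y_all, w_all, edges)


rng = np.random.default_rng(0)
edges = np.linspace(0, 100, 21)      # 20 five-yard bins
ytg = rng.uniform(1, 99, size=200)   # a small toy sample of plays
pts = simulate_next_score(ytg, rng)
raw = binned_mean(ytg, pts, np.ones_like(ytg), edges)
smoothed = catalytic_binned_ep(ytg, pts, edges, tau=50.0, m=2000, rng=rng)
truth = true_expected_points(0.5 * (edges[:-1] + edges[1:]))
```

Setting `tau=0` recovers the raw binned means, which are noisy in sparsely populated bins; larger `tau` pulls each bin toward the simple linear fit, trading flexibility for smoothness.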
A pre-print is available on arXiv (arXiv: 2409.04889).
Recommended citation: Brill, R.S., Yee, R., Deshpande, S.K., and Wyner, A.J. (2024+). "Moving from machine learning to Statistics: The case of expected points in American football." arXiv: 2409.04889.