Overview

Welcome to the webpage for the 2019 Wharton Moneyball Academy / Training Camp course on data analysis in R. In this course, you will learn the tools necessary to apply the concepts you learn in Prof. Wyner’s class while analyzing real sports datasets using the R programming language. Please bookmark this site and check back regularly before the program starts for updates. Below, you will find important information about setting up your system and installing the necessary software, as well as a brief schedule for the course.

Before You Arrive

This summer, you’ll be learning how to analyze data using R. R is a free, open-source software environment for statistical computing with several built-in functions for organizing, analyzing, and visualizing data. What separates R from programs like Excel, JMP, STATA, and Minitab is the ability for programmers, scientists, and statisticians to extend R’s basic functionality, and implement the latest algorithms and methods for analyzing massive and complex data. This extensibility has made R the de facto software standard in the academic statistics community and is driving the rapid adoption of R in the data analysis endeavors of several major corporations and government agencies like Bank of America, Facebook, the F.D.A., the New York Times, and Twitter.

R uses a command line interface, which means that you interact with the software by typing in commands and hitting Enter/Return to execute them. This is in marked contrast to most other software that you’re probably accustomed to and makes learning R a little bit more challenging. To make our lives a bit easier, we will use an integrated development environment (IDE) for R, known as RStudio.
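To give you a feel for what typing commands looks like, here is a minimal sketch of a few lines you might enter at the R console. The numbers and the variable names (scores, avg_scores) are made up purely for illustration and are not taken from any course dataset.

```r
# Type each line at the console prompt and press Enter/Return to run it.
2 + 3                         # R evaluates the expression and prints the result

# Store some hypothetical game scores in a variable (illustrative values only)
scores <- c(21, 35, 14, 28)

# Compute the average of those scores and save it under a new name
avg_scores <- mean(scores)

# Typing a variable's name on its own prints its value
avg_scores
```

Each line is run as soon as you hit Enter/Return, and anything you store with <- stays available for later commands in the same session.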

So that we can start analyzing data right away, we’d like you to install R and RStudio before the first class. This can be done prior to arriving at the program or on the first night (Sunday July 7 for Academy, Sunday July 21 for Training Camp) with your project group and RTA. Instructions for installing R and RStudio, as well as setting up your computer for the class, are available here. Note: tablets and Chromebooks may not have sufficient computing power to run R and RStudio. We highly recommend bringing a laptop.

We know that you are very excited about the program and we’re similarly excited to start working together. We’ve come up with a really short assignment for you to work on before you arrive. Problem Set 0 contains instructions for downloading and installing R and RStudio, which you should do prior to arrival. It also contains a very brief introduction to the R programming language with some simple exercises, and several questions that will motivate the concepts you’ll be exploring in Prof. Wyner’s class. Don’t worry if you don’t finish working your way through these exercises before you arrive. On the first night (Sunday July 7 for Academy, Sunday July 21 for Training Camp), you’ll meet with your project team and RTA to discuss them. As we approach the start of camp, please check back for periodic updates to the site.

Daily Schedule

Each day, the instructors will spend the first hour or so of class introducing new R functionality and programming concepts. The notes for each lecture will be available on this website (see under the Lecture Notes tab in the menu at the top). These notes will contain worked-out code examples and explanations. After the first hour of lecturing, you will have a chance to work with your project team and RTA on problem sets that review and reinforce the material presented in that day’s lecture.

For Training Camp students: all lecture notes and problem sets will be available on this website under the Training Camp tab in the menu at the top.

About the Instructors

Cecilia Balocchi is a 5th year PhD student in Statistics at Wharton, interested in Bayesian statistics. Before moving to Philly, she grew up and studied in Italy, where she fell in love with gymnastics. Now she roots for all Philly sports teams.

Sameer Deshpande completed his PhD in Statistics at Wharton in 2018 and is currently a post-doctoral associate at MIT. He is broadly interested in Bayesian statistics. He loves all Dallas sports teams, but is especially passionate about the Cowboys.

Matteo Sordello is a 4th year PhD student in Statistics at Wharton, currently working on optimization problems. Originally from Italy, he supports Juventus and all of the Italian beach volleyball teams.

Ron Yurko is entering his third year as a PhD student in Statistics & Data Science at Carnegie Mellon University. Previously, he interned with the Pittsburgh Pirates and worked in finance as a quantitative analyst. As a Pittsburgh native, he of course roots for the Penguins and Steelers but is primarily burdened with being a lifelong Pirates fan.

Kat Wilson is a first year PhD student studying Quantitative Methods at UPenn. After teaching computer science courses with an educational technology company in Austin, TX, Kat pursued a master’s degree in Quantitative Methods at UT Austin, and is excited to continue this research at Penn. In her spare time, Kat loves reading, playing guitar, running marathons, and rooting for the New York Jets and Notre Dame football teams every fall.