Week 23: Football!

February 23, 2018

Last week, I wrote a script that learns a linear model and performs an isotonic regression to redistribute the outcomes. We ran this model using the College Scorecard data paired with US News college rankings. Unfortunately, the number of colleges available in our data was so small that we couldn't split it into training, calibration, and test sets and still have enough to fill 10 bins. So this week, my goal was to find a new, much larger dataset.

I decided to look at football rankings. During football season, the NCAA posts weekly rankings of who they think the best teams are going to be. From the first ranking (pre-season) to the last (post-season), these rankings change drastically. My idea was that fairness in this context could be tied to some attribute of the college, such as whether the school is public or private, or how many years the football program has been around (binarized), and that NCAA rankers could be biased against one type or the other. The problem is a bit contrived, but the goal was really to get our hands on bigger data.

So I scraped the ESPN website for pre-season and post-season rankings in 2016 and 2017 and joined them with statistics about the age of the football program and whether the college was public or private. Skipping the linear model formation step, I used the 2016 pre- and post-season rankings to learn an isotonic regression, fit separately for public and private schools. I then transformed the 2017 pre-season ranking with this regression and compared it to the 2017 post-season ranking.
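To make the per-group step concrete, here is a minimal sketch of what "learn an isotonic regression split by public/private" could look like. It is not my actual script: it uses a hand-rolled pool-adjacent-violators fit instead of a library, the function names (`pava`, `fit_isotonic`, `fit_by_group`, `predict`) are made up for illustration, and the rows are toy numbers, not real rankings. It also assumes pre-season ranks are unique within each group.

```python
from collections import defaultdict


def pava(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def fit_isotonic(x, y):
    """Sort by x, then fit y monotonically. Returns (knots, fitted values)."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = pava([p[1] for p in pairs])
    return xs, ys


def fit_by_group(rows):
    """rows: iterable of (group, pre_rank, post_rank). One model per group."""
    grouped = defaultdict(lambda: ([], []))
    for group, pre, post in rows:
        grouped[group][0].append(pre)
        grouped[group][1].append(post)
    return {g: fit_isotonic(x, y) for g, (x, y) in grouped.items()}


def predict(model, x_new):
    """Linear interpolation between knots, clipped at both ends."""
    xs, ys = model
    if x_new <= xs[0]:
        return ys[0]
    if x_new >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x_new <= xs[i]:
            t = (x_new - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


# Toy "2016" rows: (school type, pre-season rank, post-season rank).
rows_2016 = [
    ("public", 1, 2), ("public", 3, 1), ("public", 5, 6), ("public", 8, 4),
    ("private", 2, 5), ("private", 4, 3), ("private", 7, 9),
]
models = fit_by_group(rows_2016)

# Transform a toy "2017" pre-season rank with the model for its group.
adjusted = predict(models["public"], 4)
```

The adjusted value is what would then be compared against the actual 2017 post-season rank for that school.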

There wasn't much to see. I think that's because the dataset I chose was actually kind of horrible for this. The NCAA page only ranks the top 25 colleges, and there are 124 colleges in the FCS. On top of that, the universities in the top 25 changed from week to week, so the only colleges for which we had all four data points (pre- and post-season rankings for both 2016 and 2017) were, like, Alabama.
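The coverage problem is easy to check with a set intersection. This is only a sketch with placeholder team names, not the scraped data: a school is usable only if it shows up in all four polls.

```python
# Placeholder top-25 memberships, keyed by (year, poll). Not real rankings.
polls = {
    (2016, "pre"): {"Team A", "Team B", "Team C", "Team D"},
    (2016, "post"): {"Team A", "Team C", "Team E"},
    (2017, "pre"): {"Team A", "Team B", "Team E", "Team F"},
    (2017, "post"): {"Team A", "Team D", "Team F"},
}

# Schools with all four data points: present in every one of the polls.
complete = set.intersection(*polls.values())

# Schools that appear at least once, for comparison.
seen = set.union(*polls.values())

print(f"{len(complete)} of {len(seen)} schools have all four data points")
```

With only 25 slots per poll and that much churn, the intersection shrinks fast, which is what happened here.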

This was about the extent of what I covered this week. I want to try this again, perhaps with a better dataset. Here's a link to one that was used by PODIUM, a project at Georgia Tech that's a lot like ours.

MaryAnn's Blog