
Showing posts from February, 2018

Week 24: A new direction

Hello all! This week was very different from the previous ones, because the IQP team has finished their application (yay!!!). It looks really good. Now that there is a finished project, our advisor wants us to write a visualization paper to submit to a conference. We are aiming for VAST, whose paper deadline is March 31. With that date looming, we are dropping everything else to focus on writing this paper. This week, I met with Caitlin a couple of times to discuss tasks for the vis paper. We plan to write the paper together, splitting it into sections, but Caitlin is going to start on the outline. Meanwhile, I found datasets that match those used in the PODIUM and LineUp papers so that we can compare our visualizations with theirs. The next step is for me to design an explore view, similar to the one we proposed at the beginning of this project, that will show the rankings and the relative weights of the attributes that contribute to the r...
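To make the idea of "rankings plus the relative weights of the attributes" concrete, here is a minimal sketch that scores each item as a weighted sum of attributes and keeps each attribute's contribution, which is the kind of breakdown an explore view (or LineUp-style stacked bars) could draw. It assumes attribute values are already normalized to [0, 1] with higher meaning better; the attribute names, weights, and colleges below are hypothetical toy values, not our data.

```python
from typing import Dict, List, Tuple


def rank_with_contributions(
    items: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float, Dict[str, float]]]:
    """Score each item as a weighted sum of its (already normalized)
    attributes and return (name, score, per-attribute contribution),
    sorted best-first."""
    total = sum(weights.values())
    # Rescale weights to sum to 1 so contributions read as shares of the score.
    norm = {attr: w / total for attr, w in weights.items()}
    ranked = []
    for name, attrs in items.items():
        contrib = {attr: norm[attr] * attrs[attr] for attr in norm}
        ranked.append((name, sum(contrib.values()), contrib))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical toy input: two colleges, two attributes, weights 3:1.
    colleges = {
        "College A": {"grad_rate": 0.9, "affordability": 0.4},
        "College B": {"grad_rate": 0.7, "affordability": 0.9},
    }
    weights = {"grad_rate": 3, "affordability": 1}
    for name, score, contrib in rank_with_contributions(colleges, weights):
        print(name, round(score, 3), contrib)
```

Keeping the per-attribute contributions (rather than just the final score) is what would let a view show *why* an item sits where it does when the user drags a weight.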

Week 23: Football!

Last week, I was able to write a script that learns a linear model and then performs an isotonic regression to redistribute the outcomes. We ran this model on the College Scorecard data paired with the US News college rankings. Unfortunately, our data contained so few colleges that we could not split it into training, calibration, and testing sets and still have enough data to fill 10 bins. So this week, my goal was to find a new dataset with lots of data. I decided to look at football rankings. During football season, the NCAA posts weekly rankings of the teams they think will be the best. From the first ranking (pre-season) to the last (post-season), these rankings change drastically. My idea was that fairness in this context could be defined by an attribute of the college, such as whether it is a public or private school or how many years its football program has existed (binarized), and that the NCAA rankers could be biased against one group or the other. The problem i...
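The data-size problem is easy to see with a little arithmetic. This is a minimal sketch, assuming a 60/20/20 training/calibration/testing split and equal-count bins; the post does not give the actual proportions or the number of colleges, so the dataset size of 100 rows below is hypothetical and chosen only to illustrate the calculation.

```python
import random

# Assumed 60/20/20 split; the post does not say which proportions were used.
SPLIT = (0.6, 0.2, 0.2)


def three_way_split(rows, fractions=SPLIT, seed=0):
    """Shuffle rows, then cut them into training, calibration, and test subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(fractions[0] * len(rows))
    n_cal = int(fractions[1] * len(rows))
    train = rows[:n_train]
    cal = rows[n_train:n_train + n_cal]
    test = rows[n_train + n_cal:]  # whatever is left over goes to testing
    return train, cal, test


def rows_per_bin(subset, n_bins=10):
    """Rows each bin gets if a subset is cut into n_bins equal-count bins."""
    return len(subset) // n_bins


if __name__ == "__main__":
    # Hypothetical dataset size, used only to show the arithmetic.
    train, cal, test = three_way_split(range(100))
    for name, subset in [("train", train), ("calibration", cal), ("test", test)]:
        print(name, len(subset), "rows,", rows_per_bin(subset), "per bin")
```

With 100 rows, the calibration and test subsets each get 20 rows, which leaves only 2 rows per bin: far too few to estimate a bin's average outcome with any confidence. A dataset with many more rows, like weekly football rankings, makes every bin usable.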

Week 22: How to fit a model

Last week, I tackled isotonic regression in Python. I used an amalgamated dataset combining the US News Top Colleges rankings and the College Scorecard, which Caitlin cleaned for me :). Briefly, I'll discuss the process I ended up using in the script and post a pretty error picture. This post is brief because I spent a lot of time getting acquainted with Python and not much time doing anything productive. I now have a working script, but I spent most of the week staring at code and asking Caitlin to explain everything to me. I even went as far as to harass my Probability professor to explain isotonic regression and ranking to me. Fortunately, he happened to do his thesis in a related field, and he is sending me some articles to help me out. The first thing I do after loading the cleaned data is split it into training, calibration, and testing subsets. I trained a linear regression model on the training data, and then used that trained model to get predicted outcomes with the test data...
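For readers new to isotonic regression, here is a minimal pure-Python sketch of the pipeline described above: fit a linear model on the training subset, then fit an isotonic (non-decreasing) mapping from the model's predictions to the observed outcomes. It assumes a single feature for brevity and assumes the isotonic step is fit on the calibration subset; it uses the Pool Adjacent Violators algorithm and a simple step-function lookup. This is not the script from this post; in practice scikit-learn's `LinearRegression` and `IsotonicRegression` would be the usual tools (the latter interpolates linearly between points instead of using steps).

```python
from bisect import bisect_right


def fit_line(xs, ys):
    """Ordinary least squares with one feature: y ~ a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def isotonic_fit(preds, outcomes):
    """Pool Adjacent Violators: the non-decreasing sequence closest (in
    squared error) to `outcomes` when ordered by `preds`.
    Returns (sorted preds, fitted values) to be used as a step function."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    xs = [preds[i] for i in order]
    blocks = []  # each block is [sum of outcomes, count]
    for i in order:
        blocks.append([outcomes[i], 1])
        # Merge backwards while a block's mean exceeds the next block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return xs, fitted


def isotonic_predict(xs, fitted, p):
    """Value of the largest calibration prediction <= p, clamped at the ends."""
    k = bisect_right(xs, p) - 1
    return fitted[max(k, 0)]


if __name__ == "__main__":
    # Hypothetical toy data, not the college dataset.
    a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])           # training subset
    cal_preds = [a + b * x for x in [1, 2, 3, 4]]          # calibration subset
    xs, fitted = isotonic_fit(cal_preds, [1, 3, 2, 4])
    print(fitted)                                          # the 3, 2 dip is pooled
    print(isotonic_predict(xs, fitted, a + b * 2.5))       # a test-subset point
```

The pooling step is the whole trick: whenever a higher prediction has a lower observed outcome, the two are averaged together, so the calibrated outcomes can never decrease as the model's prediction increases.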

Week 21: How to fix a model?

Last week, I posted about my thought process for evaluating fairness in terms of statistical parity and correcting for it. The next time I met with Caitlin, I tried explaining my thought process and how I evaluated fairness. It did not go well. I think the difficulty in this project is coming up with a reasonable way to evaluate fairness and correct models without being completely arbitrary. After trying to talk through my example (from last week's post) with Caitlin, we were even more confused by the time we met with Elke. We want to use an existing methodology so that our process is easier to justify, but the fundamental characteristic of research is that no one else has solved this particular problem before. Fortunately, there has been work on correcting a model's calibration, so we decided to start from there. In the paper Predicting Good Probabilities with Supervised Learning, the authors talk about two methods for calibrating a ske...
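Statistical parity asks whether two groups receive the positive outcome at the same rate. Here is a minimal sketch of measuring it as a difference of rates, assuming that for a ranking the "positive outcome" is being placed in the top k; that choice of decision, the helper names, and the public/private labels are illustrative assumptions rather than the method we settled on.

```python
def top_k_decisions(scores, k):
    """1 if an item's score is in the top k, else 0.
    Items tied with the k-th score are all counted as positive."""
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def positive_rate(decisions, groups, group):
    """Fraction of members of `group` that received a positive decision."""
    picked = [d for d, g in zip(decisions, groups) if g == group]
    return sum(picked) / len(picked)


def statistical_parity_difference(decisions, groups, a, b):
    """P(positive | group a) - P(positive | group b).
    0 means parity; a positive value means group a is favored."""
    return positive_rate(decisions, groups, a) - positive_rate(decisions, groups, b)


if __name__ == "__main__":
    # Hypothetical toy scores and group labels.
    scores = [9, 8, 4, 7, 3, 2]
    groups = ["public", "public", "public", "private", "private", "private"]
    decisions = top_k_decisions(scores, k=3)
    print(statistical_parity_difference(decisions, groups, "public", "private"))
```

Here the top 3 contains two of the three public schools but only one of the three private schools, so the difference is 2/3 - 1/3 = 1/3 in favor of the public group.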