Week 31: Datasets for fairness
This week, Diana and I looked for datasets on which to evaluate our fairness criteria. This is no simple task. First, the datasets need to be large enough to be split into training, calibration, and testing sets. Second, each dataset needs a protected group attribute such as sex, race, or age. Third, each needs both an outcome ranking and some ground-truth value. The truth is often the hardest attribute to find.
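The first requirement, a three-way split, can be sketched roughly as follows. This is a minimal example, not our actual pipeline; the `protected_col` name and the 60/20/20 proportions are assumptions for illustration, and stratifying on the protected attribute keeps group proportions roughly equal across the splits.

```python
# Sketch of a 60/20/20 train/calibration/test split using scikit-learn.
# Stratifying on the protected attribute preserves group proportions.
import pandas as pd
from sklearn.model_selection import train_test_split

def three_way_split(df, protected_col, seed=0):
    # First split off 40% of the data, then halve it into calibration/test.
    train, rest = train_test_split(
        df, test_size=0.4, stratify=df[protected_col], random_state=seed)
    calib, test = train_test_split(
        rest, test_size=0.5, stratify=rest[protected_col], random_state=seed)
    return train, calib, test

# Toy usage with a hypothetical balanced sex attribute:
df = pd.DataFrame({"sex": ["F", "M"] * 50, "score": range(100)})
train, calib, test = three_way_split(df, "sex")
```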
In this post, I introduce some of the datasets we decided to use. For each, I try to identify the features we need to properly test our fairness criteria.
COMPAS
The COMPAS dataset is an obvious choice because it is well established in the fairness literature. It has 18,000 observations (n=18000) and 1000 attributes (k=1000). Protected groups can be sex (81/19) or race (51/49). The outcome attribute can be the recidivism score, the score used in court to represent the risk that the accused will commit another crime. The truth attribute is a little more difficult: the number of months until they commit another crime? The severity of their future crime, should they commit one?
German Credit
This is a dataset used in the FA*IR paper from CIKM last year. n=1000 and k=100, and groups can be sex (69/31) or age (55/45). The outcome attribute is the calculated Schufa score, and the truth attribute could be the time before they default on a future loan? The amount on which they default? Unfortunately, this future data is not readily available to us.
SAT Score
This is another dataset from FA*IR. n=1.6m and k=1500, and sex is a protected attribute (53/47). The outcome attribute is SAT score, and the truth attribute could be college GPA? Starting salary at their first job?
Summary
We identified 11 such datasets, and for each of them we are unsure what the truth attribute should be. For testing purposes, we can simply choose one attribute to represent the truth. However, if we want to discuss these datasets in a paper, we want to make sure the story is compelling and that it helps show the utility of our fairness correction methods.
In the following week, Diana and I will be working on cleaning up these datasets and fixing some of the Python scripts to better streamline our testing process. Until then!