The fight against hidden biases in machine learning algorithms is being led by three Yale scientists and their new training regime for predictive programs.
Three Yale scientists are tasked with producing objective machine learning algorithms from inherently biased training data.
In the modern world, questions of who will repay their loans and who should qualify for insurance are increasingly resolved by computer programs. These algorithms are used on the assumption that they are impartial; however, according to Amin Karbasi, professor of electrical engineering and computer science, and Ehsan Kazemi, a former Yale postdoctoral fellow, biases are often rooted in machine learning programs through their training methods and data. Now, researchers at the Yale School of Management have devised a new ‘train then mask’ technique for supervised learning to remove these algorithmic biases and ensure that computers do not repeat patterns of societal discrimination.
“I don’t think there is a one-size-fits-all solution for [implicit bias in algorithms] at this point,” said marketing professor Soheil Ghili, one of the three researchers who pioneered the new technique. “There may be several solutions depending on your primary goal. The goal here is to reduce the disparity in treatment between different genders or racial groups while maintaining the accuracy of your prediction as far as possible and, more importantly, while maintaining the characteristic that two individuals who are otherwise identical, meaning identical except for their sensitive features, will be treated on an equal footing.”
Kazemi, a current Google researcher, explained that implicit biases can often creep into algorithms even if they are not explicitly programmed. As an example, he pointed to height and gender. Because males are often taller than females, an algorithm trained to prioritize height can implicitly prioritize males over females as well.

Complications arise when trying to eliminate discrimination by controlling the information supplied to an algorithm. Even if gender is withheld when training the model, its influence is absorbed into the weight the model places on height for the final prediction. This means that although gender is never explicitly present, the algorithm can still discriminate by gender if its training data is biased in that direction, Kazemi explained.
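Kazemi’s height example can be made concrete with a small sketch. The population sizes, group height distributions and selection rule below are all invented for illustration; they are not from the study:

```python
import random

random.seed(1)

# Hypothetical synthetic population: 500 men and 500 women, where men
# are taller on average (invented means and spread, in centimeters).
people = [("M", random.gauss(178, 7)) for _ in range(500)] + \
         [("F", random.gauss(165, 7)) for _ in range(500)]

# A "gender-blind" rule that simply picks the 100 tallest candidates.
# Gender never appears anywhere in the rule itself.
top = sorted(people, key=lambda p: p[1], reverse=True)[:100]
male_share = sum(1 for gender, _ in top if gender == "M") / len(top)

# Because height correlates with gender, the selection still skews
# heavily male: height has acted as a proxy for the omitted feature.
print(f"Share of men among the 100 tallest: {male_share:.0%}")
```

Dropping the sensitive column does not drop its influence; that is the failure mode ‘train then mask’ is designed to avoid.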
The pioneers of the new ‘train then mask’ technique — Ghili, Kazemi and Karbasi — argue that the solution to removing bias in algorithms is not to remove classifications such as gender, but rather to include them at first.
“Let’s say you want to rank basketball players,” Kazemi told the News. “Even if you don’t want to favor men over women, height matters. You want to keep this feature. You keep the explicit characteristic [gender] in the model when you train it, so that you make sure the other part, the height contribution, comes from the importance of height itself, not from height acting as a proxy for gender or for historical discrimination. [Then] when running the model, you assume that all people are the same gender, so there is no discrimination.”
“Train then mask” prediction algorithms work in two stages. First, the model is trained on all the data, including sensitive features, so that implicit biases are not absorbed by proxy features. Then, at prediction time, the sensitive features are masked to ensure that they play no role in decision making. Essentially, every individual is scored as if they had the same race and gender, and their remaining characteristics determine the outcome.
This means that the “train then mask” algorithm is fair to each individual. If two loan seekers with identical characteristics except race enter a bank, they will receive identical reliability scores and will both be accepted or both rejected.
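The two-stage procedure and its identical-treatment guarantee can be sketched in code. This is a minimal illustration with a hand-rolled logistic regression and an invented synthetic dataset, not the models or data used in the study:

```python
import math
import random

def train_logreg(X, y, lr=0.1, epochs=200):
    """Tiny logistic regression fit by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))      # sigmoid
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Invented data: features are [height (normalized), gender (0/1)],
# with gender correlated with height, and labels driven by height.
random.seed(0)
X, y = [], []
for _ in range(200):
    g = 1.0 if random.random() < 0.5 else 0.0
    height = random.gauss(0.5 if g else -0.5, 1.0)
    X.append([height, g])
    y.append(1 if height + random.gauss(0, 0.5) > 0 else 0)

# Stage 1 (train): fit WITH the sensitive feature included, so the
# height weight is not forced to absorb gender's contribution.
w, b = train_logreg(X, y)

# Stage 2 (mask): at prediction time, fix the sensitive feature to the
# same constant for everyone, so it cannot affect any outcome.
MASK = 0.0
def masked_predict(height, gender):
    return predict(w, b, [height, MASK])   # gender argument is ignored

# Two applicants identical except for gender get identical scores.
score_a = masked_predict(1.2, 1.0)
score_b = masked_predict(1.2, 0.0)
```

Because the masked feature vector is literally the same for both applicants, the equal-treatment property holds exactly, by construction, rather than approximately.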
The “train then mask” definition of fairness does not necessarily capture all definitions of fairness. Some believe that specific groups should be given priority, as in the case of affirmative action, in order to combat societal discrimination. Others, however, fear that prioritizing historically marginalized racial groups in algorithms could lead to legal disputes.
The definition of fairness varies from individual to individual and from situation to situation, which is why Ghili says there is no “one-size-fits-all” solution to biases in algorithms at this point.
“There is a trade-off between precision and fairness,” Karbasi said. “If you want to be very specific about your predictions you have to use all the features, but sometimes that is not the right thing to do because then you will add a bias towards a group of people.”
While some accuracy is sacrificed for fairness under the ‘train then mask’ technique, the method has minimal impact on predictive performance, according to data reported in the researchers’ November 2018 study. The scientists applied “train then mask” to real-world data to make three predictions: an individual’s income status, the reliability of a credit card applicant and whether a criminal will reoffend.

The results were promising. In predicting income status, an unconstrained algorithm that received all data points was correct 82.5 percent of the time. The “train then mask” algorithm was correct 82.3 percent of the time, without the implicit bias.
A recent investigation by The Markup found that under conventional mortgage approval algorithms, Black loan applicants are 80 percent more likely to be rejected than comparable white applicants. “Train then mask” algorithms could help make the industry fairer, Karbasi said.
According to Karbasi, the future of train-then-mask algorithms is bright.
“For simple models, this is probably the best thing you can do,” he said.
An article describing the technique, “Eliminating Latent Discrimination: Train Then Mask,” was published in 2019 in the Proceedings of the AAAI Conference on Artificial Intelligence.