The Fault in Our Data
"Siri, call Bonnie Sun."
I was recently trying to get my voice assistant to call my friend. Siri responded, "Sorry, I didn't get that."
I tried again. No luck. I grew frustrated. And then I realized the problem: my friend's last name is pronounced like sun rather than tsun (with a slight T sound at the beginning and a short U sound in the middle), as I'd say it in my native Chinese. Finally, in a flurry of frustration, I switched the phone's settings from English to Mandarin, confident that Siri would at last understand what I was saying. I tried again, and she still didn't understand me.
I probably wouldn't have thought twice about this if I hadn't noticed how accurately Siri understands my boyfriend, a native English speaker. The episode made me wonder: Why do AI systems treat languages differently? What is missing during development that leaves these systems unable to recognize people speaking fluently in different languages?
The answer, I've learned, lies largely in uneven data collection.
TIME magazine recently published a story, based on findings from a UNESCO report, about how voice assistants often bolster gender bias. The story underscores that "the female voices and personalities projected onto AI technology reinforces the impression that women typically hold assistant jobs and that they should be docile and servile." And as my own experience taught me, this bias can also be racial in nature.
Indeed, many researchers have demonstrated how existing inequalities can be deepened by the ballooning presence of AI systems. More specifically, the development of this particular technology has tended to benefit already-advantaged individuals, bringing more convenience to their daily lives while leaving many others without access to some technical features.
The question, then, is: Where do these AI problems come from in the first place?
Many of AI's biases are essentially invisible: difficult to detect because they're embedded in data and code. As algorithms become more sophisticated and "smarter," sometimes even the programmers who build them cannot fully explain how a given decision was reached. And with high-level, low-explainability techniques, such as natural-language processing, people have even more difficulty recognizing biases in an AI system.
What this means is that, even though larger volumes and wider varieties of data, coupled with new techniques for analysis, are now available, various defects persist during data collection, data mining, and algorithmic processing. In particular, there are three primary ways that data analytics can discriminate: oversampling and overrepresenting specific demographic groups, "inheriting" prejudice from past data patterns, and using proxies when selecting individuals.
In simplest terms, oversampling and overrepresentation mean that too much data is disproportionately collected from certain demographic groups. One example is Google's facial recognition function, which labeled images of black people as gorillas due to a lack of black American women's faces in the training data. (The facial recognition application used white men's faces to train the machine.) Think of it this way: If the training data is defective, the system can be defective. As they say in some professions, "garbage in, garbage out."
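The "garbage in, garbage out" effect can be reduced to a toy sketch. Everything below is hypothetical: two made-up groups, a deliberately skewed training set, and a maximally naive model that learns only the base rates of its data. The point is simply that a model trained on lopsided data can work perfectly for the dominant group and fail completely for everyone else.

```python
from collections import Counter

# Hypothetical training set: 9 in 10 samples come from group "A",
# mirroring a dataset dominated by one demographic.
train_labels = ["A"] * 90 + ["B"] * 10

# A maximally naive "model" that learned only the base rates of its
# training data: it always predicts the most common training label.
majority = Counter(train_labels).most_common(1)[0][0]

def predict(_features):
    return majority

# On a balanced test set, every group-B example is misclassified.
test = [("x", "A")] * 50 + [("x", "B")] * 50
correct = {"A": 0, "B": 0}
counts = {"A": 0, "B": 0}
for features, group in test:
    counts[group] += 1
    if predict(features) == group:
        correct[group] += 1

print(correct["A"] / counts["A"])  # 1.0 for the overrepresented group
print(correct["B"] / counts["B"])  # 0.0 for the underrepresented group
```

Real systems fail more gradually than this caricature, but the direction of the failure is the same: accuracy concentrates where the data does.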
Another way training data can be defective is when prejudice from previous, already-biased data sets and algorithmic patterns is transferred to future ones. Imagine hiring software that thinks men are, on the whole, more qualified for a certain position, and that a man is then selected. The computer takes this selection into consideration and concludes: Yes, men are more suitable for this role. But that might not be true; women may be just as suitable. The point is that algorithms can produce a vicious cycle when past defective data is baked into the new data-analytics process.
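That vicious cycle can also be simulated in a few lines. The sketch below is purely illustrative, with invented numbers: a "model" that retrains on its own selections turns a mild historical skew (55 percent of past hires were men) into a much stronger one, because each of its picks becomes new training data.

```python
# Hypothetical feedback loop, reduced to a toy: hiring software that
# retrains on its own past selections. The history starts mildly skewed.
hires = ["M"] * 55 + ["W"] * 45  # 55% of past hires were men

for _ in range(100):
    # Each round, the "model" hires from whichever group dominates its
    # training history -- and its own pick then joins that history.
    majority = "M" if hires.count("M") >= hires.count("W") else "W"
    hires.append(majority)

# The mild 55% skew has hardened: all 100 new hires were men.
print(hires.count("M"), "of", len(hires))  # 155 of 200
```

No real hiring system is this crude, but the mechanism (yesterday's biased outputs becoming tomorrow's training inputs) is exactly the feedback loop described above.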
Algorithmic use of proxies is a source of blatant unfairness, too. For instance, zip codes and neighborhoods are used as proxies for race; the reputation of the school someone graduated from is often used as a proxy for job qualifications. Proxies represent a straightforward and cheap way to predict future outcomes, but they're frequently predicated on present, troublesome realities (like high-interest loans for people living in certain areas, or higher risk-assessment scores for people without a college degree) and, consequently, they tend to replicate a variety of inequalities in the future.
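The subtlety of proxies is that the sensitive attribute never has to appear in the model at all. In the toy sketch below (the zip codes, groups, and approval rule are all made up), a lending rule keyed only on zip code still produces sharply different approval rates by group, simply because zip code and group are correlated.

```python
# Hypothetical proxy effect: the rule never sees "group", only zip code,
# yet because zip code correlates with group, outcomes split by group.
applicants = (
    [{"zip": "11111", "group": "X"}] * 80 + [{"zip": "22222", "group": "X"}] * 20 +
    [{"zip": "22222", "group": "Y"}] * 80 + [{"zip": "11111", "group": "Y"}] * 20
)

def approve(app):
    # A made-up lending rule keyed only on zip code -- the proxy.
    return app["zip"] == "11111"

def approval_rate(group):
    pool = [a for a in applicants if a["group"] == group]
    return sum(approve(a) for a in pool) / len(pool)

print(approval_rate("X"))  # 0.8
print(approval_rate("Y"))  # 0.2
```

This is why simply deleting the sensitive column from a dataset doesn't remove the bias: any correlated feature can quietly stand in for it.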
Because humans create algorithms, the biases of the algorithms largely reflect the flaws of their creators and the environments they live in. People speaking other languages are systematically underrepresented and disadvantaged in home-assistance systems largely because the decisions made by Siri programmers favor native English speakers. According to one news report, while AI is taught to recognize different accents, "too many of the people training, testing and working with the systems all sound the same." In the end, the more common "broadcast English," the "predominantly white, nonimmigrant, non-regional dialect of TV newscasters," is more likely to be understood. Siri, in short, has essentially excluded many non-native English speakers from using their native language for voice commands.
Use of AI is becoming increasingly common in contexts beyond home assistants: think of its prevalence in criminal justice, credit scoring, cybersecurity, automated vehicles, and financial services. Yet even these applications leave out or harm some marginalized groups.
In some U.S. courts, risk assessment scores calculated by an AI system developed by the private firm NorthPointe are used in sentencing. The risk assessment score predicts the likelihood that a person will commit a crime in the future, and thus is used to decide jail time in the justice system. Yet a ProPublica study showed that the software systematically gives higher risk scores to black Americans than to similarly positioned white Americans. This goes against the original ideal of using AI to improve human welfare, and to make life easier and more efficient.
So, how to chart a more equitable path forward?
The solution isn't to "turn off" certain features or stymie technological development simply to avoid being called biased, as Google did in an effort to fix its racist algorithm. (It removed the gorilla category from Google Photos without actually fixing the deeper issue.)
Rather, to make meaningful moves toward preventing AI systems from reproducing inequality, programmers, policymakers, and users more broadly ought to be aware of the possible biases of the system, the causes of these biases, and the attendant risks for certain groups.
The solution, in other words, lies in building awareness of the data fed into AI systems, and of the subtle ways datasets can entrench bias in algorithms.