
In Short

The Public Interest Potential of Natural Language Processing

Shutterstock.com / rafapress

This story is part of PIT UNiverse, a monthly newsletter from PIT-UN that shares news and events from around the Network. Subscribe to PIT UNiverse here.

One of the greatest challenges facing internet companies and civil society is the rampant spread of disinformation and hate speech on social media platforms worldwide. Part of the problem is a lack of information: it's hard to chart the viral spread of specific racist ideas and hateful content amid the sea of noise on social media platforms with millions, if not billions, of users.

Understanding the scope of online hate speech and disinformation is one promising application of Natural Language Processing, or NLP, the field of artificial intelligence that enables computers to understand and generate human language.

Yulia Tsvetkov is one of the people leveraging this technology to make a change. Tsvetkov, a 2019 PIT-UN Network Challenge grantee, is an assistant professor in the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. Earlier this year, Tsvetkov's lab partnered with the Washington Post Fact Checker to track the spread of anti-Black hate speech on social media in Guangzhou, China. The work stemmed from events earlier this year: from late March to early April, unfounded fears that Africans were at high risk of spreading COVID-19 prompted a rash of anti-Black discrimination by Guangzhou authorities and businesses.

Tsvetkov calls NLP a "perfect use case of public interest technology" because of its exciting potential for public-interest applications, but says it also carries significant ethical and privacy risks if used carelessly.

"NLP develops algorithms that process human language, and humans are inherently biased," she says. "Machine learning algorithms are very good at picking up on those patterns in language, and they learn to absorb and reinforce human biases. As a consequence, naively built language technologies that do not explicitly address those risks often exhibit undesirable behaviors, potentially with catastrophic consequences. My lab and students in our course, along with many other researchers, are focusing on detecting and mitigating those risks."

In terms of what these technologies can accomplish, Tsvetkov adds, "I think the biggest promise of language technologies is that they can serve internet users all over the world in their daily tasks involving language and communication. They can provide interfaces for users for accessing education and knowledge on the web, for finding social connections, employment, and friendships. They can have a huge impact, since we all communicate using language."

The collaboration between Tsvetkov's lab and the Washington Post focused on identifying discriminatory sentiments directed against the African population in Guangzhou. The team collected more than 200,000 posts from the social network Weibo, using NLP to analyze the sentiments expressed in the posts at scale. As a result, they were able to track the rise of xenophobic and discriminatory language throughout April 2020.
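The general approach described here, scoring sentiment per post and then aggregating over time, can be sketched with a toy example. Everything below (the word lists, the scoring rule, the sample posts) is invented for illustration; the actual study relied on far more sophisticated NLP models trained for Chinese-language social media, not a hand-built lexicon.

```python
from collections import defaultdict

# Tiny illustrative sentiment lexicon. A real study would use a trained
# sentiment model and curated hate-speech resources, not a handful of words.
NEGATIVE = {"dangerous", "ban", "expel", "blame"}
POSITIVE = {"welcome", "support", "help", "thank"}

def score(text):
    """Crude lexicon score: (# positive tokens) - (# negative tokens)."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def daily_negativity(posts):
    """Share of negative-scoring posts per day, given (date, text) pairs."""
    totals, negatives = defaultdict(int), defaultdict(int)
    for date, text in posts:
        totals[date] += 1
        if score(text) < 0:
            negatives[date] += 1
    return {d: negatives[d] / totals[d] for d in totals}

# Hypothetical posts, purely for demonstration.
posts = [
    ("2020-04-01", "we should welcome and support them"),
    ("2020-04-10", "expel them they are dangerous"),
    ("2020-04-10", "ban them now"),
]
print(daily_negativity(posts))
# {'2020-04-01': 0.0, '2020-04-10': 1.0}
```

Plotting a daily negativity ratio like this over several weeks is one simple way to surface a rising trend of hostile language, which is the kind of signal the Post's reporting drew on.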

While sentiment analysis is a ubiquitous application of NLP, Tsvetkov says it is typically used by for-profit companies to better monetize their products. Turning this tool towards identifying hateful and discriminatory speech is a strong example of NLP's potential for public interest use.

"While sentiment analysis alone cannot address the problems of hate speech, misinformation, and disinformation," Tsvetkov stresses, "its underlying algorithms can be adapted and used to surface problematic, uncivil, and potentially dangerous interactions on social media. These tools can be used to alleviate the mental load of human moderators on social media platforms, or to automatically detect and analyze problematic patterns of communication."

In addition to her research, Tsvetkov, along with Prof. Alan Black, teaches CMU's Computational Ethics for NLP class, which teaches students how to avoid ethical pitfalls and mitigate the risks and social biases in AI tools, as well as how to build language technologies for social good. Expanding the Computational Ethics course was the focus of her 2019 PIT-UN Network Challenge grant.

"The key goals of the course are to equip future technologists with theoretical and practical tools to combat social biases in language technologies, and to develop new techniques, informed by ethics, social science, and law, that are civic-minded, that serve diverse populations equitably, and promote public good," Tsvetkov says. She notes the course received very positive feedback from students and attracted many from underrepresented backgrounds. Many students have continued to engage with the course topics after the class ended: at least eight research papers resulted from students' work in class.

Looking ahead, Tsvetkov is wary of the myriad ethical and privacy risks posed by artificial intelligence, including NLP, and wants to see the field move towards addressing them.

"We can learn a lot about a user through language analysis algorithms," she says. "Especially when we analyse people's communications across time and across their social networks. Personalization algorithms use this property of language to improve services such as search or targeted advertising. But the same algorithms can be used to track users, and to manipulate public opinion through targeted analysis of users' feeds and through content personalization."

"There's so much focus on 'fake news' today, but I think much more danger comes from such subtle manipulation strategies," she adds. "I hope more discussion in our field will be focused on developing NLP algorithms that identify and prevent subtle manipulation strategies like agenda setting and polarization."

More About the Authors

Austin Adams

Communications Manager, Open Technology Institute
