The Complicated Decisions That Come With Digitizing Indigenous Languages

This article in , a collaboration among , , and .

Imagine: Hundreds of years of environmental, social, and political changes have left the English language with only a handful of speakers. Linguists and community members are trying to rebuild, but nearly the entirety of the written and audio record is gone. All that remains are 8,000 tweets from the year 2019, audio recordings of African American Vernacular and Irish English, a 1923 edition of the Oxford English Dictionary, and nine Shakespeare plays.

How do you measure what’s missing? How do you bring English back?

This, of course, is the reality for scores of Native North American languages. When Europeans first made contact with tribes across the continent, were being spoken. Today, after centuries of forced relocations, broken treaties, abusive residential schools, and other discriminatory practices, only 256 languages are spoken. A full 199 are endangered, according to the Catalogue of Endangered Languages. Yet even after everything those communities endured, they’re fighting for their words, and for the ability to protect them. New technology like smartphone keyboards, language-learning apps, and digital databases makes revitalization work easier than ever, but it also requires hard conversations about which parts of a language must be kept offline.

At a recent conference called Breath of Life 2.0, held at Miami University of Ohio, participants explored the possibilities and pitfalls of archival databases. For the people involved in language revitalization work, whether they’re awakening a dormant language (one that no longer has fluent speakers) or trying to prevent an existing language from losing all its speakers, one of the major obstacles is a lack of digital resources.

Jerome Viles experienced that firsthand. An enrolled member of the Confederated Tribes of Siletz Indians and part of the Southwest Oregon Dene Languages Project team, Viles helps organize archival materials produced by linguists and community members over the past 100 years. First, his community had to find the documents, which were scattered in institutions across the country. The second problem was organizing and reconciling them. For instance, some records of a language may have been written by French Jesuits who came to North America to convert Native American people. But even if three Jesuits worked with the same tribe, they could all have used different spelling systems. Then there are audio recordings from the 1800s and 1900s, plus ethnographic material collected by anthropologists.

What Viles’ group and many others needed was a way to collect and compare thousands of words across multiple generations and several dialects. And that’s where the Indigenous Language Digital Archive came in. ILDA is a database built by engineers at Miami University in collaboration with the Miami Tribe of Oklahoma (which built an earlier form of the database specifically for its own language revitalization work). At the Breath of Life 2.0 conference, participants learned how to navigate the software and move their documents into one big digital space. By the end of the five days, a few surprising discoveries had emerged. Community members from the Northern Paiute, working on the Numu language, were tickled to find seven different ways to say “our husband.”

“I was getting a kick out of that,” said Nicholas Cortez. “I was like, where’s this ‘our’ coming from?”

Each group gained unique insights from using the database, but all were united in one concern: protecting language documents from people outside the community. As documents are entered into ILDA, a dictionary is created from that information. But what if the files contained sensitive information? For some groups, it was words and stories that should only appear at certain times of the year. For others, it was medical information that could be misused if the reader didn’t fully understand it.

Mark Pearson, who works on the technology side of the Osage Language Department, says even putting up videos of Osage on YouTube was debated, because not all knowledge of the language is meant for the general public. (Understandably, he did not want to give me examples of this.) At a recent gathering of First Nations in Canada, , “We can be colonized through data. We need to be aware of that, and we need to take steps to make sure we’re not.” Keegan explained that Google Translate has continued to change how it interprets Māori phrases over the past 10 years, and he’s not sure those changes are always for the better, since the system automatically collects data instead of working with the community.

The ILDA engineering team is still working on security features that would allow a user to make some information private for a specific length of time or make it visible only to certain people. Daryl Baldwin, one of the project directors for the Breath of Life conference and a member of the Miami Tribe, and his team have grappled with how to handle documents that contain stories. They held onto a collection for 20 years before finally releasing it to the public, because they wanted to be sure community members were in a place to understand the stories as “not bedtime stories.” Today, many of the stories are only told at certain times of the year to respect the tribe’s storytelling tradition.

Despite these complicated decisions, those involved with ILDA and other projects believe it is important to make information available to community members hoping to learn their languages. The Osage Nation, today located in Oklahoma, includes about 20,000 members, some of whom live far from the tribe headquarters. With no more fluent speakers left, figuring out how to share materials with people outside Oklahoma was once a major challenge. They’d tried video conferences but were looking to expand.

Pearson suggested building language apps. The tricky part was getting the Osage script recognized. Like the Cherokee, who developed a syllabary for their language, the Osages use their own alphabet. Pearson says that alphabet was a necessary tool for learning Osage, since sounds exist in that language that don’t correspond to English.

But getting that alphabet into software was its own challenge. Unicode is the standard for representing text. When a script isn’t supported by Unicode, the characters appear as empty white boxes (known as tofu). For a while, the Osages could only use images of their script for software, a rough workaround that didn’t give them much access to digital tools. But then, Craig Cornelius, who works in the international engineering department at Google, learned about Osage through his work with the Cherokee. Over the past few years, he’s helped the Osages have their script accepted by the Unicode Consortium and make keyboards featuring it.

“In many cases, Google being involved has been a catalyst,” Cornelius says. He adds that other tech companies like Microsoft see the work being done and decide to hop on board with their own software, as with the . Today, the Osages have two language apps available to community members anywhere in the world, and they can use the Osage keyboard on their phones, as long as they have a recent-enough model of phone.

In Cornelius’ work at least, Google lets communities take the lead when deciding what to put into the world. “Many Native American groups have been victimized by the larger society and are understandably wary,” he says.

Those questions of privacy and security will vary from one community to the next and across the different databases and apps they use. It might mean holding back on releasing some pieces of the language. But even having these kinds of conversations is proof of how much has changed in the past decade.

“The world that our community’s language was used in has been under attack for 160 years, and I’d like to see that world rebuilt,” Viles says. “Our languages are for us to speak.”

More About the Authors

Lorraine Boissoneault