Saving Languages From Extinction, With The Help Of AI

The world's linguistic heritage is facing a crisis just as serious as that of biodiversity. A French project is trying to save what exists in the Pangloss collection, powered by new tools of Artificial Intelligence.

Celebrating Mother Language Day in Bangladesh
Yann Verdo

PARIS — Let's begin with a little quiz: Across the earth, there are 7 continents and 197 countries. How many languages are spoken?

The answer is around 7,000. If this number surprises you, it's because of a distorted perspective: half of the planet's 7.8 billion inhabitants communicate in only about 20 of these languages (Arabic, English, Spanish, French, Hindi, Mandarin, Portuguese...), while 97% of the 7,000 languages together account for no more than 4% of the world's population.

Our world's linguistic heritage, as rich as it may be, is very fragile. The overwhelming majority of these 7,000 languages have no written tradition, and today are spoken only by a handful of old people. This heritage is both the fruit and the guarantor of humans' cultural diversity, and is no less significant than the biodiversity of plant and animal species. The crisis it faces can be considered the sixth major extinction that threatens the world.

"We estimate that 50% of the 7,000 languages will disappear by the end of this century, a rate to be compared with the 26% of mammal species or 14% of bird species threatened with extinction according to the International Union for Conservation of Nature," says Evangelia Adamou, a linguist at the CNRS laboratory, LACITO (Languages and Civilizations with an Oral Tradition).


This threat of massive linguistic extinction is what motivated researchers to create the Pangloss collection in 1995, named after a character in Voltaire's "Candide" whose name in Greek means "all languages." Equipped with a website that makes it accessible to the general public, this collection is to linguistic diversity what protected areas are to biodiversity. Its sound library has been enriched over the years and now contains more than 3,600 audio or video recordings in 170 languages, nearly half of which are transcribed and annotated.

According to Alexis Michaud, one of the main linguistic contributors to the Pangloss collection, the painstaking work of transcribing and translating a rare language before it disappears into oblivion will soon be greatly accelerated by advances in Artificial Intelligence. A quarter of a century ago, automatic language processing technology produced poor results even for common languages; now it works efficiently even for the rarest and least well-documented languages.

Inscription in Aramaic on a funerary stele — Photo: Wikipedia

These advances are evidenced by Elpis (named after the Greek goddess of hope), machine learning software developed by an Australian doctoral student and designed to enable language workers to build their own speech recognition models and automatically transcribe audio. It will be released later this year through LACITO to all researchers interested in the 780 hours of pre-recorded readings from Pangloss. (As part of an open-science approach, the creators have licensed most content under a Creative Commons license.)

"Until now, it took at least 100 hours of recording time to train AI to make transcriptions in a new language," explained Michaud. "With the Elpis interface, one hour of recording will suffice. It's a real revolution!"

This conservation work is urgent if as many languages as possible are to be saved from joining Gaulish, Aramaic, and Gros Ventre (an Algonquian language of the Great Plains of the United States) in the cemetery of dead languages. And as with biodiversity, the phenomenon of language extinction is accelerating rapidly.


"The number of known languages that have become extinct in the course of history is estimated at 900. But, of these, nearly a quarter have disappeared over the last 50 years," points out CNRS linguist Adamou. According to the most recent data, a language disappears, on average, every few months under the combined weight of urbanization, deforestation, and global warming.

When a language dies, a whole culture dies with it, closing off a unique way of understanding the world. As early as the 1930s, Edward Sapir and Benjamin Lee Whorf, two American linguists and anthropologists, put forward what became known as the "Sapir-Whorf hypothesis," which argues that our cognitive perceptions depend on the categories of our language; in other words, the way we see the world is dependent on the language we speak.

There has since been a considerable variety of empirical research conducted at the crossroads of linguistics and neuroscience to test this hypothesis. Interpretations of the research are still being debated, but Adamou says one thing is certain: "Not all languages encode all aspects of reality in the same way." That means linguistic diversity is itself of enormous value.


A Mother In Spain Denied Child Custody Because She Lives In Rural Area

A court in Spain has stripped a mother of custody of her one-year-old boy, who lived with her in the "deep" part of the Galicia region and must now live with his father in the southern city of Marbella, which the judge says is "cosmopolitan" with good schools and medical care. Women's rights groups have taken up the mother's case.

A child in Galician countryside

Laure Gautherin

A Spanish court has ordered the withdrawal of a mother's custody of her one-year-old boy because she is living in the countryside in northwestern Spain, where the judge says the child won't have "opportunities for the proper development of his personality."

The case, reported Monday in La Voz de Galicia, has sparked outrage from a women's rights association and has also set off reactions from politicians of different stripes across the region of Galicia, who have defended the values of rural life.

Judge María Belén Ureña Carazo, of the family court of Marbella, a city of 141,000 people on the southern coast, has ordered the toddler to stay with his father, who lives in the city, rather than with his mother, because she lives in "deep Galicia," where the child would lack opportunities to "grow up in a happy environment."

Front page of La Voz de Galicia - October 25, 2021


La Voz de Galicia

Better in a "cosmopolitan" city?

The judge said Marbella, where the father lives, was a "cosmopolitan city" with "a good hospital" as well as "all kinds of schools" and thus provided a better environment for the child to thrive.

The mother has submitted a formal complaint to the General Council of the Judiciary, alleging that the family court magistrate acted with "absolute contempt," her lawyer told La Voz de Galicia.

The mother quickly gained support from local politicians and civic organizations. The Clara Campoamor association described the judge's arguments as offensive, intolerable and typical of "an ignorant person who has not traveled much."

The Xunta de Galicia, the regional government, has addressed the case, saying that anywhere in Galicia offers suitable conditions for raising and educating a minor. The Socialist party politician Pablo Arangüena tweeted that "it would not hurt part of the judiciary to spend a summer in Galicia."
