When the world gets closer.

We help you see farther.

Sign up to our expressly international daily newsletter.

Geopolitics

Saving Languages From Extinction, With The Help Of AI

The world's linguistic heritage is facing a crisis just as serious as that of biodiversity. A French project is trying to save what exists in the Pangloss collection, powered by new tools of Artificial Intelligence.

Celebrating Mother Language Day in Bangladesh
Yann Verdo

PARIS — Let's begin with a little quiz: Across the earth, there are 7 continents and 197 countries. How many languages are spoken?

The answer is around 7,000. If this number surprises you, it's because your perspective is distorted: half of the planet's 7.8 billion inhabitants communicate in only about 20 of them (Arabic, English, Spanish, French, Hindi, Mandarin, Portuguese...), while 97% of these 7,000 languages together are spoken by no more than 4% of the population.

The world's linguistic heritage, as rich as it may be, is very fragile. The overwhelming majority of these 7,000 languages have no written tradition, and many today are spoken only by a handful of old people. This heritage is both the fruit and the guarantor of humans' cultural diversity, and is no less significant than the biodiversity of plant and animal species. The crisis it faces can be considered the sixth major extinction that threatens the world.

"We estimate that 50% of the 7,000 languages will disappear by the end of this century, a rate to be compared with the 26% of mammal species or 14% of bird species threatened with extinction according to the International Union for Conservation of Nature," says Evangelia Adamou, a linguist at the CNRS laboratory, LACITO (Languages and Civilizations with an Oral Tradition).

This threat of massive linguistic extinction is what motivated researchers to create the Pangloss collection in 1995, named after a character in Voltaire's "Candide" whose name means "all languages" in Greek. Equipped with a website that makes it accessible to the general public, this collection is to linguistic diversity what protected areas are to biodiversity. Its sound library has been enriched over the years and now contains more than 3,600 audio or video recordings in 170 languages, nearly half of which are transcribed and annotated.

According to Alexis Michaud, one of the main linguistic contributors to the Pangloss collection, the painstaking work of transcribing and translating a rare language before it disappears into oblivion will soon be greatly accelerated by advances in Artificial Intelligence. A quarter of a century ago, automatic language processing technology produced poor results even for common languages; now it works efficiently even for the rarest and least well-documented languages.

Inscription in Aramaic on a funerary stele — Photo: Wikipedia

These advances are evidenced by Elpis (named after the Greek goddess of hope), machine learning software developed by an Australian doctoral student and designed to enable language workers to build their own speech recognition models and automatically transcribe audio. It will be released later this year on LACITO to all researchers interested in the 780 hours of pre-recorded readings from Pangloss. (As part of an open-source science approach, the creators have licensed most content under a Creative Commons license.)

"Until now, it took at least 100 hours of recording time to train AI to make transcriptions in a new language," explained Michaud. "With the Elpis interface, one hour of recording will suffice. It's a real revolution!"

This conservation work is urgent if as many languages as possible are to be saved from joining Gaulish, Aramaic, and Gros Ventre (an Algonquian language of the Great Plains of the United States) in the cemetery of dead languages. As with biodiversity, language extinction is accelerating rapidly.

"The number of known languages that have become extinct in the course of history is estimated at 900. But, of these, nearly a quarter have disappeared over the last 50 years," points out CNRS linguist Adamou. According to the most recent data, a language disappears, on average, every few months under the combined weight of urbanization, deforestation, and global warming.

When a language dies, a whole culture dies with it, closing off a unique way of understanding the world. As early as the 1930s, Edward Sapir and Benjamin Lee Whorf, two American linguists and anthropologists, postulated the so-called "Sapir-Whorf hypothesis," which argues that our cognitive perceptions depend on our linguistic groupings; in other words, the way we see the world depends on the language we speak.

There has since been a considerable variety of empirical research conducted at the crossroads of linguistics and neuroscience to test this hypothesis. Interpretations of the research are still being debated, but Adamou says one thing is certain: "Not all languages encode all aspects of reality in the same way." That means linguistic diversity is itself of enormous value.

Migrant Lives

How An Erdogan-Assad Truce Could Trigger A New Migrant Crisis At Europe's Border

In Turkey, resentment against Syrian refugees is growing. And President Erdogan – once their patron – is now busy seeking good relations with the man the Syrians fled, the dictator Bashar al-Assad.

A Syrian refugee working as a trash collector in Gaziantep, Turkey

Carolina Drüten

ISTANBUL — At some point, they'd simply had enough. Enough of the hostilities, the insecurity, the attacks. In a group on the messenger service Telegram, Syrians living in Turkey called for a caravan – a march to the Turkish-Greek border, and then crossing into the European Union.

Tens of thousands of users are now following updates from the group, in which the organizers are asking Syrian refugees in Arabic to equip themselves with sleeping bags, tents, life jackets, drinking water, canned food and first aid kits. The AFP news agency spoke to an organizer who wants to remain anonymous because of possible reprisals. "We will let you know when it's time to leave," said the 46-year-old Syrian engineer.

Central to the tragic absurdity of this war is the question of language. Vladimir Putin has repeated that protecting ethnic Russians and the Russian-speaking populations of Ukraine was a driving motivation for his invasion.

Yet one month on, a quick look at the map shows that many of the worst-hit cities are those where Russian is the predominant language: Kharkiv, Odesa, Kherson.
