AI, Translation And The Holy Grail Of "Natural Language"

PARIS — When asked about advances in language processing through artificial intelligence, Douglas Eck suggests pressing the “subtitle” button on Meet, the video conferencing service used for the interview because of the COVID-19 pandemic. The words of this American engineer, who had come to Paris to work at Google’s French headquarters, are then displayed in writing, live and without error, under the window where we see him, headset on. This innovation, unthinkable until recently, is also available for most videos on YouTube, a Google subsidiary, and in the voice recorder of Google’s latest phones, which offers to automatically transcribe all audio recordings.

These new possibilities are just one example of the progress made in recent years in natural language processing by digital companies, especially giants such as Google, Apple, Facebook and Amazon (GAFA). Some of these innovations are already being put into practice. Others are in the research stage, showcased at annual developer conferences, such as Google I/O (which took place May 18-20) and Facebook F8 (June 2).

Artificial intelligence generates 20 billion translations per day on Facebook

In the crucial area of translation, services such as Google Translate, which has expanded its offer to 104 languages, or the German competitor DeepL now make it possible to translate entire paragraphs in a coherent and fluid manner. Thanks to these advances, Google offers to translate the subtitles of YouTube videos.

Facebook has also come a long way. Artificial intelligence generates 20 billion translations per day on the social network (several dozen languages, including Wolof, are available), compared to only six billion in 2019.

Yann LeCun, Facebook’s chief artificial intelligence scientist and a pioneer in the field, says, “This area is very important for Facebook. And we know that simultaneous translations in real time will be possible.”

The dream of a machine translating live conversations is within reach. Google Translate comes close, but with a slight delay: You can speak in a language and have the other person hear or read the translation via a smartphone, and even listen to their translated response through headphones, if they are the latest in-house models.

The barriers between text and image are disappearing. With the augmented reality application Google Lens, students can scan a page from a textbook or a handwritten sentence with their smartphone and translate it or get additional information online. A tourist can understand a sign or a menu or get information about a monument.

It’s all because software has learned to recognize subjects in images. Tomorrow, we could launch a search with a photo, Google believes. The American company OpenAI is exploring the creation of images from a text description. Its DALL-E prototype offers disturbing representations of invented objects: an alarm clock in the shape of a peach, a pig lamp…

These innovations help make digital technology more accessible to the disabled and illiterate. With the French National Institute for Research in Digital Science and Technology (Inria), Facebook is studying the simplification of forms, with pictograms and synonyms. In January, the company presented an automatic image description tool for the blind and visually impaired. Google has a voice recognition project for people with speech difficulties, called “Euphonia.”

Now, artificial intelligence is capturing more complex sentences than before. Amazon claims that in 2020 its voice assistant Alexa learned to understand more variations on simple dialogue, to ask questions about unknown words and even to anticipate a user’s “intent,” for example by suggesting a timer when asked how long tea should brew. The Google search engine answers questions like: “Where does the Seine begin?” or “What political party is the newspaper Libération from?”

Pandu Nayak, vice president of search at Google, says, “We could, in the long run, handle queries with complex intentions.” For example: “I’ve already climbed Mount Adams, and I want to climb Mount Fuji, how do I prepare?” The answer would be broken down into sub-queries, with links to training tutorials, gear, maps, videos or content translated from Japanese, although this work remains “very conceptual.”

This wave of innovation is enabled by recent scientific breakthroughs in machine learning, in particular deep learning. This technology rivals humans at the game of Go and at image recognition. Its principle is to adjust the billions of parameters of a program so that it proposes the best association between a set of known “questions” and “answers.” In 2017, Google invented a new way to organize these parameters to improve machine translation. Called “Transformer,” it was quickly adopted by Facebook, the Chinese company Baidu, the French-American Systran and the German DeepL.
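As a rough illustration of that principle, the sketch below adjusts just two parameters, rather than billions, until a simple model reproduces a handful of known question-answer pairs. The data, the learning rate and the function name are illustrative assumptions, not anything drawn from the systems described here.

```python
# Toy illustration: adjust parameters so predictions match known answers.
# The pairs below are made up; they follow answer = 2 * question + 1.
QUESTIONS = [0.0, 1.0, 2.0, 3.0, 4.0]
ANSWERS = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit(questions, answers, steps=5000, learning_rate=0.01):
    """Fit answer ~ weight * question + bias by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(questions)
    for _ in range(steps):
        # Prediction error on every known pair with the current parameters.
        errors = [weight * q + bias - a for q, a in zip(questions, answers)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_weight = sum(2 * e * q for e, q in zip(errors, questions)) / n
        grad_bias = sum(2 * e for e in errors) / n
        # Nudge each parameter in the direction that reduces the error.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


weight, bias = fit(QUESTIONS, ANSWERS)
print(f"weight={weight:.3f}, bias={bias:.3f}")
```

With these made-up pairs the two parameters settle close to 2 and 1, the values used to generate the answers.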

“The last time there was such a breakthrough was five years ago, with long short-term memory [LSTM] architectures, used in voice assistants,” says Douglas Eck, the Google engineer. In 2018, “self-supervised learning” was added to this breakthrough: Google showed that a Transformer could do without human supervision to “learn” a language. Until then, to find the right values for the program’s parameters, software had needed vast databases annotated by humans.

The system, named “BERT” (Bidirectional Encoder Representations from Transformers), “learns” to fill in blanks in sentences, then proves to be excellent at grammar exercises and question answering. It inspired Facebook’s system, named “RoBERTa,” then OpenAI’s GPT-3, with its 175 billion parameters and 500 billion words ingested (100 times the English version of Wikipedia), and the Chinese system Wu Dao 2.0, which is already 10 times larger.
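The fill-in-the-blank exercise can be sketched in a few lines: words are hidden at random, and the hidden words themselves serve as the answers, so no human annotation is needed. This is a minimal sketch of how such training examples might be built, not the actual BERT pipeline; the function name, the 15% masking rate used as a default and the sample sentence are assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # placeholder for a hidden word (illustrative choice)


def make_fill_in_blank(sentence, mask_rate=0.15, seed=0):
    """Hide a random subset of words; return (masked words, hidden answers)."""
    rng = random.Random(seed)
    words = sentence.split()
    # Hide at least one word so every sentence yields a training example.
    n_hidden = max(1, round(len(words) * mask_rate))
    hidden_positions = sorted(rng.sample(range(len(words)), n_hidden))
    masked = list(words)
    answers = {}
    for position in hidden_positions:
        answers[position] = words[position]  # the text supplies its own label
        masked[position] = MASK
    return masked, answers


SENTENCE = "a tourist can understand a sign or a menu"
masked, answers = make_fill_in_blank(SENTENCE)
print(" ".join(masked))
print(answers)
```

A model trained on such pairs is asked to predict each hidden word from the words around it.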

Thomas Wolf, co-founder of Hugging Face, a company specializing in the distribution of these models, says “Faced with too much information and text, we will increasingly need these language models to find our way around.”

The prospects are promising, but also dizzying because these technologies will be used in headphones, in homes, in cars. The concerns have been gathered in an article co-authored by Timnit Gebru and Margaret Mitchell, two AI ethics researchers whose dismissal by Google has caused controversy. The main concern is about the “biases” — racist, sexist, homophobic — that this software can reproduce, or even amplify, after training on masses of text from the internet.

For example, chatbots can slide toward conspiracy themes. In response, Google says it excludes “offensive” parts of the web, such as certain forums, from its training data. “As these systems grow, it’s our responsibility to make sure they stay fair,” says Eck, although he prefers to look for solutions “on a case-by-case basis” depending on usage, rather than trying to correct all biases in the datasets. Others, including the international collective Big Science, want to use a better documented and less biased body of text.

Another limitation of such software is its focus on the most used languages on the internet, which makes it less effective on “low-resource” languages, due to a lack of training data. Digital giants are trying to mitigate this imbalance. In October 2020, Facebook presented software capable of translating between 100 languages without going through English as an intermediate step, which has so far been the norm.

The “large language models” are also criticized for their sheer size. Gebru and Mitchell point out an environmental risk related to the energy consumption involved, even if the figures are debated. The authors, who describe current software as “stochastic parrots,” say that investments should be made in less data-intensive models.

All agree that these systems do not really “understand” language. “The results are sometimes astonishing, but we also see that the generated texts end up containing errors that are easy to spot. These systems have no common sense or knowledge of the world, unlike children,” says Yann LeCun, Facebook’s chief AI scientist, who is looking for ways to improve them.

In the meantime, the impressive growth of computer-assisted language will be accompanied by increasing questions. Discussions around social issues — will children continue to learn foreign languages? — will be joined by regulatory debates. Like any complex algorithm, this software will have to be made more transparent and understandable. We can anticipate questions of responsibility in case of error: A Palestinian man was arrested because of a mistranslation of a Facebook post, Gebru points out.

The automatic moderation of content by algorithms can also infringe on freedom of expression and is the subject of proposed regulations. In its rules on artificial intelligence adopted in April, the European Commission proposes to adapt the framework according to the level of risk involved. For example, it recommends that internet users be informed when they are conversing online with a software program and not with… a human.

Hannah Steinkopf-Frank