

How Big Data Helps Reveal Ghostwriters And Bust Plagiarists

Can we determine whether a certain writer actually penned a certain work? Using technological analysis, the answer is a reliable 'yes.'

Jacques Savoy

ST. GALLEN — At the beginning of this year, the Swiss universities in St. Gallen and Bern denounced the student practice of using ghostwriters to pass off work as their own.

Although universities do not yet use sophisticated technological tools to analyze student papers, the issue raises questions that reach well beyond the classroom. How can we identify the author of a letter, an anonymous e-mail or a contested will? Are there ways to bust plagiarists? Can we determine whether a text was written by a woman or a man? Can we detect the presence of a sexual predator in a chat? For such questions, computer algorithms can provide answers whose reliability ranges from 70% to 95%, depending on the type of problem and its context. Some examples:

Gary, aka Emile Ajar

In literature, authors sometimes write novels under assumed names. Romain Gary, for example, wrote under the pseudonym Emile Ajar in the 1970s. Technological tools can highlight similarities between two novels published under two different names and indicate whether a single author may have written both. Collaboration between writers raises a related question: who wrote which parts of a work? Examples include the play The Two Noble Kinsmen (a collaboration between W. Shakespeare and J. Fletcher) and Psyche (P. Corneille and Molière).

Such questions sometimes spark heated debate. For example, several well-known plays are attributed to Molière, yet stylistic studies highlight their disturbing proximity to the writings of Pierre Corneille. In the case of Psyche, the analysis can detect a collaboration between two writers; for the disputed plays, it can support the hypothesis that they were actually written by Pierre Corneille.

Saint Paul

Of the 14 epistles of St. Paul in the Bible, seven are unanimously recognized as his own work, four are accepted as such by most researchers, and two remain disputed. Researchers do, however, unanimously agree that Hebrews was not written by St. Paul. Another example is the Book of Mormon, which is attributed to Joseph Smith but whose authorship remains contested.


In politics, the use of ghostwriters raises no ethical problem, and the practice is nothing new. George Washington, for example, rarely wrote his own speeches, often leaving the drafting to Alexander Hamilton or James Madison. But because the first U.S. president delivered an average of just three important speeches a year, the issue was largely insignificant. Politics has changed since then: modern American presidents deliver a speech a day on average.

Analysis techniques

To determine a document's real author, several computer techniques focus on language, particularly frequently repeated words (the, that), pronouns (we, you, me) and auxiliary verbs (is, are). An analysis of frequent two-word combinations can then complement these counts. Other big data strategies rely on wordings or expressions typical of a given author (such as Jacques Chirac's use of the French word "abracadabrantesque" or General de Gaulle's use of "chienlit").
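To make the idea concrete, here is a minimal Python sketch of the two measurements just described: the relative frequency of a handful of function words, and the most frequent two-word combinations (bigrams). The word list, the tokenizer and the function names are illustrative assumptions, not the tools used in the research described here; real studies typically work with the few hundred most frequent words of a corpus.

```python
import re
from collections import Counter

# Illustrative function-word list (an assumption); studies typically use the
# few hundred most frequent words of the corpus instead.
FUNCTION_WORDS = ["the", "that", "we", "you", "me", "is", "are"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(text: str) -> dict[str, float]:
    """Relative frequency of each function word (occurrences / total tokens)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: counts[w] / total for w in FUNCTION_WORDS}


def top_bigrams(text: str, n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """The n most frequent pairs of adjacent words.

    Punctuation is discarded, so pairs may span sentence boundaries.
    """
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:])).most_common(n)


sample = "We are here and we are ready. You are the one that we need."
print(function_word_profile(sample)["we"])  # 3 of 14 tokens
print(top_bigrams(sample, 1))  # [(('we', 'are'), 2)]
```

Profiles like these become useful only when compared across texts: a disputed document is matched against the profiles of candidate authors.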

Average sentence length is also telling. The distribution of nouns, and the frequency of adjectives, pronouns and verbs, can also help identify the probable author of a document. For example, Bill Clinton's style is characterized by a high frequency of pronouns, while President Barack Obama uses more verbs.
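Two of these stylistic markers, average sentence length and pronoun frequency, can be computed with a few lines of Python. The sketch below is a simplification under stated assumptions: the pronoun set is a hypothetical hand-picked list, and the sentence splitter is naive (it would break on abbreviations such as "Mr."). Counting verbs or adjectives would additionally require a part-of-speech tagger, which is not shown here.

```python
import re

# Hypothetical, illustrative set of English personal pronouns.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "him", "his", "she", "her", "they", "them", "their", "it"}


def split_sentences(text: str) -> list[str]:
    """Naively split on runs of '.', '!' or '?' followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    return [p for p in parts if p.strip()]  # drop empty fragments


def avg_sentence_length(text: str) -> float:
    """Mean number of words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(lengths) / len(lengths)


def pronoun_rate(text: str) -> float:
    """Share of word tokens that are personal pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in PRONOUNS for t in tokens) / len(tokens)


sample = "I came. We saw it all. You know."
print(avg_sentence_length(sample))  # 8 words / 3 sentences
print(pronoun_rate(sample))         # 4 pronouns / 8 words
```

On its own, a single number like this says little; it is the combination of many such markers, measured on long texts, that gives a style its fingerprint.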


Of course, these techniques require texts from every plausible author of a document. In the case of the universities in St. Gallen and Bern, this precondition has not been met and would be hard to meet in practice. Given texts known to be written by an author, the technology can estimate the likelihood that the same person wrote another document. The conclusion can be affirmative or negative, and the success rate varies between 65% and 90%. These values are still far from those of DNA testing, but the analytical techniques become more refined each year as the use of big data increases.
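As an illustration of attribution among known candidates, the sketch below implements a simplified version of Burrows' Delta, a well-established stylometric distance. It is swapped in here as one concrete example; the article does not say which method the research it describes relies on. The function names and the default of 30 feature words are assumptions, and real use requires thousands of words of known text per candidate rather than the toy strings shown.

```python
import re
import statistics
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens: list[str], vocab: list[str]) -> list[float]:
    """Frequency of each vocabulary word, divided by the number of tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def delta_attribution(known: dict[str, str], disputed: str, n_words: int = 30):
    """Return (most likely author, {author: delta score}); lower delta = closer style."""
    known_tokens = {author: tokenize(t) for author, t in known.items()}

    # Features: the n most frequent words across all known texts.
    pooled = Counter()
    for tokens in known_tokens.values():
        pooled.update(tokens)
    vocab = [w for w, _ in pooled.most_common(n_words)]

    profiles = {a: relative_freqs(toks, vocab) for a, toks in known_tokens.items()}
    target = relative_freqs(tokenize(disputed), vocab)

    # Spread (population standard deviation) of each feature across candidates.
    stdevs = [statistics.pstdev(col) for col in zip(*profiles.values())]

    deltas = {}
    for author, prof in profiles.items():
        # |z_target - z_author| simplifies to |x - y| / sd, because the mean cancels.
        # Features with zero spread carry no information and are skipped.
        diffs = [abs(x - y) / sd for x, y, sd in zip(target, prof, stdevs) if sd > 0]
        deltas[author] = sum(diffs) / len(diffs) if diffs else float("inf")
    return min(deltas, key=deltas.get), deltas


known = {"A": "the cat and the dog and the bird", "B": "we you we you me we you"}
best, scores = delta_attribution(known, "the fish and the frog")
print(best)  # A
```

The method only ranks the candidates it is given: if the true author is missing from `known`, it will still name the closest one, which is exactly why the precondition described above matters.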

Gender differences

Author profiling does not set out to determine a writer's name, but rather to identify some of that writer's characteristics. For example, can we determine whether a text was written by a man or a woman, or estimate the writer's age? Does each sex have its own stylistic characteristics? The answer is yes. Women tend to use pronouns (I, we, you) and nouns related to social relationships (sister, friend) more frequently, and to express more feelings (joy, anxiety).

The typically masculine style is characterized by a higher frequency of determiners (the, a), nouns (table, computer) and numbers. In the blogosphere, men tend to write about employment, sports or technology, while women tend to tackle family, friends and food, in more emotional language. Young people between ages 14 and 18 are more likely to use abbreviations ("lol"). They also tend to write shorter sentences and repeat words more often. Older people, by contrast, use longer sentences and a richer vocabulary.
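Profiling features of the kind mentioned for age, such as vocabulary richness and abbreviation use, can be sketched as follows. The abbreviation list and function names are hypothetical; a real profiler would learn its markers from large sets of labeled texts and feed many such features into a trained classifier rather than inspect them by hand.

```python
import re

# Hypothetical list of chat abbreviations; a real profiler would learn such
# markers from labeled training texts rather than hard-coding them.
ABBREVIATIONS = {"lol", "omg", "btw", "idk", "brb"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct words / total words: a crude measure of vocabulary richness."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def abbreviation_rate(text: str) -> float:
    """Share of tokens found in the abbreviation list."""
    tokens = tokenize(text)
    return sum(t in ABBREVIATIONS for t in tokens) / len(tokens) if tokens else 0.0


def profile_features(text: str) -> dict[str, float]:
    """Feature vector that could be fed to a classifier trained on labeled texts."""
    return {"ttr": type_token_ratio(text), "abbrev_rate": abbreviation_rate(text)}


print(profile_features("lol lol that was so so funny lol"))
# {'ttr': 0.625, 'abbrev_rate': 0.375}
```

One design caveat: the type-token ratio falls as a text grows longer, so it should only be compared between texts of similar length.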


It's A Golden Era For Russia-Turkey Relations — Just Look At The Numbers

On the diplomatic and political level, no world leader speaks more regularly with Vladimir Putin than his Turkish counterpart Recep Tayyip Erdoğan. But the growing closeness of Russia and Turkey can also be measured in the economic data. And the 2022 numbers are stunning.

Erdogan and Putin last summer in Sochi, Russia

Vyacheslav Prokofyev/TASS via ZUMA
Aytug Özçolak


ISTANBUL — As Russia has become increasingly isolated since the invasion of Ukraine, the virtual pariah state has drawn notably closer to one of its remaining partners: Turkey.

Ankara has committed billions of dollars to buy the Russian S-400 surface-to-air missile system, and has contracted Russia to build Turkey's first nuclear power plant. The countries’ foreign policies are also becoming increasingly aligned.

But the relationship runs much deeper. Turkish President Recep Tayyip Erdoğan speaks to Russian President Vladimir Putin more often than any other leader does: 16 times in 2022, and 11 times in 2021. Erdoğan has visited Russia 14 times since 2016, compared with 10 visits to the U.S. over the same period (half of them in 2016 and 2017).

But no less important is the way the two countries are increasingly tied together by commerce.
