Switzerland

How Big Data Helps Reveal Ghostwriters And Bust Plagiarists

Can we determine whether a certain writer actually penned a certain work? Using technological analysis, the answer is a reliable 'yes.'

Jacques Savoy

ST. GALLEN — At the beginning of this year, the Swiss universities in St. Gallen and Bern denounced the student practice of using ghostwriters to pass off work as their own.

Though universities are not yet using sophisticated technological tools to analyze student papers, the issue raises questions that apply in many other settings. How can we identify the author of a letter, an anonymous e-mail or a contested will? Are there ways to bust plagiarists? Can we determine whether a text was written by a woman or a man? Can we detect the presence of a sexual predator in a chat? In tackling such issues, computer algorithms can provide answers whose reliability varies from 70% to 95%, depending on the type of problem and its context. Some examples:

Gary, aka Emile Ajar

In literature, authors sometimes write novels under assumed names. Romain Gary, for example, wrote under the pseudonym Emile Ajar in the 1970s. Technological tools can help to highlight similarities between two novels written under two separate names, and indicate whether a single author may have written both texts. Collaboration between writers raises the question of which parts of a work were written by whom, as in the case of the play The Two Noble Kinsmen (a collaboration between W. Shakespeare and J. Fletcher) or Psyche (P. Corneille and Molière).

Sometimes this gives rise to heated discussions. For example, several well-known plays are attributed to Molière, but stylistic studies emphasize their striking closeness to the writings of Pierre Corneille. As for Psyche, one can either detect a collaboration between two writers or argue that these pieces were written by Pierre Corneille.

Saint Paul

Of the 14 biblical epistles traditionally attributed to St. Paul, seven are unanimously recognized as the work of St. Paul himself, four are accepted by the majority of researchers, and two remain disputed. Researchers unanimously agree, however, that Hebrews was not written by St. Paul. Another example is the Book of Mormon, which is attributed to Joseph Smith but remains contested.

Politicians

In politics, the use of ghostwriters raises no ethical problem, and the practice is nothing new. For example, George Washington rarely wrote his speeches, often leaving the editorial work to Alexander Hamilton or James Madison. But because the first U.S. president delivered an average of just three important speeches a year, this issue was largely insignificant. Since then, politics have changed. Modern American presidents now deliver a speech a day on average.

Analysis techniques

To determine a document's real author, several computer techniques focus on language, particularly frequently repeated words (the, that), pronouns (we, you, me) or auxiliary verbs (is, are). An analysis of frequent two-word combinations can then corroborate the result. Other big data strategies are based on formulations or expressions typical of a given author (like Jacques Chirac's use of the French word "abracadabrantesque" or General de Gaulle's use of "chienlit").
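As a rough illustration of the word-frequency idea, the Python sketch below counts how often a handful of function words occur relative to the length of a text, and lists the most frequent two-word combinations. The word list, the function names and the sample sentence are assumptions made for this example; they are not the tools used in the studies mentioned here, which rely on much larger word lists and corpora.

```python
import re
from collections import Counter

# Illustrative list built from the article's examples; real studies use far more words.
FUNCTION_WORDS = ["the", "that", "we", "you", "me", "is", "are"]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def function_word_profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}

def top_bigrams(text, n=3):
    """Return the n most frequent pairs of adjacent words with their counts."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:])).most_common(n)

sample = "We are here and you are there. The plan is that we win."
profile = function_word_profile(sample)
```

Two texts by the same writer would be expected to produce similar profiles, which is what makes such counts usable as a stylistic fingerprint.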

The average length of sentences is also telling. The distribution of nouns, or the frequency of adjectives, pronouns and verbs, can also help determine the probable author of a document. For example, Bill Clinton's style is characterized by a high frequency of pronouns while President Barack Obama uses more verbs.
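Measures like these are simple to compute. The sketch below, again in Python, estimates average sentence length and the share of pronouns in a text. The pronoun list and the rule for splitting sentences on punctuation are simplifying assumptions; counting verbs or adjectives would require a part-of-speech tagger, which is left out here.

```python
import re

# Small illustrative pronoun list (an assumption for this example, not exhaustive).
PRONOUNS = {"i", "me", "we", "us", "you", "he", "she", "it", "they", "them"}

def average_sentence_length(text):
    """Return the mean number of words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)

def pronoun_rate(text):
    """Return the share of word tokens that are pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in PRONOUNS for token in tokens) / len(tokens)
```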


Prerequisites

Of course, the use of these techniques requires that we have texts by all the probable authors of a document. In the case of the universities in St. Gallen and Bern, this precondition is obviously impractical and hasn't been fulfilled. Given texts known to be written by an author, technology can analyze the likelihood of that person having written another document. The conclusion can be affirmative or negative, and the success rate varies between 65% and 90%. These values are still far from those of DNA testing, but the analytical techniques become more refined each year as the use of big data increases.
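To make the prerequisite concrete, here is a minimal Python sketch of attribution among known candidates: each candidate's sample text and the disputed text are reduced to function-word frequencies, and the candidate whose profile lies closest is returned. The word list, the distance measure (a sum of absolute differences) and all names are assumptions chosen for illustration. Published methods are more elaborate, and this toy version always names one of the supplied candidates, even when the true author is not among them.

```python
import re
from collections import Counter

# Illustrative word list (an assumption); real systems use hundreds of words.
FUNCTION_WORDS = ["the", "that", "we", "you", "me", "is", "are"]

def profile(text):
    """Relative frequency of each function word, as a list in a fixed order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]

def distance(p, q):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def closest_author(disputed_text, known_texts):
    """known_texts maps each candidate's name to a sample of their writing."""
    target = profile(disputed_text)
    return min(known_texts,
               key=lambda name: distance(target, profile(known_texts[name])))

known = {"A": "we are we are we are", "B": "the the that the that is"}
guess = closest_author("we are you", known)
```

With only two short invented samples the result means little; the success rates quoted above depend on having substantial text from every candidate.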


Gender differences

Author profiling doesn't set out to determine the name of a writer, but instead to identify some of that writer's characteristics. For example, can we determine whether a text was written by a man or a woman, or the approximate age of the writer? Are there stylistic characteristics of each sex? The answer is yes. Women tend to use pronouns more frequently (I, we, you), use more nouns related to social relationships (sister, friend) and express more feelings (joy, anxiety).

The typically masculine style is characterized by a higher frequency of determiners (the, of), nouns (table, computer) or the use of numbers. In the blogosphere, men are distinguished by themes related to employment, sports or technology, while women tend to tackle topics of family, friends and food through more emotional language. Young people between ages 14 and 18 are more likely to use abbreviations ("lol"). They also tend to write shorter sentences and repeat words more frequently. In contrast, older people use longer sentences and have a richer vocabulary.
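Profiling systems turn cues of this kind into numbers before any classification takes place. The Python sketch below extracts three of them: the share of tokens that are numbers, the share that are abbreviations, and the ratio of distinct words to total words as a crude measure of vocabulary richness. The feature names and the abbreviation list are assumptions (only "lol" comes from the text above), and no classifier is included.

```python
import re

# Only "lol" is taken from the article; extend the set as needed.
ABBREVIATIONS = {"lol"}

def profiling_features(text):
    """Return a few simple stylistic features used in author profiling."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        # Share of tokens made up of digits only.
        "number_rate": sum(t.isdigit() for t in tokens) / total,
        # Share of tokens found in the abbreviation list.
        "abbreviation_rate": sum(t in ABBREVIATIONS for t in tokens) / total,
        # Distinct words divided by total words (vocabulary richness).
        "type_token_ratio": len(set(tokens)) / total,
    }

features = profiling_features("lol I have 2 cats lol")
```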

