Stylometry

  • SUMMARY: Stylometry is a descriptive science that uses statistical techniques to identify authorship of written materials.

In addition to comparing simple frequency patterns of words, stylometry focuses on the groupings of words and the position of these words in sentences. Using stylometry, scholars have tried to determine whether Homer wrote the last book of the Odyssey, whether the Apostle Paul wrote the Letter to the Ephesians, and whether Shakespeare wrote the first act of the play The Booke of Sir Thomas Moore. Because of the successful use of stylometry, its techniques have been extended to help identify composers from their musical compositions and artists from their paintings.

Beginnings of Stylometry

In 1851, the English mathematician Augustus De Morgan initiated the field of stylometry when he suggested that authors could be identified by the average number of letters in the words they wrote. Because De Morgan’s suggestion was simplistic and often misleading, stylometry did not gain validity until 1944, when Udny Yule published his pioneering work proposing a measure of an author’s vocabulary usage that did not depend on sample size. Analyzing Paul’s Epistles and the words of the physician Hippocrates in 1957, W. C. Wake was the first to produce an acceptable test of authorship using distributions, sampling methods, and periodic effects within distributions. In 1961, A. Q. Morton and others used computer technology to extend and verify Wake’s approach.
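
The two early measures can be illustrated with a short Python sketch. The first function computes the average number of letters per word, as De Morgan suggested. The second computes Yule's characteristic K, the statistic generally associated with his 1944 book; the formula K = 10,000 × (Σ i²·V_i − N) / N², where N is the total number of words and V_i is the number of distinct words used exactly i times, is supplied here as an assumption rather than taken from the text above. The tokenizer and the function names are illustrative choices only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def mean_word_length(text):
    """Average number of letters per word (apostrophes are not counted)."""
    words = tokenize(text)
    letters = sum(len(w) - w.count("'") for w in words)
    return letters / len(words)

def yules_k(text):
    """Yule's characteristic K; lower values indicate a more varied vocabulary."""
    counts = Counter(tokenize(text))       # occurrences of each distinct word
    n = sum(counts.values())               # N: total number of words
    spectrum = Counter(counts.values())    # V_i: distinct words seen i times
    s2 = sum(i * i * v for i, v in spectrum.items())
    return 10_000 * (s2 - n) / (n * n)

sample = "the cat sat on the mat"
print(mean_word_length(sample))
print(yules_k(sample))
```

A text in which no word repeats yields K = 0, and heavier repetition raises K. Real studies use samples of thousands of words; the one-line sample only shows the arithmetic.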

A specific example of scholars’ use of stylometry involves The Booke of Sir Thomas Moore, a play about an Englishman martyred in 1535. Scholars first concluded that the play was a composite effort of five authors, with handwriting analyses accepted as proof that William Shakespeare was the sole author of two of the play’s sections. Then, computer analyst Thomas Merriam created computer databases of the play in question and three other Shakespearean plays: Julius Caesar, Pericles, and Titus Andronicus.

The concordances generated for all four plays revealed significantly similar frequencies of “word habits,” or repeated combinations of words and phrases. Though Merriam concluded that Shakespeare was the sole author of The Booke of Sir Thomas Moore, his stylometric data did not convince all scholars. Skeptics claim that Merriam’s techniques are, at best, informative, and that his results are suspect because the three comparison plays are not the best representatives of Shakespeare’s style.
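
The idea of comparing "word habits" across texts can be sketched in Python by counting adjacent word pairs and keeping those that recur in both texts. This is a deliberately simplified illustration, not a reconstruction of Merriam's procedure: the use of two-word pairs, the `min_count` threshold, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count each pair of adjacent words in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

def shared_habits(text_a, text_b, min_count=2):
    """Return word pairs that occur at least min_count times in both texts,
    mapped to their (count in text_a, count in text_b)."""
    a, b = bigram_counts(text_a), bigram_counts(text_b)
    return {pair: (a[pair], b[pair])
            for pair in a
            if a[pair] >= min_count and b[pair] >= min_count}

text_a = "to be or not to be"
text_b = "to be is to be seen"
print(shared_habits(text_a, text_b))
```

On full plays, the raw counts would normally be converted to relative frequencies so that texts of different lengths can be compared.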

Modern Applications

Stylometry has been used in court cases to identify “fraudulent” wills and “false” criminal confessions. In the late 1970s, defense attorneys for kidnap victim and accused bank robber Patty Hearst tried to introduce stylometric evidence that “proved” the tape-recorded “communiqués” read by Hearst were not her own words.

Their evidence was based on concordances built from previous essays by Hearst, oral conversations, her confession, and materials produced by the Symbionese Liberation Army. The attorneys carefully analyzed these concordances using statistical discrimination, cluster analysis, and t-test comparisons to examine factors such as average sentence lengths, parsing patterns involving conjunctions, and linguistic habits. Despite the defense’s protests, the trial judge and the appeals court both ruled that the stylometric evidence was not admissible, and thus it was never used.
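
A comparison of average sentence lengths with a t-test can be sketched as follows. The text does not say which form of the t-test was used, so Welch's version, which does not assume equal variances, is an assumption here, as are the naive sentence splitter and the function names. Only the t statistic is computed; turning it into a p-value requires the t distribution, which is left out of this sketch.

```python
import re
from math import sqrt
from statistics import mean, variance

def sentence_lengths(text):
    """Word count of each sentence, splitting naively on '.', '!' and '?'."""
    sentences = re.split(r"[.!?]+", text)
    lengths = [len(s.split()) for s in sentences]
    return [n for n in lengths if n > 0]   # drop empty fragments

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    std_err = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / std_err

known = sentence_lengths("I came. I saw it! Did I conquer them all?")
print(known)
print(welch_t([2, 4, 6], [1, 2, 3]))
```

A large absolute value of the statistic suggests that the two sets of sentences differ in average length. The splitter treats every period as a sentence end, so abbreviations such as "Dr." would be miscounted in real text.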

Donald Foster, a Vassar College English professor, used stylometry to identify with 99 percent confidence the “anonymous” author of the political novel Primary Colors. Though Newsweek columnist Joe Klein originally denied being the author, he eventually admitted to writing it. Since that time, Foster has helped confirm Ted Kaczynski’s authorship of the Unabomber Manifesto and identify Eric Rudolph as a suspect in the 1996 Atlanta Olympics bombing.

While stylometry continues to be used to identify the authors of disputed documents, its practices have been largely digitized in the modern era. New uses for the technique have emerged, including forensic analysis of social media posts and language proficiency analysis.

Bibliography

Juola, Patrick. “Authorship Attribution.” Foundations and Trends in Information Retrieval, vol. 1, no. 3, 2008, pp. 233-334. dx.doi.org/10.1561/1500000005. Accessed 14 Nov. 2024.

Michaelson, S., et al. “Fingerprinting the Mind.” Endeavour, vol. 3, no. 4, 1979, pp. 171-75. doi.org/10.1016/0160-9327(79)90036-X. Accessed 14 Nov. 2024.

Morton, A. Q. Literary Detection: How to Prove Authorship and Fraud in Literature and Documents. Charles Scribner’s Sons, 1979.

Roberts, David. “Don Foster Has a Way With Words.” Smithsonian, Sept. 2001, www.smithsonianmag.com/arts-culture/don-foster-has-a-way-with-words-78251537. Accessed 1 Oct. 2024.

Schwartz, Lillian. “The Art Historian’s Computer.” Scientific American, Apr. 1995, www.scientificamerican.com/article/the-art-historians-computer. Accessed 1 Oct. 2024.

“Stylometry Methods and Practices.” Temple University, 22 Nov. 2023, guides.temple.edu/stylometryfordh. Accessed 1 Oct. 2024.

Yule, G. Udny. The Statistical Study of Literary Vocabulary. Cambridge University Press, 2014.