Jan Rybicki / Maciej Eder
Stylometry: Computer-Assisted Analysis of Literary Texts
Stylometry, the analysis of countable linguistic features of (literary) texts, is usually associated with authorship attribution or, more sensationally, with the prosecution of plagiarists. However, recent studies show that the same methods that help attribute an ancient Greek text to Plato, or put someone behind bars for forging an alleged play by Shakespeare, can be used in the much broader context of literary study. Patterns of stylometric similarity and difference provide new insights into relationships between different books by the same author; between books by different authors; between authors differing in terms of chronology or gender; and between translations of the same author or group of authors. These insights help, in turn, to find new ways of looking at works that seem to have been studied from all possible perspectives.
The workshop will therefore address some of the following research questions: What in the language we use is common to all, and what is related to cultural context and/or a writer’s individuality? Which elements of style are affected by literary period, genre, or topic? What is unconsciously incorporated by the author and reflects his/her education, gender, religious background, or social and historical conditions? Last but not least, which features of a written text can betray the person who wrote it regardless of his/her aesthetic, social, or historical conditions?
To this end, participants will be given access to some of the more useful stylometric tools and methods, from simple wordlist-making to custom-made multivariate analyses of word and phrase frequencies, for use in their individual projects. Although some of these tools are based in the R statistical programming environment, no expert knowledge is required. The texts used in the workshop will be provided by the instructors; where necessary, participants’ individual corpora will be expanded with whatever material is available (online or elsewhere). The texts will be literary and multilingual, and will include both originals and translations.
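To give a rough idea of what "from simple wordlist-making to multivariate analyses of word frequencies" can mean in practice, here is a minimal sketch. It is written in Python rather than R and is not the workshop's own tooling. The function names, the toy sentences, and the choice of distance (the mean absolute difference of z-scored frequencies, commonly known as Burrows's Delta) are all assumptions made for illustration only.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequencies(text):
    """Map each word to its share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def most_frequent_words(texts, n):
    """Wordlist: the n most frequent words in the pooled corpus."""
    pooled = Counter()
    for text in texts.values():
        pooled.update(tokenize(text))
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(pooled.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def z_score_table(texts, words):
    """For each text, z-score the relative frequency of each listed word."""
    freqs = {name: relative_frequencies(text) for name, text in texts.items()}
    table = {name: [] for name in texts}
    for word in words:
        column = [freqs[name].get(word, 0.0) for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name, value in zip(texts, column):
            table[name].append((value - mu) / sigma if sigma > 0 else 0.0)
    return table


def delta(profile_a, profile_b):
    """Mean absolute difference of two z-score profiles (assumed measure)."""
    return mean(abs(a - b) for a, b in zip(profile_a, profile_b))


# Toy corpus invented for this sketch; real studies use whole books.
toy_texts = {
    "A1": "the cat sat on the mat and the dog sat on the rug",
    "A2": "the cat and the dog sat on the mat",
    "B1": "a storm was coming and a wind was rising over a sea",
}
wordlist = most_frequent_words(toy_texts, 5)
profiles = z_score_table(toy_texts, wordlist)
for other in ("A2", "B1"):
    print("A1 vs", other, round(delta(profiles["A1"], profiles[other]), 3))
```

A smaller distance suggests more similar usage of the most frequent words. With texts this short the numbers mean little; the sketch only shows the shape of the computation.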