Stylometry
Stylometry, the analysis of countable linguistic features of (literary) texts, is usually associated with authorship attribution or, more sensationally, with exposing plagiarists. However, recent studies show that the same methods that help attribute an ancient Greek text to Plato, or expose a forger who passes off a play as Shakespeare's, can be used in a much broader context of literary study. Patterns of stylometric similarity and difference provide new insights into relationships between different books by the same author; between books by different authors; between authors who differ in chronology or gender; and between translations of the same author or group of authors. These patterns, in turn, help find new ways of looking at works that seem to have been studied from every possible perspective.
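To make "countable linguistic features" and "stylometric similarity" concrete, here is a minimal Python sketch of Burrows's Delta, a classic stylometric distance. It is not the 'stylo' code (stylo runs in R). Delta uses the relative frequencies of the most frequent words, z-scores them across the corpus, and averages the absolute z-score differences between two texts. The function names, the English-only tokenizer, and the default of 100 most frequent words are illustrative assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Assumption: a crude English tokenizer (lowercase letters and apostrophes only).
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta(texts, n_mfw=100):
    """Return {(name_a, name_b): delta} for every pair of texts.

    texts: dict mapping a text name to its raw (non-empty) content.
    n_mfw: number of most frequent words in the whole corpus to use (assumed default).
    """
    tokenized = {name: tokenize(t) for name, t in texts.items()}

    # Build the feature set: the n_mfw most frequent words across all texts.
    corpus_counts = Counter()
    for toks in tokenized.values():
        corpus_counts.update(toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]

    # Relative frequency of each feature word in each text.
    freqs = {}
    for name, toks in tokenized.items():
        counts = Counter(toks)
        freqs[name] = [counts[w] / len(toks) for w in vocab]

    # Mean and population standard deviation of each word's frequency across texts.
    names = list(freqs)
    columns = list(zip(*(freqs[n] for n in names)))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]

    # z-scores; a word with zero variance gets z = 0 and so adds no difference.
    z = {
        name: [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs[name], means, sds)]
        for name in names
    }

    # Delta = mean absolute difference of z-scores between two texts.
    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            deltas[(a, b)] = mean(abs(x - y) for x, y in zip(z[a], z[b]))
    return deltas
```

Lower Delta values mean more similar word-frequency profiles. In practice, a distance matrix like this is what clustering or network tools then visualize.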
The workshop, split into two 18-hour blocks, will try to address some of the following research questions: What is common to the language we all use, and what depends on cultural context or on the writer's individuality? Which elements of style are affected by literary period, genre, or topic? What does the author incorporate unconsciously that reflects his or her education, gender, religious background, or social and historical conditions? Last but not least, which features of a written text can betray the person who wrote it regardless of his or her aesthetic, social, or historical conditions?
The first and less advanced block, From Nothing to Networks, will acquaint the participants with 'stylo', the suite of stylometric tools produced by the workshop leaders, and with Gephi, a tool for network analysis.
The second block, From Networks to Nearest Shrunken Centroids, will introduce more advanced problems of classification and validation, as well as more sophisticated methods of statistical text analysis.
Both blocks will focus on tools based in the R statistical programming environment that have user-friendly interfaces, so no expert knowledge of R in particular, or of programming in general, is required for Block 1. The instructors will provide texts for the workshop, and participants are encouraged to bring their own; where necessary, participants' individual corpora will be expanded with texts available online or elsewhere. The texts will be literary and multilingual, and will include both originals and translations.
Block 2 builds on the skills acquired in Block 1, but a catch-up session is planned for participants who already have some knowledge of R and Gephi.