Distant Reading in R. Analyse the text & visualize the Data
1.0 The Workshop
Distant reading has been one of the best-known methodological approaches in the digital humanities since Franco Moretti formalised it in the article "Conjectures on World Literature" (2000). Because distant reading benefits greatly from computational tools, we are proposing a course based on R, one of the most popular programming languages used today by the scientific community.
The course is suitable for beginners who want to start digital humanities training with a complete overview of the most common tools used for distant reading.
The philosophy of the course is "analyse the text & visualize the data", and the course is structured around these two parts.
The objective of the course is to provide participants with methodological and practical tools that they can use in their own research. At the end of the two weeks, they will be able to use R and RStudio to apply semantic analysis, stylometry, and spatial mapping. The results of an analysis in R can easily be presented graphically, for example as graphs, trees, or maps. For this reason, part of the course will be dedicated to open source programs like Gephi, Gimp and Inkscape, which are suited to reworking vector and other graphics files.
2.0 Schedule
The course takes place over two weeks so that participants can choose to attend one or both parts. However, participation in the entire course is strongly advised.
The first week is dedicated to three of the most common methods used for distant reading: sentiment analysis, topic modelling, and stylometry. The objective of this first week is to provide a basic theoretical / methodological understanding of distant reading techniques, together with the practical tools to analyse texts in an R environment.
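As a rough illustration of the kind of text analysis meant here, the following is a minimal sketch of lexicon-based sentiment scoring in base R. It is not taken from the course materials: the lexicon, the sample sentences, and the helper names `tokenize` and `sentiment_score` are all hypothetical, and real analyses normally rely on a published sentiment lexicon rather than a handful of hand-picked words.

```r
# Toy lexicon (hypothetical): words mapped to a polarity of +1 or -1.
lexicon <- c(good = 1, happy = 1, love = 1, bad = -1, sad = -1, hate = -1)

# Split a text into lowercase word tokens using base R only.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens[nzchar(tokens)]  # drop empty strings left by the split
}

# Sum the polarity of every token that appears in the lexicon.
sentiment_score <- function(text, lexicon) {
  tokens <- tokenize(text)
  matched <- lexicon[tokens[tokens %in% names(lexicon)]]
  sum(matched)
}

# Sample sentences (invented for this sketch).
sentences <- c(
  "I love this happy ending",
  "A sad and bad chapter",
  "The train left at noon"
)

# One score per sentence: positive, negative, and neutral.
scores <- vapply(sentences, sentiment_score, numeric(1), lexicon = lexicon)
print(unname(scores))  # 2 -2 0
```

Passing `scores` to `barplot()` would give a first, very simple visualisation of the result, which is the step the second week builds on.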
The second week is dedicated to data visualization. In this module the participants will focus on mapping, network analysis and graphics. The objective of this week is to give participants the tools to organise the visualisation of data graphically, chronologically, and spatially. If a participant is interested in the second week only, we will assume that s/he has more than a basic knowledge of the R programming language.
| Week | Day | 1st hour | 2nd hour | 3rd hour | 4th hour |
|---|---|---|---|---|---|
| Week 1: Analyse the text | Day 1 | Introduction to the course | Introduction to R and RStudio | | |
| | Day 2 | Sentiment analysis | | Sentiment analysis | |
| | Day 3 | Topic modelling | | Topic modelling | |
| | Day 4 | Stylometry | | Stylometry | |
| | Day 5 | Sentiment analysis, topic modelling, stylometry: hands-on | | Projects | |
| Week 2: Visualize the data | Day 6 | Network analysis (Gephi) | | Network analysis (Gephi) | |
| | Day 7 | Network analysis (Data scraping) | | Inkscape & Gimp | |
| | Day 8 | Named-entity Recognition | | Mapping (Coordinates) | |
| | Day 9 | Mapping | | Mapping | |
| | Day 10 | Mapping | | Projects | |
At the beginning of the course, the workshop leaders will divide the class into two groups according to the participants' research interests. Each group will carry out some research to be presented on the last day of the workshop, using one of the methodologies introduced during the week.
3.0 Technical Requirements
- Participants should have their own computer with at least 5-10 GB of available disk space.
- Operating System: Windows (preferably 7+), Linux or Mac OSX.
- Java 8 for the operating system. You may need to create an Oracle account to download Java 8.
- Zip / unzip programs to manage compressed folders (these are normally installed by default on your computer, like 7-Zip or WinZip for Windows).
- Browser: Mozilla Firefox and Google Chrome.
- A simple text editor (for txt and csv files) like Sublime Text 3 for Windows, Linux and Mac.
- Google account
- R version 3.5.1 (2018-07-02) -- "Feather Spray"
- RStudio and XQuartz (the latter for Mac)
- OpenOffice
- Gephi
- Inkscape
- Gimp
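
To check the R requirement listed above before the course starts, something like the following can be run in the R console or in RStudio. This is only a sketch: the variable names are illustrative, and treating 3.5.1 as a minimum rather than an exact version is an assumption, since the list names that version without saying whether newer releases are accepted.

```r
# Version named in the requirements list; treating it as a minimum is an assumption.
required_version <- "3.5.1"

# getRversion() returns the running R version in a form that can be compared.
installed_version <- getRversion()
version_ok <- installed_version >= required_version

if (version_ok) {
  message("R ", as.character(installed_version), " meets the requirement (", required_version, ").")
} else {
  message("R ", as.character(installed_version), " is older than ", required_version, "; please update.")
}
```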