Image Processing and Machine Learning for the Digital Humanities
While text is still the most important research topic in the digital humanities, over the past ten years images have gradually appeared on the radar of computational humanists. Recent developments in digital art history in particular have shown that the importance of images for DH research goes beyond ensuring their accessibility through databases and interfaces. In fact, images are where digital humanities and “artificial intelligence” meet. Most importantly, the automated classification of images on the one hand, and the automated production of images on the other, raise fundamental questions at the interface of computer science and the humanities: how is reality represented in machine learning systems, and how and why do human-held preconceptions, biases, and misjudgements enter such systems?
In the workshop, we will tackle these important questions from two directions.
The first week of the workshop will serve as an introduction to image processing for the digital humanities. Starting from scratch (what is a digital image?), we will gradually explore image processing strategies, i.e. theoretical approaches and practical implementations, that are useful for DH applications, including: scraping (building large-scale image datasets from Web sources), batch processing ("cleaning" image datasets and adapting them to the affordances of a machine), feature extraction (extracting semantic information from image datasets), clustering (visually sorting and reviewing image datasets), and classification (analyzing image datasets using pre-trained machine learning systems). Participants are encouraged to try some of these strategies on a provided practice dataset, but eventually to work on their own image corpora or ideas for image corpora from all areas of visual studies, including but not limited to cultural heritage, historical image data, museum and archival collections, and artworks.
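To give a first impression of what such strategies look like in Python, here is a minimal sketch, not taken from the workshop materials. It assumes a digital image can be treated as a grid of (R, G, B) pixel values, uses the mean color as a deliberately crude stand-in for feature extraction, and groups four synthetic images with a tiny k-means loop as a stand-in for clustering. All function names and the toy images are hypothetical and chosen for illustration only; real projects would typically load actual image files and rely on established libraries.

```python
# Illustrative sketch only: every name and value below is an assumption,
# not part of the workshop's own code or dataset.

def make_image(width, height, color):
    """Return a synthetic single-color image as a list of pixel rows."""
    return [[color for _ in range(width)] for _ in range(height)]


def mean_color(image):
    """Feature extraction (toy version): reduce an image to its mean (R, G, B)."""
    pixels = [pixel for row in image for pixel in row]
    count = len(pixels)
    return tuple(sum(pixel[c] for pixel in pixels) / count for c in range(3))


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Clustering (toy version): assign each point to its nearest centroid,
    then move each centroid to the mean of its members, and repeat."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(
                    sum(m[c] for m in members) / len(members)
                    for c in range(len(members[0]))
                )
    return labels


# A toy "dataset": two reddish and two bluish images of 4 x 4 pixels.
images = {
    "red_a": make_image(4, 4, (200, 30, 30)),
    "red_b": make_image(4, 4, (220, 50, 40)),
    "blue_a": make_image(4, 4, (30, 40, 210)),
    "blue_b": make_image(4, 4, (20, 60, 190)),
}

features = [mean_color(image) for image in images.values()]
# Start the two clusters from one reddish and one bluish image.
labels = kmeans(features, centroids=[features[0], features[2]])
print(dict(zip(images, labels)))
```

The same three steps (represent the image as numbers, extract a feature vector, group similar vectors) carry over to real corpora, where the feature vector would usually come from a pre-trained model rather than from a simple color average.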
The second week of the workshop will be dedicated to a historical and philosophical critique of image processing strategies and machine learning applications, particularly as DH tools. We will discuss recent developments such as facial recognition, and read research that investigates these developments from areas such as FAT-ML (fairness, accountability, and transparency in machine learning), digital art history, media studies, and science and technology studies.
Participants should be willing to pick up basic concepts of the Python programming language during the workshop. Previous computer programming experience is beneficial.