From Print and Manuscript to Electronic Version: Text Digitization and Annotation
This course provides an introduction to current methods for the creation and annotation of linguistic (text) corpora, to a large extent through hands-on exercises. We focus primarily on historical prints and manuscripts, though more recent texts will also be taken into account. Participants' own material can be discussed as well.
We demonstrate and discuss the steps towards the creation of corpora: text selection, treatment of metadata, transcription and representation of the material, methods of linguistic and text-structural annotation, and the presentation and provision of corpora. In this context, standards and best practices for data recognition and annotation will be introduced and applied by example, e.g. the guidelines of the Text Encoding Initiative (TEI) and specific formats for linguistic text annotation.
The course concludes with some basic methods of corpus analysis, focusing on what the CLARIN infrastructure and the German Text Archive offer in this field.