"Culture & Technology" European Summer University in Digital Humanities
University of Leipzig

Digital Annotation and Analysis of Literary Texts with CATMA 6

Aims of the Workshop

The workshop introduces students of literature to CATMA 6 (Computer Assisted Text Markup and Analysis; https://catma.de), an open-source tool developed at and hosted by the University of Hamburg since 2008. CATMA is currently used by over 60 research projects and approximately 10,000 users worldwide. Its new sixth version forms part of the DFG-funded forTEXT project (https://fortext.net) and offers a unique combination of three main features:

  1. CATMA supports collaborative annotation and analysis – a text or text corpus can be investigated individually, but also jointly by a group of students or researchers.
  2. CATMA supports explorative, non-deterministic practices of text annotation – a discursive, debate-oriented approach to text annotation based on the research practices of hermeneutic disciplines is the underlying conceptual model.
  3. CATMA integrates text annotation and text analysis in a web-based working environment – which makes it possible to combine the identification of textual phenomena with their investigation in a seamless, iterative fashion.

What sets CATMA apart from other digital annotation methods is its ‘undogmatic’ approach: the system neither prescribes defined annotation schemata or rules, nor forces the user to apply rigid yes / no, right / wrong taxonomies to texts (even though it allows for more prescriptive schemata as well). In other words, CATMA’s logic invites users to explore the richness and multifacetedness of textual phenomena according to their needs: users can create, expand, and continuously modify their own individual tagsets – so if a text passage invites more than one interpretation, nothing in the system prevents assigning multiple, or even contradictory, annotations.
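The idea of multiple, even contradictory, annotations on one passage can be illustrated with a minimal sketch. It assumes a simple stand-off model in which each annotation records character offsets into the unchanged source text; the class, the tag names and the annotator names are hypothetical and do not describe CATMA's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int      # character offset where the annotated span begins
    end: int        # character offset just past the end of the span
    tag: str        # tag name chosen by the annotator (hypothetical values)
    annotator: str  # who made the annotation


text = "Call me Ishmael."

# Two annotators disagree about the same span; a third annotation overlaps both.
annotations = [
    Annotation(8, 15, "reliable_narrator", "anna"),
    Annotation(8, 15, "unreliable_narrator", "ben"),
    Annotation(0, 15, "direct_address", "anna"),
]


def tags_at(offset, annotations):
    """Return the sorted tags of all annotations covering a character offset."""
    return sorted(a.tag for a in annotations if a.start <= offset < a.end)


print(text[8:15])                # the span both annotators tagged
print(tags_at(10, annotations))  # all tags covering the character at offset 10
```

Because the annotations live beside the text rather than inside it, nothing forces them to nest or to agree with one another.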

Despite its flexibility, CATMA does not produce idiosyncratic annotations: all markup data can be exported in TEI/XML format and reused in other contexts.
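As an illustration of such reuse, the following sketch reads annotated segments from a TEI/XML fragment with Python's standard library. The fragment is a hypothetical, simplified example: it assumes annotations appear as seg elements with an ana attribute, and a real CATMA export will differ in detail. Only the TEI namespace URI is taken from the TEI standard.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified fragment for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <seg ana="#character">Ishmael</seg> goes to sea with
    <seg ana="#character">Queequeg</seg> from
    <seg ana="#place">Nantucket</seg>.
  </p></body></text>
</TEI>"""


def annotated_segments(tei_xml):
    """Return (tag, text) pairs for every <seg> element in the document."""
    root = ET.fromstring(tei_xml)
    return [
        (seg.get("ana", "").lstrip("#"), "".join(seg.itertext()).strip())
        for seg in root.iterfind(".//tei:seg", TEI_NS)
    ]


for tag, segment in annotated_segments(SAMPLE):
    print(tag, segment, sep="\t")
```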

Since CATMA is a highly intuitive tool, it is particularly suitable for humanists with little technical knowledge: the graphical user interface allows for a quick start, and CATMA’s build query function (a step-by-step dialogue-based widget) helps users retrieve complex information from texts without having to learn a query language. Moreover, CATMA’s easy-to-use automated distant-reading functions are continuously enhanced and extended.

In our workshop we will introduce the core annotation and analysis functionalities of CATMA and show how they can be combined with the annotations provided automatically. In week 1, participants will be taken in a step-by-step, hands-on approach through the full cycle of a CATMA-based text investigation and can work on their own texts / projects:

  1. From text upload to initial text investigations,
  2. then to annotation and specification of annotation categories,
  3. from there to combined text queries that consult the source text and its annotations in combination,
  4. and finally to the visual output of query results.

Participants will be able to test and apply the tool hands-on: they will annotate their own texts, create their own Tagsets, and define their tags in an annotation guideline. We would also like to engage participants in a critique of CATMA’s design and components as well as a general discussion about requirements for text analysis tools in their fields of interest.

In week 2 we will combine the work in CATMA with other methods and tools for digital text analysis, such as named entity recognition (NER) and (social) network analysis ((S)NA), in two steps. First, we will begin with the visual investigation and refinement of annotations created in week 1. Second, we will focus on the application of CATMA to the individual projects of the workshop participants: what is the outcome of the CATMA-based annotation and analysis of texts, and of the creation of genuine Tagsets relevant to these projects? Each participant will give a short presentation on their project, followed by a group discussion.

Target Audience of the Workshop

The primary users of CATMA are literary scholars, as well as graduate and undergraduate students in Literary Studies. In addition, this workshop is likely to be of interest to

  1. humanities scholars in all fields concerned with text analysis (with and without experience in digital text analysis),
  2. software developers in the humanities interested in non-deterministic text analysis and automated annotation.

Participants need no prior knowledge of digital text annotation and can work with their own laptop computers and their own digital texts. CATMA runs on a laptop or PC (Windows, Unix or macOS) with a current web browser (Edge, Firefox, Chrome, Safari) and a mouse or touchpad. Touchscreen navigation is not yet supported.

Schedule

Day | Minute 1–45 | Minute 45–90 | Minute 90–135 | Minute 135–180
1 | CATMA Concept | CATMA Demo | Project Presentations 1
2 | Project Presentations 2 | Annotation
3 | Tagset Creation | Tagset Creation | Guidelines | Tag Definitions
4 | Analyze | Analyze Text & Annotations | Visualize | Visualize Text and Annotations
5 | Corpus | Analyze Corpus | Automatization | Synthesis

Week 1 CATMA

Day 1

  1. CATMA Concept
    1. undogmatic; hermeneutic; for literary studies
    2. distant, close and scalable reading
  2. CATMA Demo
    1. Introduction of CATMA’s architecture and exemplary workflow
    2. general functions
  3. Project presentations of participants 1

Day 2

  1. Project presentations of participants 2
  2. Annotate your own text

Day 3

  1. How to create a Tagset and presentation of existing Tagsets
  2. Create your own Tagset
  3. The use of annotation guidelines for collaborative annotation: inter-annotator agreement and inter-annotator disagreement
  4. How could the tags in your project be defined?
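
Agreement between two annotators can be quantified in several ways. The following is a minimal sketch of one common measure, Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The choice of this measure and the example labels are assumptions made for illustration; the workshop description does not name a specific metric.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Share of items on which both annotators chose the same tag.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator assigned tags at their own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tags assigned by two annotators to the same four passages.
anna = ["metaphor", "metaphor", "irony", "irony"]
ben = ["metaphor", "irony", "irony", "irony"]
print(cohens_kappa(anna, ben))
```

A value near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance. In a hermeneutic setting a low value is not necessarily an error: it can point to passages that genuinely invite more than one reading.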

Day 4

  1. Demonstration of the Analyze module and its functions
  2. Analyze your text and the annotations you created
  3. Demonstration of the Visualize module and its functions
  4. Visualize your text and the annotations you created

Day 5

  1. Corpus and Corpus functions in CATMA
  2. Analyze and visualize your own corpus
  3. Automatization functions in CATMA
  4. What is needed to automatize the annotation in your project?

Week 2 CATMA plus

Day | Minute 1–45 | Minute 45–90 | Minute 90–135 | Minute 135–180
1 | 3DH prototype | Analyze visually | 3DH postulates | Diverse Visualizations
2 | NER | Stanford NER | Import/export | NER & CATMA
2 & 3 | Refine annotations | Network Analysis | Network Analysis
4 | Preparation of presentation in small method-related groups | Presentation of workshop results and individual feedback
5 | Backlog | Synthesis | Feedback | Wishlist

Day 1

  1. How to explore and refine annotations with the 3DH prototype
  2. Analyze your annotations visually in 3DH
  3. What a digital visualization environment should be able to do – the four 3DH postulates (two-way screen, parallax, qualitative, and discursive)
  4. Try out diverse interactive visualizations in 3DH

Day 2 (3 x 1 ½ hours)

  1. Named Entity Recognition as method
  2. Use the Stanford Named Entity Recognizer with your own text
  3. How to export data from Stanford NER and import them into CATMA
  4. Evaluate and refine the automatically generated NER annotations in CATMA and train your NER model
  5. How can the 3DH visualizations and NER annotations help to refine the annotations in your project?
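
The export step in this list can be sketched as follows, assuming the tagger's output has been saved as plain text in token/LABEL form, one of the output styles Stanford NER can produce. The function name and the sample sentence are hypothetical. How the resulting entities are then imported into CATMA is not covered here.

```python
def entities_from_slash_tags(tagged_text, outside="O"):
    """Group consecutive tokens with the same label into (entity, label) pairs."""
    entities = []
    current_tokens, current_label = [], None
    for item in tagged_text.split():
        # Split on the last slash so tokens that contain "/" stay intact.
        token, _, label = item.rpartition("/")
        if label == current_label and label != outside:
            current_tokens.append(token)  # same entity continues
            continue
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))
        if label != outside:
            current_tokens, current_label = [token], label  # new entity starts
        else:
            current_tokens, current_label = [], None        # outside any entity
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


sample = "Angela/PERSON Merkel/PERSON visited/O Leipzig/LOCATION ./O"
print(entities_from_slash_tags(sample))
```

One limitation of this simple grouping is that two directly adjacent entities of the same type are merged into one, which is exactly the kind of case that manual evaluation and refinement is meant to catch.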

Day 3 (1 ½ hours)

  1. How (social) network analysis can serve as a post-processing step for CATMA-generated data
  2. Create your own semantic networks in Gephi
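
One possible route from annotations to a network is a co-occurrence edge list. The sketch below assumes that the annotated entities have already been grouped per text segment (for example per paragraph or chapter); the segment data and function names are hypothetical. It writes a CSV with Source, Target and Weight columns, a layout that Gephi's spreadsheet import accepts for edge tables.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(entities_per_segment):
    """Count how often each pair of entities appears in the same segment."""
    weights = Counter()
    for entities in entities_per_segment:
        # Sort so that (A, B) and (B, A) are counted as the same undirected edge.
        for source, target in combinations(sorted(set(entities)), 2):
            weights[(source, target)] += 1
    return weights


def write_edge_list(weights, handle):
    """Write the edge weights as CSV rows: Source, Target, Weight."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])


# Hypothetical character annotations, grouped by segment.
segments = [
    ["Ishmael", "Queequeg"],
    ["Ishmael", "Ahab", "Queequeg"],
    ["Ahab", "Starbuck"],
]
weights = cooccurrence_edges(segments)
buffer = io.StringIO()
write_edge_list(weights, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file for import, pass a file opened with open("edges.csv", "w", newline="", encoding="utf-8") instead of the in-memory buffer.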

Day 4

  1. Preparation of method-related presentations of results in small groups
  2. Presentations: (How) did the workshop help your research project?
  3. Individual feedback

Day 5

  1. Backlog for further questions and details
  2. How do the different digital methods relate to one another and why is it fruitful to combine them? What are the challenges for digital tools with regard to literary texts?
  3. Feedback and evaluation of the workshop
  4. Wishlist for further methods or tools to discuss (or to develop)