"Culture & Technology" European Summer University in Digital Humanities
University of Leipzig

Searching Linguistic Patterns in Text Corpora for Digital Humanities Research

Corpora, i.e. collections of linguistic data (texts or conversations), are a fundamental asset of digital humanities research. A ubiquitous task for linguists is to find linguistic patterns in corpora:

  • Which forms of the verb *to be* occur in a given text?
  • Are number words commonly preceded by articles?
  • What is the average length of a DP/NP in a modern English text, potentially compared to modern Dutch?
  • Which types of phrase occur as direct objects in philosophical texts?
  • What words can occur as hesitation markers in spoken modern German?
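To make the first question concrete: if every token in a corpus carries a lemma annotation, collecting the forms of *to be* reduces to filtering on the lemma. The following is a minimal sketch under that assumption. The toy corpus, its (word form, lemma) layout, and the helper name `forms_of_lemma` are hypothetical and only serve as illustration.

```python
from collections import Counter

# Hypothetical toy corpus: (word form, lemma) pairs, roughly as a
# lemmatizer might produce them. Not taken from any real corpus.
tagged_tokens = [
    ("I", "I"), ("am", "be"), ("here", "here"), (",", ","),
    ("you", "you"), ("are", "be"), ("there", "there"), (",", ","),
    ("and", "and"), ("she", "she"), ("was", "be"), ("late", "late"), (".", "."),
    ("They", "they"), ("are", "be"), ("happy", "happy"), (".", "."),
]


def forms_of_lemma(tokens, lemma):
    """Count the lower-cased word forms annotated with the given lemma."""
    return Counter(form.lower() for form, tok_lemma in tokens if tok_lemma == lemma)


be_forms = forms_of_lemma(tagged_tokens, "be")
print(sorted(be_forms.items()))
```

Without the lemma layer, the same question would require listing every form of the verb by hand.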

Depending on how a corpus has been processed and annotated (e.g., lemmatized), such questions can be very easy or quite difficult to answer, because the annotation determines how readily the relevant data can be found and counted. This course will present fundamental techniques for searching in corpora, viz.

  • searching for single word forms,
  • searching with wild cards or distance operators,
  • searching with regular expressions for similar word forms,
  • searching in hierarchical annotation to find syntactic or semantic configurations.
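The first three techniques can be sketched on a plain token list with Python's standard library. This is only an illustration of the ideas, not the syntax of any corpus query language. The sample sentence and the helper names (`find_form`, `find_wildcard`, `find_regex`, `find_within`) are assumptions made for this example.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical sample text, tokenized naively into lower-cased words.
text = ("The organisers organize a workshop. She organised the colour "
        "charts; he prefers the color red.")
tokens = re.findall(r"\w+", text.lower())


def find_form(tokens, form):
    """Single word form: positions where the token equals the form."""
    return [i for i, tok in enumerate(tokens) if tok == form]


def find_wildcard(tokens, pattern):
    """Wild card search: '*' matches any run of characters, '?' exactly one."""
    return [tok for tok in tokens if fnmatchcase(tok, pattern)]


def find_regex(tokens, pattern):
    """Regular expression search: the whole token must match the pattern."""
    rx = re.compile(pattern)
    return [tok for tok in tokens if rx.fullmatch(tok)]


def find_within(tokens, first, second, max_gap):
    """Distance search: 'second' follows 'first' with at most max_gap tokens between."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        window = tokens[i + 1 : i + 2 + max_gap]
        for offset, other in enumerate(window):
            if other == second:
                hits.append((i, i + 1 + offset))
    return hits


print(find_form(tokens, "the"))
print(find_wildcard(tokens, "organi*"))
print(find_regex(tokens, r"colou?r"))
print(find_within(tokens, "the", "red", max_gap=1))
```

The wild card and the regular expression both group spelling variants, but the regular expression states exactly which variation is allowed (an optional "u"), whereas the wild card accepts any continuation.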

You will learn about different query languages used for searching in corpora.
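Queries over hierarchical annotation usually state relations between nodes, such as one node immediately dominating another. As a rough sketch of that idea, and not of any particular query language, the example below looks for NP nodes directly under a VP node, which is used here as a simplified stand-in for "direct object". The toy tree, its labels, and the function names are hypothetical.

```python
# Hypothetical toy parse tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. Labels and bracketing are illustrative only.
tree = (
    "S",
    ("NP", ("DT", "the"), ("NN", "philosopher")),
    ("VP",
        ("VBD", "read"),
        ("NP", ("DT", "a"), ("JJ", "long"), ("NN", "treatise"))),
)


def leaves(node):
    """Return the word forms under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def find_children(node, parent_label, child_label):
    """Yield each child_label node immediately dominated by a parent_label node."""
    if isinstance(node, str):
        return
    for child in node[1:]:
        if isinstance(child, str):
            continue
        if node[0] == parent_label and child[0] == child_label:
            yield child
        yield from find_children(child, parent_label, child_label)


objects = [leaves(np) for np in find_children(tree, "VP", "NP")]
print([" ".join(words) for words in objects])
print([len(words) for words in objects])  # phrase length in tokens
```

The subject NP is not returned because it hangs under S rather than VP, which is the kind of distinction a flat word search cannot make.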

The course will be mainly concerned with textual corpora. However, because searches in speech or multimodal corpora are generally carried out on the transcription and annotation layers, it will also be useful to researchers working with such data.
