"Culture & Technology" European Summer University in Digital Humanities
University of Leipzig

Aspects of NLP in language documentation: the case of the DoBeS Kyanga/Shanga project (Niger-Congo, Eastern Mande)

The DoBeS projects (funded by the Volkswagen Foundation) have focused on endangered languages that are no longer regularly transmitted within their speech communities and are likely to become extinct within the next few years or a decade. The purpose of language documentation is not merely linguistic description but the creation of comprehensive records of languages that, for the most part, have not yet been documented. The annotated corpora of audio-visual ethnographic materials serve as a multimodal resource for the speech communities and their future generations. At the same time, these repositories are also a resource for linguistic research, in particular for data-oriented approaches to linguistic typology, which derive features from annotations rather than from grammatical descriptions.

It seems obvious that the initial situation of a documentation project (a tight time frame for field research, small to medium-sized teams of trained consultants and researchers, and the task of annotating a large digital corpus) demands effective corpus tools and semi-automatic methods for annotation and linguistic analysis at different levels. However, current approaches in corpus linguistics rely largely on statistical techniques that presuppose an existing annotation model and, in addition, large corpora of annotated training data. For the documentary linguist, this creates the sparse data paradox: a large annotated corpus is needed in order to process large corpora. Moreover, the models produced by current approaches in corpus linguistics are rarely rule-based. Their output is hard for human users to read and is therefore of little use for linguistic analysis and annotation.
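
To make the contrast concrete, here is a minimal sketch of a semi-automatic, human-readable alternative to a trained statistical model: glosses already entered by annotators are collected into a plain lexicon table, which then proposes glosses for new tokens and flags unknown or ambiguous ones for review. This is an illustration under stated assumptions, not the project's workflow: the data format and the function names `build_lexicon` and `suggest_glosses` are hypothetical, and the forms and glosses in the example are invented, not Kyanga data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical data format: an annotated sentence is a list of (token, gloss) pairs.
AnnotatedSentence = List[Tuple[str, str]]
Lexicon = Dict[str, List[Tuple[str, int]]]


def build_lexicon(annotated: List[AnnotatedSentence]) -> Lexicon:
    """Collect, for every token form, the glosses attested so far with their counts.

    The result is a plain table that an annotator can read and correct,
    not a set of opaque statistical parameters.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in annotated:
        for token, gloss in sentence:
            counts[token.lower()][gloss] += 1
    # most_common() sorts by frequency; ties keep first-seen order.
    return {form: c.most_common() for form, c in counts.items()}


def suggest_glosses(
    tokens: List[str], lexicon: Lexicon
) -> List[Tuple[str, Optional[str], bool]]:
    """Return (token, suggested gloss, needs_review) for each token.

    Unknown tokens get no suggestion; tokens with more than one attested
    gloss get the most frequent one but are flagged for human review.
    """
    suggestions = []
    for token in tokens:
        entry = lexicon.get(token.lower())
        if entry is None:
            suggestions.append((token, None, True))
        else:
            best_gloss = entry[0][0]
            suggestions.append((token, best_gloss, len(entry) > 1))
    return suggestions


if __name__ == "__main__":
    # Invented toy forms and glosses, not Kyanga data.
    seed = [
        [("mu", "1SG"), ("se", "go"), ("zo", "market")],
        [("a", "3SG"), ("se", "go"), ("zo", "market")],
        [("a", "3SG"), ("ba", "come")],
        [("ba", "NEG")],
    ]
    lexicon = build_lexicon(seed)
    for row in suggest_glosses(["a", "ba", "kpe"], lexicon):
        print(row)
```

Even a handful of annotated sentences yields useful suggestions, and every decision the tool makes can be traced back to a readable table entry. The obvious limitation is that a form-to-gloss lookup ignores context, which is why ambiguous forms are routed to the annotator rather than resolved automatically.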

We will present the data processing workflow of our DoBeS research project on the undocumented Kyanga language (Eastern Mande, Niger-Volta branch) and discuss this approach in the broader context of how NLP methods and general typological knowledge about the “grammar codec” can be used in language documentation to produce large audio-visual ethnographic corpora efficiently. The presentation will focus mainly on the relation between tokenization and word formation and on approaches to unorthodox cross-linguistic POS tagging.
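
The distinction between tokenization and word formation can be pictured as two aligned annotation tiers: a word tier produced by splitting the transcription, and a morpheme tier produced by segmenting each word. The following is a minimal sketch under stated assumptions, not the project's method: the affix inventory, the example forms, and the names `tokenize`, `segment`, and `annotate` are hypothetical, and the forms are invented placeholders rather than Kyanga morphemes.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, hand-editable affix inventory. The forms are invented
# placeholders for illustration only, not Kyanga morphemes.
PREFIXES: Tuple[str, ...] = ("n-",)
SUFFIXES: Tuple[str, ...] = ("-ri", "-na")


@dataclass
class Token:
    form: str                                        # word tier: token as transcribed
    morphs: List[str] = field(default_factory=list)  # morpheme tier: segmentation


def tokenize(utterance: str) -> List[str]:
    """Word-level tokenization: split on whitespace and drop punctuation."""
    return re.findall(r"[^\s.,;:!?]+", utterance)


def segment(word: str, min_stem: int = 2) -> List[str]:
    """Strip at most one prefix and any chain of suffixes, keeping a stem of
    at least min_stem characters."""
    stem, left, right = word, [], []
    for prefix in PREFIXES:
        bare = prefix.rstrip("-")
        if stem.startswith(bare) and len(stem) - len(bare) >= min_stem:
            left.append(prefix)
            stem = stem[len(bare):]
            break
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            bare = suffix.lstrip("-")
            if stem.endswith(bare) and len(stem) - len(bare) >= min_stem:
                right.insert(0, suffix)  # outermost suffix is stripped first
                stem = stem[: -len(bare)]
                stripped = True
                break
    return left + [stem] + right


def annotate(utterance: str) -> List[Token]:
    """Keep the word tier and the morpheme tier aligned, one Token per word."""
    return [Token(form=w, morphs=segment(w.lower())) for w in tokenize(utterance)]


if __name__ == "__main__":
    for token in annotate("Nkana sorina, na."):
        print(token.form, "->", " ".join(token.morphs))
```

Greedy affix stripping overgenerates (any stem that happens to end in a listed suffix string will be cut), so its output is best treated as a suggestion for the annotator to confirm. Keeping the rules in short, readable lists means that such errors can be inspected and corrected directly, in line with the concern above about models whose output is opaque to human users.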
