
"Culture & Technology" European Summer University in Digital Humanities
University of Leipzig

Digital Archives: Reading and Manipulating Large-Scale Catalogues, Curating and Creating Small-Scale Archives

The purpose of this two-week workshop is to develop practical and critical skills for representing knowledge in digital archives and to build a small-scale digital archive.

In the first week, we learn how to work with catalogues, performing a ‘distant reading’ of our data with OpenRefine. Using OpenRefine, students will gain critical insights into how to read digital catalogues, inspect their content and structure, and clean and enrich them. We will learn different ways of organizing the archive and enriching the data using authority files and linked open data, such as data from the Library of Congress, VIAF and more.

In the second week, building on the practical experience and critical insights acquired in the first week, students will design and implement a digital archive for their collection of documents and will design its metadata. We will work with Omeka and Tropy and get into deeper details, experimenting with various forms of data representation.

The content of the workshop is both theoretical (concepts of archives, authority files, ontologies) and practical (hands-on work with specific tools).

No prior knowledge is needed or assumed.

First week - Reading and working with data / collections in OpenRefine

Digital data in various formats stand today at the heart of the humanist’s work: sometimes the files are very large, sometimes messy, sometimes structured, and the scholar wonders: what is in my data? How is it organized? Can I improve and enrich it? What can be understood from the data about the world, about the research domain, about its creators?

OpenRefine began as Freebase Gridworks at Metaweb, was developed further at Google as Google Refine, and was then handed over to the community as open-source software. It is a free tool for data processing, and knowing how to use it is a valuable skill for anyone working with data. Designed for ‘data wrangling’ (cleaning, organizing and unifying data, and more), it lets you get familiar with the content of your data: research results in spreadsheet tables, library or museum catalogues in MARC or XML, family trees in GEDCOM format, JSON files, tweets and other text files.

Learning to work in OpenRefine acquaints you with notions and tools that are useful in many areas of DH. The workshop, therefore, goes beyond the tool itself (which is powerful and deserves attention in its own right) - it will give the students a variety of skills, including using APIs, harnessing the power of linked open data (LOD), scraping web pages, understanding clustering, and developing some programming notions:

  1. Different file types (CSV, TSV, spreadsheets, JSON, XML/TEI)
  2. Working effectively with regular expressions
  3. Writing expressions with GREL (OpenRefine's expression language)
  4. Working with an API (GeoNames)
  5. Working with LOD (Wikidata, Kima)
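As an illustration of the kind of regular-expression work listed above, here is a minimal Python sketch that rewrites day-month-year dates in ISO form. It is not taken from the workshop materials: the function name, the pattern and the sample values are assumptions chosen for the example, and in class the same idea would be expressed inside OpenRefine rather than in Python.

```python
import re

# Assumed input style: day, month and a four-digit year separated by ".", "/" or "-".
DATE_PATTERN = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")


def normalize_date(value):
    """Return the first day-month-year date found in `value` as YYYY-MM-DD.

    If no such date is found, the value is returned unchanged, so that
    cells like "undated" are left for manual inspection.
    """
    match = DATE_PATTERN.search(value)
    if match is None:
        return value
    day, month, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


# Hypothetical catalogue cells, invented for this sketch.
for cell in ["26.07.2010", "3/9/2010", "undated"]:
    print(cell, "->", normalize_date(cell))
```

The sketch assumes that the day always comes before the month; a real catalogue may mix conventions, which is exactly the kind of problem faceting helps to detect before any transformation is applied.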

By the end of this course, the participants will be familiar with OpenRefine and some of its advanced features, and will have experimented with various workflows, such as scraping data from the web and organizing it, or creating a map from a text.
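The step from a place name in a text to a point on a map can be sketched in Python as follows. This is only a rough illustration under stated assumptions: it builds a request URL for the GeoNames `searchJSON` web service (which requires a registered user name; `my_account` is a placeholder) and reads coordinates from a response. The sample response is invented for the example and the sketch makes no network call.

```python
import json
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(place_name, username, max_rows=1):
    """Build a GeoNames search URL for one place name."""
    params = {"q": place_name, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def first_coordinates(response_text):
    """Return (latitude, longitude) of the first result, or None if there is none."""
    data = json.loads(response_text)
    results = data.get("geonames", [])
    if not results:
        return None
    top = results[0]
    # GeoNames returns coordinates as strings, so convert them to numbers.
    return float(top["lat"]), float(top["lng"])


# Invented sample response in the shape returned by searchJSON (values are illustrative).
SAMPLE_RESPONSE = (
    '{"geonames": [{"name": "Leipzig", "lat": "51.33962", '
    '"lng": "12.37129", "countryName": "Germany"}]}'
)

print(build_search_url("Leipzig", "my_account"))
print(first_coordinates(SAMPLE_RESPONSE))
```

In OpenRefine the same two steps would typically be done by fetching a URL built from a column of place names and then parsing the returned JSON into new columns.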

Class 1: Introduction, loading a file and faceting
Class 2: Working on dates - Regular expressions
Class 3: Clustering
Class 4: Enriching - Fetching data using REST API (working with GeoNames)
Hands-on Session (working on your data, practicing administrative tasks such as changing the working directory or memory size)
Class 5: Reconciliation: enriching with Wikidata
Class 6: Working with JSON and XML
Class 7: Web Scraping
Class 8: From text to map, using OpenRefine
Class 9: Summary
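
To give a flavour of the clustering class, the following Python sketch groups name variants by a normalized key. It loosely follows the idea behind OpenRefine's fingerprint key-collision method, but it is a simplified approximation, not OpenRefine's implementation, and the names are examples invented for the sketch.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a key that ignores case, accents, punctuation and word order."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form (e.g. "ü" becomes "u").
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation, then sort the unique tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Return groups of distinct values that share the same fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


names = ["Goethe, Johann Wolfgang", "Johann Wolfgang Goethe", "Schiller, Friedrich"]
print(cluster(names))
```

Values that collide on the same key are only candidates for merging; deciding whether they really refer to the same person or place remains a judgment for the researcher.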

Second week - Building a Digital Archive

Many researchers, libraries and archives own relatively small collections of documents that need digitization and structuring in order to be investigated, represented and searched. In many cases, the documents are part of a larger narrative that needs to be told and presented.

But how do you do that? Do you need to employ a software engineer to implement it for you? And if so, how do you communicate the subtle needs of your project, its nuances, and the best practices you have acquired as a digital humanist?

In this workshop, we’ll move on from large-scale catalogues and deal with the details. Structuring a digital archive is a series of choices, and digitization forces us to rethink traditional methods of curating, presenting and representing archives.

We’ll begin with an overview of your collections and review examples of archives. We’ll discuss possible metadata schemes and the various considerations in the choice process - thinking of an archive as a body of knowledge that exists on its own. Then we’ll use and understand ‘best practice’ tools for the implementation.

Class 1: Theory of archives - An introduction
Class 2: Digital archives - Examples, your collections
Class 3: Working with primary resources, “Scanning party?”
Class 4: Metadata - Methods of description, issues, dilemmas
Class 5: Omeka - Introduction, creating an account
Class 6: Tropy - Basic features, interface with Omeka
Hands-on Session - Working on your collection
Class 7: Linking and integration with external resources
Class 8: Publishing - Design Omeka pages
Class 9: Summary
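
As a small illustration of what designing metadata can mean in practice, the sketch below describes two items with a handful of Dublin Core style fields and writes them out as CSV. Dublin Core is used here only as an assumption, since it is the element set Omeka offers by default; the items, the choice of fields and the function name are invented for the example, and how such a file would be loaded into a particular Omeka installation is not shown.

```python
import csv
import io

# Assumed subset of Dublin Core elements used as column headers.
FIELDS = ["Title", "Creator", "Date", "Type"]

# Hypothetical items, invented for this sketch.
ITEMS = [
    {"Title": "Letter to a publisher", "Creator": "Unknown", "Date": "1923-05-14", "Type": "Text"},
    {"Title": "Family photograph", "Date": "1931", "Type": "Still Image"},
]


def items_to_csv(items, fields=FIELDS):
    """Serialize item descriptions to CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({field: item.get(field, "") for field in fields})
    return buffer.getvalue()


print(items_to_csv(ITEMS))
```

Even in this tiny example the choices are visible: which fields to record, what to do when a creator is unknown, and how precisely a date should be given.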
