
"Culture & Technology" European Summer University in Digital Humanities
University of Leipzig

Word Vectors and Corpus Text Mining with Python

How can we examine language and culture through computation? This workshop is an in-depth, hands-on overview of computational text mining and word vectors. It ranges from the text mining foundations of Natural Language Processing and word vectorization to recent advances at the intersection of computational linguistics and humanist applications. We will guide attendees through the theoretical and conceptual background needed to understand tools commonly used in DH, such as Word2Vec and LDA topic modeling. The group will implement these techniques and learn to critically analyze and evaluate the results. There will be time to reflect on adapting these applications and tools to participants’ specializations. Some programming experience is a prerequisite (not necessarily in Python), and we will spend a session reviewing the Python needed for the workshop and the Jupyter environment. Participants are encouraged to bring their own corpora for the hands-on activities.

Week 1:

This week lays the foundations for working with large-scale humanist text data, starting with data wrangling and finishing with concepts and tools in machine learning. By the end of the week, participants will be comfortable with simple techniques such as counting words and building simple n-gram models, and with discussing basic machine learning concepts for humanist applications.
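To give a sense of what "word counts" and "simple n-gram models" can look like in practice, here is a minimal sketch using only the Python standard library. It is not taken from the workshop materials: the sample sentence, the tokenizer, and the function names (`tokenize`, `word_counts`, `bigram_model`) are all illustrative assumptions, and a real corpus would need more careful tokenization.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def bigram_model(tokens):
    """Estimate P(next word | current word) by relative frequency."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return {
        word: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for word, counts in following.items()
    }


# Hypothetical toy text, standing in for a participant's own corpus.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)
counts = word_counts(tokens)
model = bigram_model(tokens)

print(counts.most_common(2))
print(model["the"])
```

The bigram model here is an unsmoothed maximum-likelihood estimate, so any word pair that never occurs in the text gets no probability at all; that limitation is one reason smoothing is usually discussed alongside n-gram models.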

Week 2: 

This week focuses on a method popular in recent DH scholarship: word vectors. We will cover the concepts and assumptions behind traditional word vectors, Word2Vec, and other neural-network-based vectorization techniques, as well as their implications for the study of the humanities: for example, is all bias bad, and what can we learn from it? We will also review recent DH literature and critically analyze its methods. The emphasis is on participants applying these methods analytically to their own data and critically interpreting what such applications mean for humanist inquiry.
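As one possible illustration of a "traditional" (count-based) word vector, the sketch below represents each word by the words that appear near it and compares words by cosine similarity. It is an assumption-laden toy, not the workshop's own code: the three-sentence corpus, the window size of 2, and the function names (`cooccurrence_vectors`, `cosine`) are invented for the example, and it does not use Word2Vec or any neural model.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector counting its neighbours within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; real corpora are orders of magnitude larger.
corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the ball",
]
vectors = cooccurrence_vectors(corpus)
sim_royal = cosine(vectors["king"], vectors["queen"])
sim_mixed = cosine(vectors["king"], vectors["dog"])

print(f"king ~ queen: {sim_royal:.2f}")
print(f"king ~ dog:   {sim_mixed:.2f}")
```

In this toy corpus "king" and "queen" occur in identical contexts, so they score higher than "king" and "dog". The same mechanism means that whatever associations a corpus contains, including biased ones, end up encoded in the vectors, which is why interpreting them critically matters.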
