The humanities scholar's perspective on rule-based machine translation
Rule-based machine translation is an interesting application of natural language processing (NLP) and digital humanities (DH) because it spans so many of the topics and concepts of both fields. It can therefore be included in many DH and NLP workflows, producing necessary resources such as dictionaries and digital grammars as a by-product of research work on digital texts and corpora.
During this course we will create a simple rule-based machine translation system (based on Apertium) that is capable of translating one short text from one language to another. We will learn to write the necessary dictionaries in an XML-based format, use version control software, and participate in an open-source development community.
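To give a rough idea of what an XML-based dictionary looks like, here is a minimal sketch in Python. The dictionary fragment imitates the style of an Apertium bilingual dictionary, but the entries, the helper names `load_pairs` and `translate`, and the word-for-word lookup are assumptions made for illustration only. They are not the course material and not how Apertium itself is run.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical bilingual dictionary in the style of an Apertium
# .dix file. The entries are invented for this illustration.
DIX = """<dictionary>
  <sdefs>
    <sdef n="n"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>cat<s n="n"/></l><r>gato<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def load_pairs(dix_text):
    """Collect source-lemma -> target-lemma pairs from the <p> elements."""
    root = ET.fromstring(dix_text)
    pairs = {}
    for p in root.iter("p"):
        left = p.find("l")
        right = p.find("r")
        # .text is the lemma that precedes the first <s .../> symbol tag.
        pairs[left.text] = right.text
    return pairs


def translate(sentence, pairs):
    """Naive word-for-word lookup; unknown words are marked with '*'."""
    return " ".join(pairs.get(word, "*" + word) for word in sentence.split())


pairs = load_pairs(DIX)
print(translate("cat house dog", pairs))
```

A real system adds morphological analysis, re-ordering and grammatical transfer on top of such lookups, which is what the schedule below works towards.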
Tentative schedule
Week one
- Intro to Machine Translation
- Installing the platform and tools
- Working with the tools
- XML basics
- Digital lexicography and morphology
- Parsing
- Word-based translations
Week two
- Phrase parsing / chunking
- Re-ordering and grammatical changes
- Evaluation and Quality Assurance
- Comparative grammars
- Connecting to large-coverage dictionaries
- Other systems (if there's time)