Asking questions to data in the humanities: right, correct, efficient (Introducing and comparing XQuery, SQL, SPARQL for data from the humanities)
The amount and complexity of data in the digital humanities are growing continuously. Modern database storage and access technologies are needed to handle this data. This course gives an introduction to three relevant technologies: relational databases with SQL, XQuery for XML-formatted data, and graph databases for highly interconnected data.
Relational databases organize their data in simple tables. SQL is the standard query language for searching and extracting data from such a database. The technology is mature: many excellent database systems are available, and most programming languages and application programs provide easy access to relational databases.
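As a minimal sketch of what "data in simple tables" and an SQL query look like in practice, the following uses Python's built-in `sqlite3` module with an in-memory SQLite database. The table `letter`, its columns, and all names and years are invented for illustration and are not the course's sample data.

```python
import sqlite3

# Hypothetical sample data: a tiny table of letters (all values invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE letter (id INTEGER PRIMARY KEY, sender TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO letter (sender, year) VALUES (?, ?)",
    [("Anna", 1790), ("Bertha", 1801), ("Anna", 1805)],
)

# SELECT-FROM-WHERE: which letters were written after 1800, oldest first?
rows = conn.execute(
    "SELECT sender, year FROM letter WHERE year > ? ORDER BY year",
    (1800,),
).fetchall()
conn.close()

for sender, year in rows:
    print(sender, year)
```

The question ("which letters were written after 1800?") maps directly onto the three clauses: the columns wanted, the table they come from, and the condition they must satisfy.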
XML is a language for describing the structure of documents as a hierarchy. XQuery is the standard language for querying single XML documents or document collections in order to search for and extract content. XML is used for large text corpora and is supported by many programming languages; there are numerous application programs and editors for XML data, and it is often used in web-based environments.
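To illustrate querying a document hierarchy, here is a small sketch using Python's standard `xml.etree.ElementTree`. Note that this module only supports a limited XPath subset and is not an XQuery processor; a roughly equivalent XQuery expression is given in a comment for comparison. The element names and content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus (structure and content invented).
doc = """<corpus>
  <text id="t1"><author>Anna</author><title>First Letter</title></text>
  <text id="t2"><author>Bertha</author><title>Reply</title></text>
  <text id="t3"><author>Anna</author><title>Second Letter</title></text>
</corpus>"""

root = ET.fromstring(doc)

# Select every <text> child whose <author> child has the text 'Anna',
# then read the <title> of each match.
# A comparable XQuery (not executed here) might read:
#   for $t in /corpus/text where $t/author = 'Anna' return $t/title/text()
titles = [t.findtext("title") for t in root.findall("./text[author='Anna']")]

print(titles)
```

The path expression walks the hierarchy from the root downwards, which is the main conceptual difference from the flat tables of a relational database.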
Graph databases are a relatively new development. Here, data is modelled as nodes that carry information and are linked via named arcs. Graphs are highly dynamic and thus well suited to the exploration phase of working with corpora.
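The node-and-named-arc model can be sketched without any database at all, as a set of (subject, predicate, object) triples in plain Python. This is only an illustration of the data model, not of a real graph database or of SPARQL; the helper `match` and all names are hypothetical.

```python
# Each triple is (node, named arc, node); all values are invented.
triples = {
    ("Anna", "wrote_to", "Bertha"),
    ("Bertha", "wrote_to", "Clara"),
    ("Anna", "born_in", "Leipzig"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Who did Anna write to?
recipients = [o for (_, _, o) in match(s="Anna", p="wrote_to")]

# Graphs are dynamic: a new kind of arc can be added without changing any schema.
triples.add(("Clara", "born_in", "Weimar"))
birthplaces = {s: o for (s, _, o) in match(p="born_in")}

print(recipients, birthplaces)
```

Pattern matching with wildcards is also the basic idea behind graph query languages, where the unknown positions in a triple become variables.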
To introduce these technologies, we will bring sample data. Using the same underlying information in all three paradigms also allows us to compare the different query methods. In course projects, participants may work with the sample data we provide or with their own data. By looking at the data, we will discuss ways of asking questions of it and then try to express them in the query language(s).
Participants should have some basic knowledge of XML. In the course, students will have access to the database system SQLite and the XML editor Oxygen.
In the first week we will work on the basics of SQL (for relational databases) and XQuery (for XML data), introducing the basic concepts and syntax of both query languages. The key concepts here will be SQL's SELECT-FROM-WHERE, JOIN, and aggregate functions, and XQuery's FLWOR expressions, functions, and output formatting.
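A small sketch of how JOIN and an aggregate function work together, again using Python's `sqlite3` with an in-memory database. The two tables `person` and `letter` and their contents are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: persons, and letters that refer to their sender by id.
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE letter (
    id INTEGER PRIMARY KEY,
    sender_id INTEGER REFERENCES person(id),
    year INTEGER
);
INSERT INTO person (id, name) VALUES (1, 'Anna'), (2, 'Bertha');
INSERT INTO letter (sender_id, year) VALUES (1, 1790), (2, 1801), (1, 1805);
""")

# JOIN links each letter to its sender; COUNT(*) with GROUP BY
# aggregates the joined rows into one count per person.
counts = conn.execute("""
    SELECT p.name, COUNT(*) AS n_letters
    FROM person AS p
    JOIN letter AS l ON l.sender_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
conn.close()

print(counts)
```

Splitting persons and letters into separate tables avoids repeating a sender's details in every letter row; the JOIN reassembles them only when a question requires it.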
In the second week of this workshop we will look at advanced constructions in SQL and XQuery, apply both to the same data sets, and compare them with each other. Additionally, we will look at graph databases. Following this introduction we will assess how to find questions based on the data, select the appropriate formalism, and express the question in the query language. We will also look at applying XQuery to TEI documents. Key concepts of this week will be inserting, updating, and deleting data; SQL's stored procedures and XQuery's user-defined functions; graph databases and SPARQL; and the application of query languages to participants' own research questions.