Canonical Text Services
CTS (Canonical Text Services) is a protocol for citing digitally available texts in a canonical way. Citations are formalized as URNs such as urn:cts:demo:shakespeare.sonnets:35.1, which identifies line 1 of Shakespeare's Sonnet 35. Other examples of URNs are:
urn:cts:demo:shakespeare.hamlet:1.2
urn:cts:demo:goethe.faust.de.ed1:35.1.4
urn:cts:demo:goethe.faust.en.ed4:35.1.4
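To make the structure of these URNs concrete, the following is a minimal Python sketch that splits a URN of the form shown above into its colon-separated fields. The class name CtsUrn and the function parse_cts_urn are hypothetical names chosen for illustration and are not part of any CTS implementation. The sketch assumes a URN with exactly one passage component and does not handle passage ranges or URNs without a passage.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CtsUrn:
    """Illustrative container for the parts of a CTS URN (hypothetical name)."""
    namespace: str             # e.g. "demo"
    work: Tuple[str, ...]      # dot-separated work identifiers, e.g. ("goethe", "faust", "de", "ed1")
    passage: Tuple[str, ...]   # dot-separated citation levels, e.g. ("35", "1", "4")


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a URN like urn:cts:demo:shakespeare.sonnets:35.1 into its parts.

    Assumption: the URN has exactly five colon-separated fields, the first
    two being "urn" and "cts". Anything else is rejected.
    """
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN with a passage component: {urn!r}")
    _, _, namespace, work, passage = parts
    return CtsUrn(namespace, tuple(work.split(".")), tuple(passage.split(".")))


if __name__ == "__main__":
    for example in (
        "urn:cts:demo:shakespeare.sonnets:35.1",
        "urn:cts:demo:goethe.faust.de.ed1:35.1.4",
    ):
        print(parse_cts_urn(example))
```

In this sketch, the two Faust URNs above differ only in their work component, while their passage component (35.1.4) is identical.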
URNs make it possible to share digital text passages with researchers around the world without having to worry about translations, editions, the existence of physical copies, or similar problems.
There are three implementations of CTS, which use different data storage techniques: graph-based, XML-based, and SQL-based.
I present the MySQL-based implementation, which was developed for the ESF project Billion Words.
The goal for this implementation, handling a CTS text collection of 1 billion words, was met and exceeded. The impact on the server's memory use is minimal, and it is possible to run multiple instances, which allows different configurations or even user-specific CTS instances or views.
Different views allow, for example, one CTS for teachers and a different CTS for students, which might contain only a small part of the data for tutorial use.
Another benefit of multiple instances is the possibility of providing additional meta information that is not part of the CTS specifications (e.g. dynamically calculated POS tags). This flexibility enables NLP processing alongside the standard CTS functionality, without changing the data itself and while remaining valid against the specifications.
This presentation will begin with an introduction to the CTS standard and some examples that show what CTS actually is. These examples will include the use of URNs directly as citations in a PDF document.
Then, after a short discussion of the basic idea behind the SQL-based implementation, there will be a comparison of the properties of the different implementations. This comparison will touch on technical details, such as the implications of the way data is stored, and will also show differences that can directly change the reply the user receives.
To show that, under the right circumstances, CTS is capable of more than it was specified for, this presentation will end with some example use cases that are not covered by the CTS specifications. These either follow from its structural approach to text or can easily be added using the flexibility of this implementation and the freedom to structure passages in the way you choose.