The TEI, Its Foundations and Impact, and How It Fits Into Today's Needs and Practices for Language Analysis
The Text Encoding Initiative (TEI), launched in 1987, was the first major effort to develop a consistent set of guidelines for the annotation of language data. At the time the project was initiated, its founders established a set of principles to guide the TEI’s development that still apply to annotation scheme design thirty years later.
This presentation will describe the origins of and motivation for the TEI and the principles upon which it was built, and consider how it has influenced the annotation not only of humanities data but also of the detailed linguistic phenomena that support machine learning in natural language processing. I will also discuss the role of annotation in this era of deep learning and “big data”, which has led many to conclude that manual and semi-manual annotation efforts are no longer necessary, or are at least far less necessary, for complex language analysis. Finally, I will outline my thoughts on how the TEI and annotation projects in general fit into the current landscape of methodologies for language and linguistic analysis, and suggest some directions for the future.