From Father Busa to Linked Data. What does Thomas Aquinas have to do with the Semantic Web?
The Italian Jesuit Roberto Busa is regularly counted among the pioneers of what is today called Computational Linguistics, Digital Humanities and/or Humanities Computing. He is best known for his monumental work on the corpus of Thomas Aquinas' texts, the so-called Index Thomisticus, which was built over more than 30 years starting in the late 1940s.
While developing the Index Thomisticus, Father Busa realized that a methodological turn was about to happen in the field, driven by the new challenges posed by processing linguistic data with computers. A new kind of Humanism, grounded in replicable results and complete documentation, was about to emerge.
For Busa, “automation” does not make life easier; rather, it “expects much more from the spiritual industriousness of mankind”. Great results will come only from a greater effort than ever before. Not only will this make the Humanities more scientific, it will also make them more “humanistic”, by demanding a new kind of “know thyself”. The steady confrontation with empirical data, required by the need to formalize language for computer processing, reveals our limitations, which are above all humanistic ones, if it is true that we are not even “able to explain how we speak” (Busa, 1958, p. 840; English translation by Philip Barras).
My talk discusses some of Busa's fundamental ideas by taking a journey through a number of his writings from the 1950s to the 1980s (all to be republished in a forthcoming volume).
Furthermore, I will present original material from the 'Busa Archive', held at the Università Cattolica in Milan. The archive of documents collected by Father Busa throughout his life includes articles from the national and international press, correspondence between Busa and his contemporaries in Italy and abroad, material relating to specific phases of the Index Thomisticus, and Busa's Opera Omnia. Given this scope, the Archive is an invaluable source of documentation for anyone interested in the history of the discipline.
Finally, the talk will outline the current state of the Index Thomisticus by introducing the Index Thomisticus Treebank and its planned inclusion in a Linked Data-based Knowledge Base of linguistic resources and NLP tools for Latin, which will be built within a recently funded ERC-CoG project (Grant Agreement No 769994).
References
Busa, Roberto (1958), I principali problemi dell'automazione del linguaggio scritto, in Atti della VI Sessione delle Giornate della Scienza - Convegno Int. sui Problemi dell'Automatismo, Milano, 8–13 Aprile 1956, vol. I, Roma, Consiglio Nazionale delle Ricerche.
Passarotti, Marco (2011), Language Resources. The State of the Art of Latin and the Index Thomisticus Treebank Project, in Ortola, Marie-Sol (ed.), Corpus anciens et Bases de données, «ALIENTO. Échanges sapientiels en Méditerranée», N°2, Nancy, Presses universitaires de Nancy, pp. 301–320.