FLaReNet Strategic Language Resource Agenda – Recommendations and Future Prospects
Language Technologies (LT), together with their backbone, Language Resources (LR), provide essential support for meeting the challenge of multilingualism and for the Information and Communication Technology of the future. LTs are critical to bridging language barriers and to helping create a new environment in which information flows smoothly across frontiers and languages, whatever the country or language of origin.
LT is a data-intensive field. To develop LT, research and industrial teams need access to (i) a volume of data commensurate with the operational conditions of the application and (ii) equally appropriate LT evaluation mechanisms to measure the performance of the best systems available and compare it with the performance required by the application.
The LR field is very active, but it needs coherence, and coherence is best assured by sharing common priorities and endeavours. FLaReNet (Fostering Language Resources Network, http://www.flarenet.eu/) is an international forum, composed of a steadily growing community, that aims to develop the necessary common vision and to foster a European strategy for consolidating the sector, thus enhancing competitiveness at EU level and worldwide. Its final results, described in the final deliverable “Language Resources for the Future – The Future of Language Resources. The Strategic Language Resource Agenda”, constitute a preliminary plan for actions and infrastructures that could become the basis for future initiatives in the field.
Recognising that the development of the LT sector is conditioned by various factors, the FLaReNet recommendations cover a broad range of topics and activities, spanning the production and use of LRs, licensing, maintenance, sustainability and preservation issues, infrastructures for LRs, resource identification and sharing, evaluation and validation, interoperability and policy issues. In the Blueprint, the recommended actions are organised around nine dimensions that are relevant for the field of Language Resources: a) Infrastructure, b) Documentation, c) Development, d) Interoperability, e) Coverage, Quality and Adequacy, f) Availability, Sharing and Distribution, g) Sustainability, h) Recognition and i) International cooperation. Some of these dimensions are more infrastructural in nature, some relate more to research and development, and some to political and strategic aspects, but all of them must be seriously considered when devising a strategy for the future of the field. Taken together, these directions are intended to contribute to the creation of a sustainable ecosystem of Language Resources and Technologies (LRT).
I will present some of the high-level recommendations collected in the final Blueprint of Actions and Infrastructures, which sets out the fundamental actions recommended for the development and progress of LRs and LTs, focusing in particular on those most relevant to the future of Digital Humanities.
An ever larger range of LRs and LTs is being developed, but the infrastructure that brings LRs and LTs together and sustains them is still largely missing. Building this infrastructure is the most urgent issue and also a way to move the field forward, together with real sharing of resources and an effort to make resources available for all languages. In this context, interoperability of resources, tools and frameworks has recently come to be understood as perhaps the most pressing current need for language processing research.
I will also present new open instruments, such as the LRE Map and the Language Library, that collect collaboratively built information on Language Resources with the involvement of the whole community. The LRE Map (already well known and consulted every day) gathers metadata on all language resources described in conference papers (many conferences have joined the initiative), while the Language Library gathers the results of processing and annotating parallel and comparable texts in many languages. Interoperability, an essential aspect to be pursued in a data-intensive discipline, is at the core of these initiatives, which also aim to enhance the harmonisation of metadata and to facilitate the integration of independently processed data. An important next step foreseen is to connect all this language resource metadata and processed data to Linked Data.
Technical and scientific issues are obviously important, but organisational, coordination and political issues play a major role in our field, as in every other. Technologies exist and develop fast, but the infrastructure that brings them together and sustains them must be properly designed, created and maintained.
Since its very beginning, FLaReNet has recognised the need to go beyond the European dimension if it is to leave a truly significant footprint. A global dimension must be taken into consideration when planning and designing the future of Language Resources and Technologies. International cooperation has been among the most prominent issues in the FLaReNet project, which has discussed future policies and priorities in a worldwide context. Together, and under the umbrella of a shared view of today’s priorities, a future can be shaped in which the full deployment of Language Resources and Technologies is consolidated through the coordination of programmes, actions and activities.