
Last modified: 15 January 2018

Computational Methods for Endangered Language Documentation and Description


Credits: Pôle communication de l’ENS

February 1st-2nd, 2018

École normale supérieure
Room “Actes”


Organization:

  • Thierry Poibeau (Lattice)
  • Michael Rießler (University of Bielefeld & The Freiburg Research Group in Saami Studies)
  • Niko Partanen (Lattice & The Freiburg Research Group in Saami Studies)



There is a significant gap between the digital methods applied in corpus building and corpus exploration for the numerous small, often endangered, low-resource languages and those applied for the high-resource majority languages. Corpora for endangered minority languages are typically built from spoken data, which must first be recorded and transcribed, and are therefore relatively small. Majority language corpora, on the other hand, are considerably larger and consist predominantly of language data from diverse digital (or digitized) written sources.

Whereas corpus linguists working on majority languages develop and apply Natural Language Processing tools and attempt to automate the annotation process, usually with the help of a manually checked gold corpus, field linguists typically rely on manual (or occasionally semi-manual) methods throughout the entire process. In many fieldwork-based endangered language documentation projects, manual methods are in fact more convenient than developing computational linguistic resources from scratch. This is especially true if the linguistic structures of the languages in question are still unknown, there is no established writing system, and the available corpus data are finite and small in quantity.

However, there are also many small or medium-size endangered languages for which the basic grammatical structures have already been described and which have established writing systems. This situation is common in Northern Eurasia, where practically all minority languages are also written today. Still, most of these languages have not been the focus of computational and corpus linguistic research so far, even though written corpus data of significant size are available for several of them.

The workshop aims to examine how specific methods from Natural Language Processing can be applied to analyze data from endangered and low-resource languages from Northern Eurasia and other parts of the world. The workshop defines language technologies in a very broad sense and therefore also includes computational methods for signal processing in general, as such technologies can be applied effectively to work with text corpora linked to multimedia data.

The event will feature a few invited presentations and tutorials. In addition, there will be slots for interested participants to present posters on their own projects related to the workshop theme.




Day 1 (February 1st):

  • 09:30–10:00 Opening and Introduction
  • 10:00–11:00 Talk: Joakim Nivre (Uppsala), “Universal Dependencies: A framework for morphosyntactic annotation”
  • 11:00–11:30 Coffee
  • 11:30–12:15 Talk: Jargal Badagarov (Ulan-Ude), “Endangered language documentation in Inner Asia” (preliminary title)
  • 12:30–14:00 Lunch
  • 14:00–14:45 Talk: Svetlana Toldova (Moscow), title to be announced
  • 14:45–16:30 Posters and demonstration sessions (with coffee)
  • 16:30–17:15 Talk: Olga Majewska (Cambridge), title to be announced
  • 17:15–18:00 Talk: Francis Tyers (Moscow), “Speech synthesis” (preliminary title)



Day 2 (February 2nd):

  • 09:30–10:15 Talk: Laurent Besacier (CNRS/LIMSI, Paris), Gilles Adda, Martine Adda-Decker, François Yvon, Pierre Godard, Annie Rialland, Sebastian Stueker, “Breaking the unwritten language barrier: The BULB project”
  • 10:15–11:00 Talk: Timofey Arkhangelski (Hamburg), “Processing fieldwork data in the Beserman Udmurt documentation project”
  • 11:00–12:30 Posters and demonstration sessions (with coffee)
  • 12:30–14:00 Lunch
  • 14:00–15:00 Talk: Trond Trosterud (Tromsø), “Language technology meets documentary linguistics: What we have to tell each other”
  • 15:00–15:30 Closing


  • The event is free, but advance registration is required. Please send an email with your name and affiliation to:

    For more information, see the event website.

    February 1st-2nd, 2018
    ENS, 45 rue d’Ulm, 75005
    Room “Actes” (staircase A, 1st floor)
