Sign language corpora

A corpus is a representative collection of language samples in machine-readable format, used to study the types and frequencies of linguistic units. It offers a broad representation of the language and of its geographical, register and generational variants. Sign language corpora are collections of annotated videos in which written annotations are aligned with the primary data in the corresponding sign language. Like any corpus, they are intended to be a representative sample of the language.
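To make the idea of time-aligned annotation more concrete, the following is a minimal sketch in Python of how such data could be represented and how the frequency of linguistic units could be counted. The `Annotation` class, its field names, the tier names and the sample data are all hypothetical and chosen for illustration only; they do not describe the format used by any particular corpus project.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-aligned annotation on a video (hypothetical schema)."""
    start_ms: int  # start of the annotated interval in the video
    end_ms: int    # end of the annotated interval in the video
    tier: str      # annotation layer, e.g. "gloss" or "translation"
    value: str     # written label aligned with the signed data


def gloss_frequencies(annotations, tier="gloss"):
    """Count how often each label occurs on the given tier."""
    return Counter(a.value for a in annotations if a.tier == tier)


# Invented sample data, for illustration only.
sample = [
    Annotation(0, 400, "gloss", "HOUSE"),
    Annotation(400, 900, "gloss", "GO"),
    Annotation(900, 1300, "gloss", "HOUSE"),
    Annotation(0, 1300, "translation", "I am going home."),
]

freqs = gloss_frequencies(sample)
for label, count in freqs.most_common():
    print(label, count)
```

Keeping each annotation tied to a time interval and a tier is one possible design; it lets the same video carry several layers of written material while the counts are computed on one layer at a time.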

The main benefit of this type of corpus is that it preserves the sign language as an important part of a society's social and linguistic heritage. The initiative presented here builds on the valuable precedent of similar corpus projects for other sign languages, most of them European, that face comparable deficiencies. In the Netherlands, the United Kingdom, Australia, Germany, Ireland and Italy, corpus projects have already been established for the respective national sign languages and are, depending on the case, in the construction, annotation or finalization phase. The experience accumulated in these projects, to which we have access through existing collaborations with some of their directors and coordinators, allows us to build the LSC Corpus more solidly and efficiently, on the basis of reliable criteria.

Corpora allow us to collect the lexical units of a language and to derive their grammatical properties, since they provide information about the syntactic context, phonology, morphology, semantics and pragmatics of these units. All this information can be classified and organized in a lexical database. Some corpus projects are already advanced enough to have been linked to such a database. Not all databases offer the same type of information: some contain the definition, phonology, morphology and even pragmatic aspects of the signs, while others are at an earlier stage and show only how the sign is articulated and its corresponding gloss.
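The difference between a richly documented database entry and one at an early stage can be sketched as a simple record type. This is only an illustrative assumption: the `LexicalEntry` class, its field names, the file names and the sample values are hypothetical placeholders and do not reflect the schema of any existing lexical database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LexicalEntry:
    """One sign in a lexical database (hypothetical schema)."""
    gloss: str                        # written label for the sign
    articulation_video: str           # clip showing how the sign is articulated
    definition: Optional[str] = None  # filled in only by richer databases
    phonology: Optional[str] = None
    morphology: Optional[str] = None
    pragmatics: Optional[str] = None

    def documented_fields(self):
        """Return the names of the optional fields that are filled in."""
        optional = ("definition", "phonology", "morphology", "pragmatics")
        return [name for name in optional if getattr(self, name) is not None]


# Invented placeholder entries, for illustration only.
minimal = LexicalEntry(gloss="HOUSE", articulation_video="house.mp4")
detailed = LexicalEntry(
    gloss="HOUSE",
    articulation_video="house.mp4",
    definition="placeholder definition",
    phonology="placeholder phonological description",
)

print(minimal.documented_fields())
print(detailed.documented_fields())
```

In this sketch the gloss and the articulation clip are the only required fields, which mirrors the databases at an earlier stage; the optional fields correspond to the additional information that more advanced databases provide.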

-British Sign Language Corpus (BSL)

-Sign Language of the Netherlands Corpus (NGT)

-Australian Sign Language Corpus (AUSLAN)

-German Sign Language Corpus (DGS)

-French Belgian Sign Language Corpus (LSFB)

-Polish Sign Language Corpus (PJM)

-Spanish Sign Language Corpus (LSE)

-American Sign Language Database (ASL)