The project
Sign language corpora
A corpus is a representative collection of samples of a language in machine-readable format, used to study the type and frequency of linguistic units. It also provides a broad representation of the language and its geographical, register and generational variants. Sign language corpora are collections of annotated videos in which written material is aligned with the primary data in the corresponding sign language. Like any corpus, they constitute a representative sample of the language.
The LSC corpus
In 2007, the Institut d’Estudis Catalans, the Federació de Persones Sordes de Catalunya, the Universitat Pompeu Fabra, the Fundació Barcelona Media and Linguamón undertook a collaborative initiative to create a reference corpus of LSC. At that time, however, a lack of funding prevented the project from being carried out. In 2012, the Institut d’Estudis Catalans offered the possibility of launching a first corpus-building project with a preparatory phase and a pilot test, made possible by the support of the Government of Catalonia’s Directorate for Language Policy (Departament de Política Lingüística) and funding from Obra Social “La Caixa”. One year into the pilot, it became clear that an LSC corpus was feasible, and the pilot became the LSC Corpus Project. Since then, thanks to the continued support of the Government of Catalonia’s Directorate for Language Policy (Departament de Política Lingüística de Catalunya) and funding from Obra Social “La Caixa”, we have been able to record signers from all over the LSC area.