Project Outputs

A Fala Database (1)
Publications (1)
Experience Sharing (2)

A Fala Database

Name:   A Fala_Database_ver01_Sep2020 (Download)

How to cite the database:

VALEŠ, Miroslav. 2020. A Fala Database: version 01, Sep. 2020. Minde: CIDLeS. Available at:


This database is the result of the project Community-Driven Documentation and Description of A Fala, carried out at CIDLeS in cooperation with the Technical University of Liberec, Czech Republic. The database contains approximately 225 000 tokens/words documented in 156 texts. It was compiled from transcribed recordings, which contributed 110 315 words (49% of the database), and from published and unpublished texts written in one of the varieties of A Fala, which contributed the remaining 114 690 words (51% of the database). However, owing to copyright issues, 10 of the written texts had to be removed, so this public version contains only 146 accessible texts, with over 204 000 words.

The objective of the project was to create a database that reflects both the spoken and written forms of the language while taking several factors into account: equal representation of the three varieties (lagarteiru, mañegu and valverdeñu); participation of both women and men; participation of speakers from different age groups, not only the oldest ones; and a range of interview topics, from traditional ones such as local agriculture to European funds and their local use. The community of speakers contributed to all stages of the database compilation.

Community participation: approx. 175 participants, 4% of the population of the three villages.

Technical requirements

You will need the latest version of FLEx (FieldWorks Language Explorer) to open the database.

The database is password protected. It is available to everyone, but to get the password, please contact:

Content specifications


Recordings

Total tokens/words registered: 110 315
Total number of recordings: 63 (in 37 interview sessions)
Total time: 705 min (11 hrs 45 min)
Video recordings: 61 (97%)
Audio only recordings: 2 (3%)
Total number of participants: 67 (37 women, 30 men; 20 of them acted as interviewers with limited participation)

Recordings Lagarteiru

Number of recordings: 16 (in 12 interview sessions)
Time: 238 min (3 hrs 58 min)
Tokens/words registered: 38 709
Participants: 22 (12 women, 10 men, 6 in the position of interviewers)

Recordings Mañegu

Number of recordings: 26 (in 12 interview sessions)
Time: 248 min (4 hrs 8 min)
Tokens/words registered: 37 703
Participants: 19 (11 women, 8 men, 4 in the position of interviewers)

Recordings Valverdeñu

Number of recordings: 21 (in 13 interview sessions)
Time: 219 min (3 hrs 39 min)
Tokens/words registered: 33 903
Participants: 26 (14 women, 12 men, 10 in the position of interviewers)
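
The per-variety figures above can be cross-checked against the overall recording totals. The following is a minimal sketch in Python, not part of the database itself; the dictionary layout and the names `recordings` and `totals` are hypothetical and chosen only for illustration, while the numbers are copied from the specifications above.

```python
# Per-variety recording figures, copied from the specifications above.
# The structure and field names are illustrative assumptions.
recordings = {
    "lagarteiru": {"recordings": 16, "sessions": 12, "minutes": 238,
                   "tokens": 38_709, "participants": 22,
                   "women": 12, "men": 10, "interviewers": 6},
    "mañegu": {"recordings": 26, "sessions": 12, "minutes": 248,
               "tokens": 37_703, "participants": 19,
               "women": 11, "men": 8, "interviewers": 4},
    "valverdeñu": {"recordings": 21, "sessions": 13, "minutes": 219,
                   "tokens": 33_903, "participants": 26,
                   "women": 14, "men": 12, "interviewers": 10},
}


def totals(per_variety):
    """Sum every field across the three varieties."""
    fields = next(iter(per_variety.values())).keys()
    return {f: sum(v[f] for v in per_variety.values()) for f in fields}


summary = totals(recordings)
hours, minutes = divmod(summary["minutes"], 60)

for field, value in summary.items():
    print(f"{field}: {value}")
print(f"total time: {hours} hrs {minutes} min")
```

The sums reproduce the stated totals: 63 recordings in 37 sessions, 705 minutes, 110 315 tokens and 67 participants.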

Written texts

Total tokens/words registered: 114 690
Total number of written texts: 93
Larger texts (books, theatre plays): 6 (49 305 words)
Shorter texts (magazine articles, short stories, etc.): 80 (55 308 words)
Translations: 5 (9 518 words)
Web texts: 1 (408 words)
Public announcements: 1 (151 words)
Total authors: 71
Authors lagarteiru: 33
Authors mañegu: 12
Authors valverdeñu: 26
Texts not available in the public version of the database: 10 (20 443 words)
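
The composition of the whole database and of its public version follows from the figures listed above. The sketch below, again in Python and using hypothetical variable names, shows the arithmetic; it assumes that each recording counts as one text, which is how the recording and written-text counts add up to the stated number of texts.

```python
# Written-text categories as (number of texts, number of words),
# copied from the list above.
written = {
    "larger texts": (6, 49_305),
    "shorter texts": (80, 55_308),
    "translations": (5, 9_518),
    "web texts": (1, 408),
    "public announcements": (1, 151),
}

# Recording totals and withheld texts, also taken from the specifications.
spoken_recordings, spoken_tokens = 63, 110_315
withheld_texts, withheld_tokens = 10, 20_443

written_texts = sum(count for count, _ in written.values())
written_tokens = sum(words for _, words in written.values())

# Assumption: one recording is counted as one text in the database.
total_texts = spoken_recordings + written_texts
total_tokens = spoken_tokens + written_tokens

public_texts = total_texts - withheld_texts
public_tokens = total_tokens - withheld_tokens

spoken_share = round(100 * spoken_tokens / total_tokens)
written_share = 100 - spoken_share

print(f"written: {written_texts} texts, {written_tokens} words")
print(f"database: {total_texts} texts, {total_tokens} words")
print(f"public version: {public_texts} texts, {public_tokens} words")
print(f"spoken {spoken_share}% / written {written_share}%")
```

Under that assumption the full database comes to 156 texts and 225 005 words, and the public version to 146 texts and 204 562 words, matching the description of the database.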

FLEx specifications

This is the first version of the FLEx database, so some sections are not yet complete. This applies to the sociolinguistic information about the participants in the recordings, which will be added to the database in the near future. The Lexicon section is also under construction and will be substantially corrected before the dictionary is published.

Lexicon – Entry section:

• General note – this line is used for extended comments on usage, as the Usages line only offers predefined categories.

• Semantic domains – the only semantic domains marked so far are Animals (1.6), Plants (1.5) and Tools (6.7). The categorization is simplified and will be corrected and completed in the future.

• Restrictions – this line reflects the frequency of words. It is also a section to be completed.

no mark = frequent words (5 000)

A = less frequent words (10 000) (not marked yet)
B = rare words (15 000) (not marked yet)
C = very infrequent words – related to the traditional culture, often no longer used, e.g. corsetería
D = very infrequent words – related to Castilian, e.g. lasaña, paracetamol
E = adverbs in -menti, they will not be part of the dictionary, but they appear in the database
F = words that will not be part of the dictionary or will be inserted after verification



Publications

VALEŠ, Miroslav. 2020. Recopilación de datos primarios para la descripción y documentación de la lengua. Études romanes de Brno, vol. 41, no. 1, pp. 87-98. ISSN 1803-7399 (print), ISSN 2336-4416 (online).



Experience sharing

Course: Sociolinguistics and research methodology

This one-semester course (28 hours of lectures and seminars) is a compulsory subject for all MA students of Spanish, English or German at the Faculty of Science, Arts and Education, Technical University of Liberec. The course reflects the methodological part of the project: students learn how to collect linguistic data, process them (ELAN) and create their own FLEx database.

Lecture: Project of minority language documentation

Date: 8 October, 2020

The objective of the lecture was to share the experience gained in the project with colleagues, especially those from language departments (English, German, Romance languages), and to motivate them to apply for their own linguistic projects in support of minority languages or linguistics in general. The methodology of the project was discussed in detail, as were its outputs.
