Project Outputs

A Fala Dictionary (2)
A Fala Database (1)
Publications (1)
Experience Sharing (2)

A Fala Dictionary

Paper version

Name:   A_Fala_Diccionariu_version_papel (Download)

How to cite the dictionary:

VALEŠ, Miroslav. 2021. Diccionariu de A Fala: lagarteiru, mañegu, valverdeñu. Minde: CIDLeS. Available at:


The most remarkable feature of this dictionary is its respect for linguistic diversity: it documents the three varieties of A Fala without imposing any one of them as a model. Its methodology is based on primary data collected in the three localities in cooperation with the speech community.

• It is derived from a database containing 225,000 words.
• It registers more than 13,000 entries.
• It includes more than 9,000 entries for each variety.
• Words that have no direct translation into Spanish are explained with a definition.
• It provides information on word usage.

Web version

Name:   A_Fala_Diccionariu_version01_web (Download)

How to cite the dictionary:

VALEŠ, Miroslav. 2021. Diccionariu de A Fala: lagarteiru, mañegu, valverdeñu; web version 01, Sep. 2021. Minde: CIDLeS. Available at:


The content of web version 01 is identical to that of the paper version; however, it also includes more pictures and a few videos. Because this version is easy to update, we would appreciate your corrections, missing words and complementary material (pictures or videos) to improve future versions. Please send your comments to:

A Fala Database

Name:   A_Fala_Database_ver02_Sep2021 (Download)

How to cite the database:

VALEŠ, Miroslav. 2021. A Fala Database: version 02, Sep. 2021. Minde: CIDLeS. Available at:


This database is the result of the project Community-Driven Documentation and Description of A Fala, carried out at CIDLeS in cooperation with the Technical University of Liberec, Czech Republic. It contains 225 000 tokens/words in 156 texts. It was compiled from transcribed recordings, which contribute 110 315 words (49% of the database), and from published and unpublished texts written in one of the varieties of A Fala, which contribute the remaining 114 690 words (51% of the database). However, due to copyright issues, 6 of the written texts had to be removed, so this public version contains only 150 accessible texts, with over 222 900 words.
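The composition figures above can be cross-checked with simple arithmetic. The Python sketch below uses only numbers given on this page (the 2 094 words withheld for copyright reasons are listed under Written texts); the constant and function names are illustrative and are not part of the database or of FLEx. It shows that the two sources add up to 225 005 tokens, which the text rounds to 225 000, with shares of about 49% and 51%, and that the public version keeps 222 911 tokens in 150 texts.

```python
# Figures as stated in the description of the A Fala Database (version 02).
SPOKEN_TOKENS = 110_315    # transcribed recordings
WRITTEN_TOKENS = 114_690   # published and unpublished written texts
WITHHELD_TOKENS = 2_094    # 6 written texts removed for copyright reasons
TOTAL_TEXTS = 156
WITHHELD_TEXTS = 6


def composition():
    """Return total tokens, rounded percentage shares and public-version counts."""
    total = SPOKEN_TOKENS + WRITTEN_TOKENS
    spoken_share = round(100 * SPOKEN_TOKENS / total)
    written_share = round(100 * WRITTEN_TOKENS / total)
    public_tokens = total - WITHHELD_TOKENS
    public_texts = TOTAL_TEXTS - WITHHELD_TEXTS
    return total, spoken_share, written_share, public_tokens, public_texts


if __name__ == "__main__":
    total, spoken, written, public_tokens, public_texts = composition()
    print(f"Total tokens: {total} ({spoken}% spoken, {written}% written)")
    print(f"Public version: {public_texts} texts, {public_tokens} tokens")
```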

The objective of the project was to create a database that would reflect both the spoken and the written aspects of the language, taking into account a variety of factors: equal representation of the three varieties (lagarteiru, mañegu and valverdeñu), participation of both women and men, participation of speakers from different age groups (not only the oldest speakers), and a wide range of interview topics, from traditional ones such as local agriculture to European funds and how they are used locally. The community of speakers contributed to every stage of the database compilation.

Community participation: approx. 180 participants, 4% of the population of the three villages.

Previous versions of the database

A Fala_Database_ver01_Sep2020 (Download)

Technical requirements

You will need the latest version of FLEx (FieldWorks Language Explorer) to open the database.

The database is password-protected. It is available to everyone, but to obtain the password, please contact:

Content specifications

Recordings

Total tokens/words registered: 110 315
Total number of recordings: 63 (in 37 interview sessions)
Total time: 705 min (11hrs 45 min)
Video recordings: 61 (94%)
Audio only recordings: 2 (6%)
Total number of participants: 67 (37 women, 30 men), of whom 20 acted as interviewers with limited participation

Recordings Lagarteiru

Number of recordings: 16 (in 12 interview sessions)
Time: 238 min (3 hrs 58 min)
Tokens/words registered: 38 709
Participants: 22 (12 women, 10 men), of whom 6 acted as interviewers

Recordings Mañegu

Number of recordings: 26 (in 12 interview sessions)
Time: 248 min (4 hrs 8 min)
Tokens/words registered: 37 703
Participants: 19 (11 women, 8 men), of whom 4 acted as interviewers

Recordings Valverdeñu

Number of recordings: 21 (in 13 interview sessions)
Time: 219 min (3 hrs 39 min)
Tokens/words registered: 33 903
Participants: 26 (14 women, 12 men), of whom 10 acted as interviewers

Written texts

Total tokens/words registered: 114 690
Total number of written texts: 93
Larger texts (books, theatre plays): 6 (49 305 words)
Shorter texts (magazine articles, short stories, etc.): 80 (55 308 words)
Translations: 5 (9 518 words)
Web texts: 1 (408 words)
Public announcements: 1 (151 words)
Total authors: 71
Authors lagarteiru: 33
Authors mañegu: 12
Authors valverdeñu: 26
Texts not available in the public version of the database: 6 (2 094 words)
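The per-variety and per-category figures above should add up to the totals given at the head of each list. As a hedged illustration, the following Python sketch stores the stated figures in plain dictionaries (an assumed, illustrative layout, not the FLEx data model) and sums each column. It reproduces 63 recordings in 37 sessions, 705 minutes (11 hrs 45 min), 110 315 spoken tokens and 67 participants, as well as 93 written texts with 114 690 words by 71 authors.

```python
# Content specifications as stated above, in an illustrative structure.
RECORDINGS = {  # variety: (recordings, sessions, minutes, tokens, participants)
    "lagarteiru": (16, 12, 238, 38_709, 22),
    "mañegu": (26, 12, 248, 37_703, 19),
    "valverdeñu": (21, 13, 219, 33_903, 26),
}
WRITTEN = {  # category: (texts, words)
    "larger texts": (6, 49_305),
    "shorter texts": (80, 55_308),
    "translations": (5, 9_518),
    "web texts": (1, 408),
    "public announcements": (1, 151),
}
AUTHORS = {"lagarteiru": 33, "mañegu": 12, "valverdeñu": 26}


def column_sums(rows):
    """Add up each column of a table given as {label: tuple_of_numbers}."""
    return tuple(sum(column) for column in zip(*rows.values()))


def as_hours(minutes):
    """Format a duration in minutes as 'H hrs M min'."""
    hours, rest = divmod(minutes, 60)
    return f"{hours} hrs {rest} min"


if __name__ == "__main__":
    recordings, sessions, minutes, tokens, participants = column_sums(RECORDINGS)
    texts, words = column_sums(WRITTEN)
    print(f"Recordings: {recordings} in {sessions} sessions, {as_hours(minutes)}")
    print(f"Spoken tokens: {tokens}; participants: {participants}")
    print(f"Written texts: {texts} ({words} words) by {sum(AUTHORS.values())} authors")
```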

FLEx specifications

This is the second version of the FLEx database, and some sections are still incomplete, for example the sociolinguistic information on the participants in the recordings. This information will be added in a future version of the database.

Lexicon – Entry section:

• The General note line is used for extended comments on usage, since the Usages line only offers predefined categories.

• Semantic domains – so far, only the domains Animals (1.6), Plants (1.5) and Tools (6.7) have been marked. The categorization is simplified and will be corrected and completed in future versions.

• Restrictions – this line indicates word frequency. This section is also still to be completed.

no mark = frequent words (5 000)

A = less frequent words (10 000) (not marked yet)
B = rare words (15 000) (not marked yet)
C = very infrequent words related to traditional culture, often unused, e.g. corsetería
D = very infrequent words related to Castilian, e.g. lasaña, paracetamol
E = adverbs in -menti; they will not be part of the dictionary, but they appear in the database
F = words that will not be part of the dictionary, or that will be inserted after verification
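The Restrictions codes work like a small lookup table. The Python sketch below is a hypothetical illustration, assuming entries have already been exported as (headword, code) pairs, with None for an unmarked entry; the export format, the function names and the placeholder headwords are assumptions, not features of FLEx. It encodes only what the list states explicitly: entries marked E or F are kept out of the dictionary.

```python
# Restriction codes as listed above; the (headword, code) entry format is hypothetical.
RESTRICTION_CODES = {
    None: "frequent words (5 000)",
    "A": "less frequent words (10 000)",
    "B": "rare words (15 000)",
    "C": "very infrequent words related to traditional culture",
    "D": "very infrequent words related to Castilian",
    "E": "adverbs in -menti (database only)",
    "F": "not in the dictionary, or inserted after verification",
}

# Codes whose entries are not (yet) part of the dictionary.
EXCLUDED_FROM_DICTIONARY = {"E", "F"}


def describe(code):
    """Return the meaning of a restriction code (None means no mark)."""
    return RESTRICTION_CODES[code]


def dictionary_candidates(entries):
    """Return headwords whose code does not exclude them from the dictionary."""
    return [word for word, code in entries if code not in EXCLUDED_FROM_DICTIONARY]


if __name__ == "__main__":
    sample = [
        ("corsetería", "C"),
        ("lasaña", "D"),
        ("paracetamol", "D"),
        ("hypothetical_adverb_menti", "E"),  # placeholder headword
        ("hypothetical_unverified", "F"),    # placeholder headword
        ("hypothetical_frequent", None),     # placeholder headword
    ]
    for word in dictionary_candidates(sample):
        print(word)
```

Because F words may still be inserted after verification, a fuller version of such a filter might treat F as "pending" rather than excluding it outright.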

Publications

VALEŠ, Miroslav. 2020. Recopilación de datos primarios para la descripción y documentación de la lengua. Études romanes de Brno, vol. 41, no. 1, pp. 87-98. ISSN 1803-7399 (print), ISSN 2336-4416 (online).

1A – ERB Vales_final (Download)

Experience sharing

Course: Sociolinguistics and research methodology

This one-semester course (28 hours of lectures and seminars) is a compulsory subject for all MA students of Spanish, English or German at the Faculty of Science, Arts and Education, Technical University of Liberec. The course draws on the methodology of the project: students learn how to collect linguistic data, process them in ELAN and create their own FLEx database.

• Lecture: Project of minority language documentation

Date: 8 October 2020

The objective of the lecture was to share the project experience with colleagues, especially those from language departments (English, German, Romance languages), and to encourage them to apply for their own projects supporting minority languages or linguistic research in general. The methodology and outputs of the project were discussed in detail.
