The French/Spanish parallel corpus, PaFreS, is part of a larger ongoing project, PaCorES (Parallel Corpora Spanish), which aims to compile a series of bilingual parallel corpora with Spanish as the central language. So far the project comprises German/Spanish (www.corpuspages.eu), English/Spanish (www.corpuspaens.eu) and this French/Spanish corpus.
PaFreS comprises original texts in French or Spanish together with their translations, as well as French and Spanish translations of texts originally written in a third language. So far PaFreS contains some 3,000,000 tokens, segmented into 58,000 bisegments, i.e. aligned pairs of text chunks at the sentence or subsentence level.
We aim to build a multifunctional and representative language resource for the language pair French/Spanish that can meet the differing needs of users and be exploited for multiple purposes, such as general research in contrastive linguistics, linguistic typology, translation studies and bilingual lexicography, as well as supplying training data for machine translation systems.
The main purpose of PaFreS is to be a useful and easy-to-use tool for translators and for learners of French or Spanish as a foreign language at intermediate and advanced levels. With this tool they can obtain a multitude of human-made translation suggestions, presented within examples of real language use.
It includes so far:
Brontë, Charlotte (1847): Jane Eyre.
[Jane Eyre.
Translation: András Farkas]
Review of the alignment: I. Doval. [32001]
Carroll, Lewis (1865): Alice's Adventures in Wonderland.
[Alice au pays des merveilles.
Translation: András Farkas]
Review of the alignment: I. Doval. [32002]
Defoe, Daniel (1719): Robinson Crusoe.
[Robinson Crusoe.
Translation: András Farkas]
Review of the alignment: I. Doval. [32003]
Doyle, Arthur Conan (1902): The Hound of the Baskervilles.
[Le Chien des Baskerville.
Translation: András Farkas]
Review of the alignment: I. Doval. [32004]
Doyle, Arthur Conan (1887): A Study in Scarlet.
[Une étude en rouge.
Translation: András Farkas]
Review of the alignment: I. Doval. [32005]
Poe, Edgar Allan (1839): The Fall of the House of Usher.
[La chute de la maison Usher.
Translation: András Farkas]
Review of the alignment: I. Doval. [32006]
Wilde, Oscar (1890): The Picture of Dorian Gray.
[Le Portrait de Dorian Gray.
Translation: András Farkas]
Review of the alignment: I. Doval. [32007]
Dumas, Alexandre (1844): Les Trois Mousquetaires.
[Los tres mosqueteros.
Translation: András Farkas]
Review of the alignment: I. Doval. [32008]
Verne, Jules (1870): Vingt mille lieues sous les mers.
[Veinte mil leguas de viaje submarino.
Translation: András Farkas]
Review of the alignment: I. Doval. [32009]
Verne, Jules (1875): L'île mystérieuse.
[La isla misteriosa.
Translation: András Farkas]
Review of the alignment: I. Doval. [32010]
Verne, Jules (1872): Le Tour du monde en quatre-vingts jours.
[La vuelta al mundo en 80 días.
Translation: András Farkas]
Review of the alignment: I. Doval. [32011]
Voltaire (1759): Candide, ou l'Optimisme.
[Cándido o el optimismo.
Translation: András Farkas]
Review of the alignment: I. Doval. [32012]
This is an ongoing project, and we plan to add new collections of bilingual texts of diverse origin in the future.
Despite our best efforts, some mistakes have undoubtedly slipped through. If you come across any, please let us know.
Notice:
If you use PaFreS in your work, please acknowledge it and let us know at corpuspafres@usc.es. In this way you contribute to the sustainability of the project.
Statistics PaFreS
COLLECTION | LANGUAGE | TOKENS | WORDS | MSTTR* | BISEGMENTS |
Literature | French | 1,399,200 | 1,151,212 | 0.545 | 57,733 |
Literature | Spanish | 1,282,476 | 1,106,637 | 0.537 | |
Europarl v7 | French | 59,651,196 | 51,954,734 | 0.496 | 1,944,439 |
Europarl v7 | Spanish | 53,583,854 | 48,664,574 | 0.482 | |
TED-Talks | French | 5,197,553 | 4,332,950 | 0.496 | 254,222 |
TED-Talks | Spanish | 4,686,514 | 4,062,259 | 0.504 | |
Global Voices | French | 1,179,414 | 1,016,690 | 0.539 | 50,270 |
Global Voices | Spanish | 1,097,488 | 985,542 | 0.553 | |
Total | French | 67,427,363 | 58,455,586 | 0.523 | 2,306,664 |
Total | Spanish | 60,650,332 | 54,819,012 | 0.515 | |
*MSTTR (mean segmental type/token ratio) is the average of the TTR (type/token ratio) values computed over consecutive, non-overlapping segments of equal size (in this case 1,000 tokens).
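The MSTTR definition above can be illustrated with a minimal Python sketch. It is not the procedure actually used to produce the PaFreS figures: the function name `msttr`, the choice to discard a trailing incomplete segment, and the assumption that the text is already tokenized (and case-normalized, if desired) are all illustrative assumptions.

```python
from typing import List


def msttr(tokens: List[str], segment_size: int = 1000) -> float:
    """Mean segmental type/token ratio (illustrative sketch).

    Splits `tokens` into consecutive non-overlapping segments of
    `segment_size` tokens, computes the type/token ratio of each
    segment, and returns the mean of those ratios.

    Assumption: a trailing segment shorter than `segment_size` is
    discarded, so every segment has exactly the same size.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")

    n_segments = len(tokens) // segment_size
    if n_segments == 0:
        raise ValueError("text is shorter than one segment")

    ratios = []
    for i in range(n_segments):
        segment = tokens[i * segment_size:(i + 1) * segment_size]
        # Types = distinct tokens in the segment; tokens = segment size.
        ratios.append(len(set(segment)) / segment_size)

    return sum(ratios) / n_segments


# Toy example with segments of 4 tokens instead of 1,000:
# first segment has 4 types (TTR 1.0), second has 1 type (TTR 0.25).
sample = ["a", "b", "c", "d", "a", "a", "a", "a"]
print(msttr(sample, segment_size=4))  # 0.625
```

Because every segment has the same length, MSTTR values remain comparable across collections of very different sizes, which a plain type/token ratio over the whole text would not.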