WP1 Acquisition of parallel and comparable corpora

Corpus                Languages                                            #Words            #Sents
BAF/Rali              EN - FR                                              ~400 K/language
Crater                EN - ES - FR                                         1 M
Europarl parallel     DE->EN, ES, NL, PT                                   > 44 M/language   > 1.9 / pair
                      EN->ES, FR, NL, PT
Europarl monolingual  DE, EN, ES, FR, NL, PT, IT                           > 47 M/language   > 2 M/language
JRCA parallel         DE->EN, ES, FR, IT, NL, PT
                      EN->ES, FR, IT, NL
JRCA monolingual      DE, EN, ES, FR, IT, NL, PT                           > 32 M/language   > 23 K texts/language
NPVCorp               DE                                                   245 K
Opus1                 DE, ES                                               56 K
Reuters (RCV2)        DA, DE, EN, ES, FR, IT, JA, NL, NO, PT, RU, SV, ZH                     487,000 Reuters News stories
UN parallel           EN, FR, ES, RU, ZH, AR
WS Leipzig            DE, EN, FR