WP5 Improve dictionary coverage for multi-word units


An MT lexicon needs high coverage of multi-word units in order to account for technical terms, idioms, collocations, and typical short phrases. Very recent work conducted at the Universities of Leeds and Lancaster shows that dictionary entries for such multi-word units can be derived from comparable corpora if a dictionary of single words is available. That work also indicates that this methodology can be superior to deriving multi-word units from parallel corpora. We will specify and implement an algorithm which computes multi-word translation relations from comparable corpora.
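To illustrate the general idea, the following minimal sketch shows one simple way a single-word dictionary and a target-language corpus could be combined: each word of a source unit is translated individually, the translations are recombined in every word order, and only combinations attested in the target corpus are kept. This is an assumption made for illustration, not the algorithm to be specified in this work package; the function names (`count_ngrams`, `translate_mwu`), the toy dictionary, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def count_ngrams(sentences, n):
    """Count all contiguous word n-grams in a list of tokenised sentences."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def translate_mwu(source_mwu, word_dict, target_counts):
    """Rank candidate translations of a multi-word unit.

    Every source word is looked up in the single-word dictionary, all
    combinations of the word translations are generated in every word
    order, and only candidates that occur in the target corpus are kept,
    sorted by descending corpus frequency.
    """
    options = [word_dict.get(word, []) for word in source_mwu]
    if any(not opts for opts in options):
        return []  # at least one word has no dictionary entry
    candidates = set()
    for combo in product(*options):
        candidates.update(permutations(combo))
    scored = [(c, target_counts[c]) for c in candidates if target_counts[c] > 0]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: a French-English word dictionary and a tiny
# English corpus standing in for the target side of a comparable corpus.
word_dict = {
    "politique": ["policy", "politics"],
    "agricole": ["agricultural", "farming"],
}
target_sentences = [
    ["the", "common", "agricultural", "policy", "was", "reformed"],
    ["agricultural", "policy", "matters"],
    ["farming", "policy", "is", "local"],
]
target_counts = count_ngrams(target_sentences, 2)
ranked = translate_mwu(("politique", "agricole"), word_dict, target_counts)
print(ranked)
```

On this toy data, "agricultural policy" is ranked above "farming policy" because it occurs more often in the target corpus. A realistic system would additionally need to handle non-compositional units, units whose length differs between the two languages, and context-based scoring, none of which this sketch attempts.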

Here are some early results from parallel corpora (Europarl):