Hybrid High Quality Translation System

We design and implement a hybrid architecture for high-quality machine translation (HyghTra) which combines the strengths of the statistical and the rule-based approaches and minimizes their weaknesses.


HyghTra will consist of a rule-based MT core system which provides morphology, declarative grammars, semantic categories, and small (cheap) bilingual dictionaries, and which omits all kinds of (expensive) disambiguating preference knowledge. Instead of compiling such knowledge and building large dictionaries manually, we use a bootstrapping method to extend the dictionaries automatically and to train the analytical performance and the choice of transfer alternatives, using monolingual and bilingual corpora.


Since bilingual data with good literal translations are sparse, we focus in particular on searching monolingual corpora for new words, and we use the statistically tuned analysis components of the system together with similarity assumptions to relate these words to each other cross-linguistically. This should overcome the data acquisition bottleneck of conventional SMT to a significant degree.
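
The summary above does not say which similarity assumption is used. As an illustration only, the following minimal sketch assumes one common reading: a new word and its translation tend to occur with contexts that are themselves translations of each other, so a small seed dictionary can project the context of a source word into the target language and rank unknown target words by cosine similarity. All function names, the toy corpora, and the seed dictionary are hypothetical and are not taken from the HyghTra system.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_translations(word, src_vecs, tgt_vecs, seed_dict):
    """Rank target words not yet in the seed dictionary as translations of `word`."""
    # Project the source context into the target language via the seed dictionary.
    projected = Counter()
    for ctx, count in src_vecs.get(word, Counter()).items():
        if ctx in seed_dict:
            projected[seed_dict[ctx]] += count
    known = set(seed_dict.values())
    scored = [(cosine(projected, vec), tgt) for tgt, vec in tgt_vecs.items()
              if tgt not in known]
    return sorted(scored, reverse=True)


# Toy monolingual corpora (hypothetical data, not parallel text).
english = [["the", "dog", "barks"], ["the", "cat", "meows"], ["the", "puppy", "barks"]]
german = [["der", "hund", "bellt"], ["die", "katze", "miaut"], ["der", "welpe", "bellt"]]
# Small seed dictionary; "puppy" is the new word to be related cross-linguistically.
seed = {"the": "der", "dog": "hund", "cat": "katze", "barks": "bellt", "meows": "miaut"}

ranking = rank_translations("puppy", context_vectors(english), context_vectors(german), seed)
print(ranking[0][1])  # best-scoring candidate for "puppy"
```

In this toy setting the candidate that shares the projected context of "puppy" is ranked first. A realistic setup would need much larger corpora, association weighting instead of raw counts, and the system's own morphological analysis to normalize word forms.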

Project Participants

University of Leeds
Lingenio GmbH




Funded by
the Seventh Framework Programme
of the European Union

Project details

Research area: FP7-PEOPLE-2009-IAPP Marie Curie IAPP transfer of knowledge programme
Project Acronym: HYGHTRA
Project Reference: 251534
Start Date: 2010-12-01
Duration: 48 months
Contract Type: Industry-Academia Partnerships and Pathways (IAPP)
End Date: 2014-11-30
Project Status: Execution

 
    FP7 reference number: 251534