Type

Journal Article

Authors

Ergun Bicici

Subjects

Linguistics

Topics
machine translating statistical mt domain adaptation computational linguistics random selection machine translation information retrieval training

Domain adaptation for machine translation with instance selection (2015)

Abstract Domain adaptation for machine translation (MT) can be achieved by selecting training instances close to the test set from a larger set of instances. We consider 7 different domain adaptation strategies and answer 7 research questions, which give us a recipe for domain adaptation in MT. We perform English to German statistical MT (SMT) experiments in a setting where test and training sentences can come from different corpora and one of our goals is to learn the parameters of the sampling process. Domain adaptation with training instance selection can obtain 22% increase in target 2-gram recall and can gain up to 3.55 BLEU points compared with random selection. Domain adaptation with feature decay algorithm (FDA) not only achieves the highest target 2-gram recall and BLEU performance but also perfectly learns the test sample distribution parameter with correlation 0.99. Moses SMT systems built with FDA selected 10K training sentences is able to obtain $F_1$ results as good as the baselines that use up to 2M sentences. Moses SMT systems built with FDA selected 50K training sentences is able to obtain 1 F1 point better results than the baselines.
Collections Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: Centre for Next Generation Localisation (CNGL)
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres
Ireland -> Dublin City University -> Subject = Computer Science: Information retrieval
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> Subject = Computer Science
Ireland -> Dublin City University -> Publication Type = Article
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools
Ireland -> Dublin City University -> Subject = Computer Science: Computational linguistics

Full list of authors on original publication

Ergun Bicici

Experts in our system

1
Ergun Bicici
Dublin City University
Total Publications: 12