We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation (SMT) systems, taking only about half a day for each translation direction. We develop parallel FDA to solve the computational scalability problems caused by the abundance of training data for SMT models and language models (LMs), while still achieving SMT performance on par with, or better than, using all of the training data. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections afterwards. Parallel FDA can also select the LM corpus based on the training set it has already selected. The high quality of the selected training data allows us to obtain very accurate translation outputs, close to those of the top-performing SMT systems. The selected LM corpus is relevant enough to reduce the number of out-of-vocabulary (OOV) tokens by up to 86% and the perplexity by up to 74%. We perform SMT experiments in all language pairs of the WMT13 translation task and obtain SMT performance close to the top systems using significantly fewer resources for training and development.
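The shard-then-combine idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The scoring rule (n-gram features from the test set that start at value 1.0 and are halved each time a selected sentence covers them), the length normalization, the union-style merge, and every name and parameter below (`fda_select`, `parallel_fda`, `shards`, `k_per_shard`, `decay`) are assumptions made only for illustration.

```python
import random


def ngrams(tokens, max_n=2):
    """Return all n-grams of the token list up to length max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fda_select(pool, test_features, k, decay=0.5):
    """Greedily pick up to k sentences from pool (assumed, simplified scoring).

    Each test-set feature starts at value 1.0 and is multiplied by `decay`
    every time a selected sentence contains it, so later picks favour
    features that are not yet covered.
    """
    value = {f: 1.0 for f in test_features}

    def score(sentence):
        tokens = sentence.split()
        feats = set(ngrams(tokens))
        return sum(value.get(f, 0.0) for f in feats) / max(len(tokens), 1)

    remaining = list(pool)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        if score(best) == 0.0:
            break  # nothing left that overlaps with the test features
        selected.append(best)
        remaining.remove(best)
        for f in set(ngrams(best.split())):
            if f in value:
                value[f] *= decay
    return selected


def parallel_fda(corpus, test_sentences, shards=2, k_per_shard=2, seed=0):
    """Run fda_select on random shards and merge the selections (assumed union merge)."""
    test_features = {f for s in test_sentences for f in ngrams(s.split())}
    shuffled = list(corpus)
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::shards] for i in range(shards)]

    merged, seen = [], set()
    # Each shard is independent, so these calls could run in separate processes.
    for part in parts:
        for sentence in fda_select(part, test_features, k_per_shard):
            if sentence not in seen:
                seen.add(sentence)
                merged.append(sentence)
    return merged
```

Because each shard only sees its own subset, the merged selection may differ from what a single FDA run over the full corpus would choose; the abstract reports that translation quality nonetheless stays on par with using all of the training data.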
Ireland -> Dublin City University ->
DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
DCU Faculties and Centres = Research Initiatives and Centres: Centre for Next Generation Localisation (CNGL)
Subject = Computer Science: Information retrieval
Subject = Computer Science: Machine translating
Subject = Computer Science: Computational linguistics
Subject = Computer Science: Artificial intelligence
Subject = Computer Science: Algorithms
Status = Published
Publication Type = Conference or Workshop Item