Citation: | Huu-anh Tran, Yuhang Guo, Ping Jian, Shumin Shi, Heyan Huang. Improving Parallel Corpus Quality for Chinese-Vietnamese Statistical Machine Translation[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2018, 27(1): 127-136. doi:10.15918/j.jbit1004-0579.201827.0116 |
[1] |
Melamed I D. Models of translational equivalence among words[J]. Computational Linguistics, 2000, 26(2):221-249.
|
[2] |
Jin R, Chai J Y. Study of cross lingual information retrieval using on-line translation systems[C]//International ACM SIGIR Conference on Research & Development in Information Retrieval, 2005:619-620.
|
[3] |
Gale W A, Church K W. Identifying word correspondences in parallel texts[C]//Speech and Natural Language, Proceedings of a Workshop Held at Pacific Grove, California, USA, DBLP, 1991:152-157.
|
[4] |
Widdows D, Dorow B, Chan C K. Using parallel corpora to enrich multilingual lexical resources[C]//International Conference on Language Resources & Evaluation, 2002:240-245.
|
[5] |
Kuhn J. Experiments in parallel-text based grammar induction[C]//Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2004:470. https://dl.acm.org/citation.cfm?id=1218955.1219015.
|
[6] |
Do Thi-Ngoc-Diep, Le Viet-Bac, Bigi Brigitte, et al. Mining a comparable text corpus for a Vietnamese-French statistical machine translation system[C]//Proceedings of the Fourth Workshop on Statistical Machine Translation, Association for Computational Linguistics, 2009:165-172. https://dl.acm.org/citation.cfm?id=1626466.
|
[7] |
Luo L, Guo J Y, Yu Z T, et al. Construction of a large-scale Sino-Vietnamese bilingual parallel corpus[C]//IEEE International Conference on System Science and Engineering, IEEE, 2014:154-157.
|
[8] |
Le Q H, Le A C. Extracting parallel texts from the web[C]//Second International Conference on Knowledge and Systems Engineering, IEEE, 2010: 147-151.
|
[9] |
Gale W A, Church K W. A program for aligning sentences in bilingual corpora[C]//Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 1991:177-184.
|
[10] |
Moore R C. Fast and accurate sentence alignment of bilingual corpora[C]//Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users, Springer-Verlag, 2002:135-144.
|
[11] |
Braune F, Fraser A. Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora[C]//COLING 2010, International Conference on Computational Linguistics, Posters Volume, 23-27 August 2010, Beijing, China, DBLP, 2010:81-89.
|
[12] |
Sennrich R, Volk M. MT-based sentence alignment for OCR-generated parallel texts[C]//Proc of AMTA, 2010:175-182.
|
[13] |
Tiedemann J. Building a multilingual parallel subtitle corpus[J]. International Journal of Multilingualism, 2009, 11(2):266-268.
|
[14] |
Tiedemann J. Synchronizing translated movie subtitles[C]//International Conference on Language Resources and Evaluation, LREC 2008, 26 May-1 June 2008, Marrakech, Morocco, DBLP, 2012: 1902-1906.
|
[15] |
Skadiņš R, Tiedemann J, Rozis R, et al. Billions of parallel words for free: building and using the EU bookshop corpus[C]//Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014:1850-1855. https://pdfs.semanticscholar.org/359a/f0607033b19e91c5f07715d16a0f2efff85f.pdf.
|
[16] |
Lison P, Tiedemann J. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles[C]//LREC, 2016. https://www.duo.uio.no/handle/10852/50459.
|
[17] |
Chen B, Cherry C. A systematic comparison of smoothing techniques for sentence-level BLEU[C]//The Workshop on Statistical Machine Translation, 2014:362-367.
|
[18] |
Koehn P, Hoang H, Birch A, et al. Moses: open source toolkit for statistical machine translation[C]//Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, 2007: 177-180.
|