Research on Process of the Bilingual Corpus Alignment Tool
Abstract
As language researchers and translators have come to recognize the importance of corpus development, many institutions at home and abroad have begun to devote themselves to corpus research and construction. Bilingual corpus alignment is an indispensable step in building a corpus. However, alignment software based on existing alignment technology still cannot meet the needs of users or translators. Building on previous studies, this paper explores the alignment process of sentence-level automatic alignment technology for bilingual corpora. Chinese and English files are imported into corpus alignment tools and aligned one by one, using the sentence as the translation unit. The paper presents the corpus alignment process and proposes solutions to the errors that arise in it, with the aim of further improving the efficiency of bilingual corpus alignment.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution-Noncommercial 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit, that the work is not used for commercial purposes, and that in the event of reuse or distribution, the terms of this license are made clear. With this license, the authors hold the copyright without restrictions and are allowed to retain publishing rights without restrictions as long as this journal is the original publisher of the articles.