Herald of the Kazakh-British Technical University

NEURAL MACHINE TRANSLATION FOR ENGLISH-KAZAKH LANGUAGE PAIR

https://doi.org/10.55452/1998-6688-2025-22-2-54-66

Abstract

Information technology is developing rapidly, and machine translation is one of its branches. People from different countries rely on machine translation more each year to understand one another. Google and Yandex currently offer some of the best machine translation systems, and their quality improves every year. However, our experiment shows that quality drops when translating from English or Russian into Kazakh and other Turkic languages; translations obtained from both systems in March 2024 demonstrated this. The results also indicate that translation quality is directly related to the structure of the language. Scientists in Kazakhstan have been actively studying translation into the Kazakh language since 2000. The goal of this work is to improve the quality of translation from English into Kazakh. To this end, a Transformer model for Kazakh and Turkic languages was built and trained in the OpenNMT neural machine translation toolkit. The model was trained on an English-Kazakh parallel corpus of 180,000 words. A document of 20,000 distinct English sentences was then translated into Kazakh, and the output was evaluated with the BLEU metric. The translation result showed a high level of quality. The work shows that improving these results further requires increasing the amount of English-Kazakh parallel corpora used for model training.
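The BLEU metric named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' evaluation script: it assumes whitespace tokenization, a single reference per sentence, uniform weights over 1- to 4-grams, and no smoothing, and the function names `ngrams` and `corpus_bleu` as well as the sample sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score of hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Illustrative placeholder sentences, not data from the paper.
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

For reported results, an established implementation with a documented tokenization is usually preferable, since BLEU scores are sensitive to tokenization and smoothing choices.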

About the Authors

D. Rakhimova
Al-Farabi Kazakh National University
Kazakhstan
PhD
Almaty

A. Zhiger
Al-Farabi Kazakh National University; Narxoz University
Kazakhstan
PhD
Almaty

V. Malykh
Saint Petersburg State University of Information Technologies, Mechanics and Optics
Russian Federation
PhD
Saint Petersburg

V. Karyukin
Al-Farabi Kazakh National University
Kazakhstan
PhD
Almaty

A. Bekarystankyzy
Narxoz University
Kazakhstan
PhD
Almaty

References

1. Rakhimova D.R., Zhunusova A.Zh. Post-editing for the Kazakh Language Using OpenNMT // Journal of Mathematics, Mechanics and Computer Science. – 2022. – Vol. 113. – No. 1. – P. 118–122. https://doi.org/10.26577/JMMCS.2022.v113.i1.12.

2. Zhumanov Z.M., Tukeyev U.A. Development of Machine Translation Software Logical Model (Translation from Kazakh into English Language) // Reports of the Third Congress of the World Mathematical Society of Turkic Countries, edited by Bakhytzhan T. Zhumagulov. – 2009. – Vol. 1. – pp. 356–363.

3. Tukeyev U., Zhumanov Zh., Rakhimova D. Features of Development for Natural Language Processing: ICT - From Theory to Practice. – edited by M. Milos. – Polish Information Processing Society, 2010. – pp. 149–174.

4. Tukeyev U., Rakhimova D. Augmented Attribute Grammar in Meaning of Natural Languages Sentences. Proceedings of the 6th International Conference on Soft Computing and Intelligent Systems, and the 13th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2012, Kobe, Japan, 2012, pp. 1080–1085.

5. Abeustanova A., Tukeyev U. Automatic Post-editing of Kazakh Sentences Machine Translated from English // Advanced Topics in Intelligent Information and Database Systems: ACIIDS 2017. – Springer, 2017. – Vol. 710. – P. 283–295.

6. Schuster Sebastian, Ranjay Krishna, Angel Chang, Li Fei-Fei, and Christopher D. Manning. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval // Proceedings of the International Conference on Vision and Language (VL). – 2015. – P. 70–80.

7. Xu Kelvin, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv, 2015, arXiv:1502.03044.

8. Shormakova A., Zhumanov Zh., Rakhimova D. Post-editing of Words in Kazakh Sentences for Information Retrieval // Journal of Theoretical and Applied Information Technology. – 2019. – Vol. 97. – No. 6. – P. 1896–1908.

9. Turganbayeva A., Tukeyev U. The Solution of the Problem of Unknown Words Under Neural Machine Translation of the Kazakh Language // Journal of Information and Telecommunication. – 2020. – P. 214–225.

10. Tukeyev U., Karibayeva A., Zhumanov Z. Morphological Segmentation Method for Turkic Language Neural Machine Translation // Cogent Engineering. – 2020. – Vol. 7. – No. 1. – P. 1–16. https://doi.org/10.1080/23311916.2020.1780271.

11. Koehn Philipp, Rebecca Knowles. Six Challenges for Neural Machine Translation // Proceedings of the First Workshop on Neural Machine Translation. – 2017. – P. 28–39.

12. Koehn Philipp. Statistical Machine Translation. Draft of Chapter 13. Neural Machine Translation. arXiv, 2017, arXiv:1709.07809v1[cs.CL], 117.

13. Papineni Kishore, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU: A Method for Automatic Evaluation of Machine Translation // Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). – Philadelphia, July 2002. – P. 311–318.

14. Alvarez-Melis D., Jaakkola T.S. A Causal Framework for Explaining the Predictions of Black-Box Sequence-to-Sequence Models // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, Sept. 9-11, 2017, pp. 412–421.

15. Zhou Qingyu, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, and Tiejun Zhao. Neural Document Summarization by Jointly Learning to Score and Select Sentences // Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). – 2018. – Vol. 1. – P. 654–663.

16. Shah Ronak, Manish Kumar Gupta, Ajai Kumar. Ancient Sanskrit Line-Level OCR Using OpenNMT Architecture. Proceedings of the 2021 Sixth International Conference on Image Information Processing (ICIIP), 2022, pp. 347–352. https://doi.org/10.1109/ICIIP53038.2021.9702666.

17. Hao L., Gao W., Fang J. High-Performance English-Chinese Machine Translation Based on GPU-Enabled Deep Neural Networks with Domain Corpus // Applied Sciences. – 2021. – Vol. 11. – No. 22. – P. 10915. https://doi.org/10.3390/app112210915.

18. Quadri, Mohatesham Pasha, Pradeep Kumar. Corpus-Based Machine Translation for English to Low-Resource Language Using OpenNMT // Innovative Computing and Communications. – 2024. – P. 199–217.

19. Senellart Jean, Dakun Zhang, Bo Wang, Guillaume Klein, Jean-Pierre Ramatchandirin, Josep Crego, and Alexander Rush. OpenNMT System Description for WNMT 2018: 800 Words/Sec on a Single-Core CPU. Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, Association for Computational Linguistics, 2018, pp. 122–128. https://doi.org/10.18653/v1/W18-2715.

20. Klein Guillaume, Francois Hernandez, Vincent Nguyen, and Jean Senellart. The OpenNMT Neural Machine Translation Toolkit: 2020 Edition. Proceedings of the 14th Conference of the Association for Machine Translation in the Americas, vol. 1, MT Research Track, October 6–9, 2020, pp. 102–109.


For citations:


Rakhimova D., Zhiger A., Malykh V., Karyukin V., Bekarystankyzy A. NEURAL MACHINE TRANSLATION FOR ENGLISH-KAZAKH LANGUAGE PAIR. Herald of the Kazakh-British Technical University. 2025;22(2):54-66. (In Kazakh) https://doi.org/10.55452/1998-6688-2025-22-2-54-66

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6688 (Print)
ISSN 2959-8109 (Online)