Herald of the Kazakh-British Technical University

KAZAKH NAMES GENERATOR USING DEEP LEARNING

Abstract

In recent years, sentiment analysis of e-mail messages and social media posts has become very popular. It helps people determine whether the text they are reading is positive or negative. At the same time, some Internet services help customers find or create a new name. While creating a name, they check it against other popular languages so that it does not mean something inappropriate in any of them. They charge 25 thousand US dollars for this. The existence of such services indicates that there is demand. In this study, sentiment analysis of e-mails was implemented using the StanfordNLP [1] lemmatizer and classic machine learning algorithms as the classifier. It is applied to real e-mails from a Russian-speaking mailbox, which contains both English and Russian messages; language identification is therefore added as a preprocessing step. Only binary sentiment analysis was performed in this study, but it can be extended to detect several emotions. A second model then generates Kazakh names using neural networks, with Kazakh name data collected from various websites. The sentiment analysis model achieves 81% accuracy, and the joint use of the two models allows us to generate new Kazakh names that are checked against the Russian language for inappropriate meanings. The result can be improved by checking against other languages.
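
The generate-then-check idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the neural network is replaced here by a simple character-level bigram sampler, and the trained sentiment classifier is replaced by a small list of negative Russian words. The names `NAMES`, `NEGATIVE_RU`, `train_bigrams`, `generate_name`, `is_acceptable` and `generate_checked_names` are all hypothetical, and the training list is a toy example, not the data set collected for the study.

```python
import random
from collections import defaultdict

# Toy training list (assumption): a few common Kazakh names, not the study's data.
NAMES = ["Айбек", "Нұрлан", "Айгерім", "Дәулет", "Ерлан",
         "Әсел", "Бекзат", "Данияр", "Самал", "Арман"]

# Hypothetical stand-in for the Russian sentiment check: a few negative words.
NEGATIVE_RU = {"беда", "зло", "мрак"}

START, END = "^", "$"  # markers for the beginning and end of a name


def train_bigrams(names):
    """Collect, for each character, the characters observed to follow it."""
    followers = defaultdict(list)
    for name in names:
        chars = [START] + list(name.lower()) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            followers[prev].append(nxt)
    return followers


def generate_name(followers, rng, max_len=10):
    """Sample one name character by character until END or max_len is reached."""
    out, prev = [], START
    while len(out) < max_len:
        nxt = rng.choice(followers[prev])
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out).capitalize()


def is_acceptable(name, negative_words=NEGATIVE_RU):
    """Reject a candidate that contains any listed negative word as a substring."""
    lowered = name.lower()
    return not any(word in lowered for word in negative_words)


def generate_checked_names(names, count, seed=0, max_attempts=1000):
    """Return up to `count` new names that are unseen in training and pass the check."""
    rng = random.Random(seed)
    followers = train_bigrams(names)
    known = {n.lower() for n in names}
    result = []
    for _ in range(max_attempts):
        if len(result) >= count:
            break
        candidate = generate_name(followers, rng)
        if (len(candidate) >= 3
                and candidate.lower() not in known
                and candidate not in result
                and is_acceptable(candidate)):
            result.append(candidate)
    return result


if __name__ == "__main__":
    print(generate_checked_names(NAMES, count=5))
```

In a fuller pipeline, `is_acceptable` would presumably call the sentiment classifier instead of a word list, and additional languages could be covered by running one such check per language.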

About the Authors

D. Nurmambetov
Süleyman Demirel University
Kazakhstan


S. Dauylov
Süleyman Demirel University
Kazakhstan


A. Bogdanchikov
Süleyman Demirel University
Kazakhstan


References

1. Peng Qi, Timothy Dozat, Yuhao Zhang and Christopher D. Manning. 2018. Universal Dependency Parsing from Scratch. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 160-170.

2. Нұргүл Абай. Балаға ең жиі қойылатын ТОП-20 есімнің мағынасы немесе ат қоярда нені ұмытпаған жөн. Sputniknews.kz. Nov 25, 2018. https://sputniknews.kz/society/20181013/7589294/bala-esim-top-20.html

3. Накипов Мұхамедәлі Асанұлы. Қазақша есімдердің тізімі. Bilim-All.kz. March 12, 2018. https://bilim-all.kz/esimder/all

4. Айнаш Ануарбек. Қазақша қыз есімдері мен олардың мағынасы. April 11, 2017. Yvision.kz. https://yvision.kz/post/763198

5. Stan.kz. Қазақы есімдер. Ұлыңызға қандай есім бердіңіз. Stan.kz. May 12, 2018. https://stan.kz/kazaky-esimder-ulynyzga-kanday-esim-b/

6. Erik Tromp; Mykola Pechenizkiy, “SentiCorr: Multilingual Sentiment Analysis of Personal Correspondence”, 2011 IEEE 11th International Conference on Data Mining Workshops, 2011.

7. R. Miller; E.Y.A. Charles, “A psychological based analysis of marketing email subject lines”, 2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer), 2016.

8. Muhammad Babar Abbas; Mukarram Khan, “Sentiment Analysis for Automated Email Response System”, 2019 International Conference on Communication Technologies (ComTech), 2019.

9. Xiaopeng Yang, Xiaowen Lin, Shunda Suo, Ming Li. Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders. Arxiv Sanity Preserver. 5 Mar 2020. https://arxiv.org/abs/1711.07632v4

10. Анна Слёз. Как выбрать имя ребенку. Koloro brand Design Blog. Dec 4, 2019. https://koloro.ua/blog/brending-i-marketing/sozdanie-imeni-rebenky.html

11. Port of Nakatani Shuyo's language-detection library, Feb 16, 2020 https://pypi.org/project/langdetect/

12. Steven Loria, TextBlob: Simplified Text Processing, April 26, 2020. https://textblob.readthedocs.io/en/dev/

13. Pratima Upadhyay, Removing stop words with NLTK in Python, March 30, 2017. https://www.geeksforgeeks.org/removing-stop-words-nltk-python/

14. Mohamed Afham, “Twitter Sentiment Analysis using NLTK, Python”, towardsdatascience, 2019

15. Oleg Yegorov, “Why do Russians use parentheses instead of smileys?”, RBTH, 2017. Available: https://www.rbth.com/lifestyle/326858-why-russians-use-parentheses

16. Jeff Hale, Scale, Standardize, or Normalize with Scikit-Learn, Mar 4, 2019. https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02

17. A Ydobon, How to interpret a Classification Report, Jan 25, 2020. https://medium.com/@a.ydobon/justforfunpython-how-to-interpret-a-classification-report-189edc487460

18. Abhishek Sharma, Confusion Matrix in Machine Learning, Dec 13, 2019. https://www.geeksforgeeks.org/confusion-matrix-machine-learning/


For citations:


Nurmambetov D., Dauylov S., Bogdanchikov A. KAZAKH NAMES GENERATOR USING DEEP LEARNING. Herald of the Kazakh-British Technical University. 2020;17(4):171-177.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6688 (Print)
ISSN 2959-8109 (Online)