SYSTEMATIC REVIEW AND ANALYSIS OF THE PECULIARITIES OF IDENTIFICATION BY VOICE
Abstract
Voice identification is the process of determining a speaker's identity from a given utterance by comparing the voice biometric features of that utterance with speaker models stored in advance. Advances in artificial intelligence have given voice identification technologies a new direction, and they are now widely used in various fields. Feature extraction is one of the most important aspects of voice identification, since it significantly affects both the identification process and its performance. This systematic review identifies, compares and analyzes various approaches, methods and algorithms for extracting features for voice identification, in order to provide background information on feature extraction approaches for voice identification applications and for future research. The study examined the following models: template-based models, vector quantization, dynamic time warping, histogram models, stochastic models, Gaussian mixture models and hidden Markov models, models based on Mel-frequency cepstral coefficients, generative models such as vector quantization, and discriminative models (usually built with machine learning methods such as SVM and ANN). The study showed that the current research trend is toward a robust, universal voice identification framework that addresses key voice identification problems such as adaptability, complexity, multilingual recognition and noise robustness. The results presented in this study are based on past publications, their citation counts and the number of implementations, with citation counts being the most relevant criterion. The article also presents the general voice identification process.
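The general process described in the abstract (compare the features of an unknown utterance with models enrolled in advance and choose the closest) can be illustrated with dynamic time warping (DTW), one of the template-based approaches the review covers. This is a minimal sketch under stated assumptions, not the authors' method: it assumes feature frames (for example MFCC vectors) have already been extracted, stores a single template per speaker, uses Euclidean distance between frames, and treats the lowest DTW cost as the decision in a closed set of speakers. The function names `dtw_distance` and `identify` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Assumption: each utterance is already a list of feature frames
# (e.g. MFCC vectors); feature extraction itself is not shown here.
Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equal-length feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Accumulated cost of the best DTW alignment of two frame sequences."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j]: minimal cost of aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in query only (template frame reused)
                cost[i][j - 1],      # advance in template only (query frame reused)
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def identify(query: List[Frame], enrolled: Dict[str, List[Frame]]) -> str:
    """Hypothetical closed-set identification: return the enrolled speaker
    whose stored template has the lowest DTW cost to the query utterance."""
    return min(enrolled, key=lambda speaker: dtw_distance(query, enrolled[speaker]))


if __name__ == "__main__":
    # Toy 2-dimensional "feature frames"; real systems use
    # higher-dimensional vectors such as MFCCs.
    enrolled = {
        "speaker_a": [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]],
        "speaker_b": [[3.0, -1.0], [3.5, -0.5], [4.0, 0.0]],
    }
    query = [[0.1, 1.1], [0.2, 1.2], [0.6, 1.6], [1.0, 2.1]]
    print(identify(query, enrolled))  # speaker_a
```

DTW lets the query and the stored template differ in length and speaking rate, which is why the four-frame query can still be matched against three-frame templates. The other models listed in the abstract (vector quantization, GMM, HMM, SVM, ANN) would replace the per-speaker template and the DTW cost with their own speaker model and scoring rule.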
About the Authors
O. Mamyrbayev
Kazakhstan
A. S. Kydyrbekova
Kazakhstan
A. Akhmediyarova
Kazakhstan
M. Turdalyuly
Kazakhstan
N. Mekebayev
Kazakhstan
Review
For citation:
Mamyrbayev O., Kydyrbekova A.S., Akhmediyarova A., Turdalyuly M., Mekebayev N. SYSTEMATIC REVIEW AND ANALYSIS OF THE PECULIARITIES OF IDENTIFICATION BY VOICE. Herald of the Kazakh-British technical university. 2019;16(2):120-133. (In Russ.)