UNIVERSITY’S SCIENTIFIC RESOURCES PROCESSING IN KNOWLEDGE MANAGEMENT SYSTEMS
Abstract
This paper describes an ontology-oriented approach to processing the text of university information resources associated with scientific activity. An ontology serves as the information model of scientific knowledge. The paper describes the information model of the university's scientific resources, methods for classifying texts by scientific field, and thematic annotation of texts. The ontology-oriented approach makes it possible to organize and structure the university's information resources related to scientific activity and to develop methods for finding knowledge. The general model of university knowledge is described by an ontology, with OWL DL (Web Ontology Language) as the ontology description language. During ontology development, OWL axioms for classes and relations were compiled and attributes were defined to describe the characteristics of classes and properties. Scientific resources are classified with kNN classification. In machine learning, the classification task is to assign an object to one of a set of predefined classes on the basis of its formalized features. The kNN (k nearest neighbors) method is a vector-space classification model: the kNN classifier assigns a document to the class that prevails among its nearest neighbors. The parameter k is set in advance based on knowledge of the classification task being solved. In this paper, the kNN method is applied to a multi-label classification problem, that is, classification into classes that are not mutually exclusive. Document classification consists of the following steps: linguistic analysis, term extraction and construction of the document vector space, computation of the k nearest neighbors, and ranking of classes. Domain ontology classes are used for the subject annotation of texts. In the ontological dictionary, terms are grouped by domain class.
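The classification steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tokenizer is a crude stand-in for linguistic analysis, raw term frequencies with cosine similarity stand in for whatever term weighting the paper uses, and the names `TRAINING`, `rank_classes`, `classify` and the `threshold` parameter are hypothetical, introduced only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set: (document text, set of class labels).
# A document may carry several labels, since classes are not mutually exclusive.
TRAINING = [
    ("ontology owl semantic web classes properties", {"Knowledge Engineering"}),
    ("knn classifier text categorization vector", {"Machine Learning"}),
    ("ontology based text classification with knn",
     {"Knowledge Engineering", "Machine Learning"}),
    ("heat transfer in metal alloys", {"Materials Science"}),
]


def tokenize(text):
    # Assumption: lowercasing and splitting on non-letters replaces
    # real linguistic analysis (lemmatization, stop-word removal, etc.).
    return re.findall(r"[a-z]+", text.lower())


def tf_vector(text):
    # Term extraction and document vector: raw term frequencies.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_classes(doc, training, k=3):
    # Find the k nearest neighbors, then rank classes by the summed
    # similarity of the neighbors that carry each label.
    query = tf_vector(doc)
    scored = sorted(
        ((cosine(query, tf_vector(text)), labels) for text, labels in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = defaultdict(float)
    for sim, labels in scored[:k]:
        for label in labels:
            scores[label] += sim
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def classify(doc, training, k=3, threshold=0.5):
    # Multi-label decision (an assumption of this sketch): keep every class
    # whose score reaches `threshold` times the best class score.
    ranked = rank_classes(doc, training, k)
    if not ranked or ranked[0][1] == 0:
        return []
    best = ranked[0][1]
    return [label for label, score in ranked if score >= threshold * best]


if __name__ == "__main__":
    print(classify("knn text classification using ontology", TRAINING))
```

In this sketch k and the threshold are fixed by hand, which mirrors the abstract's remark that k is preset from knowledge of the task. A real system would tune both on labeled data.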
About the Authors
G. Zhomartkyzy
Kazakhstan
S. K. Kumargazhanova
Kazakhstan
G. V. Popova
Kazakhstan
For citations:
Zhomartkyzy G., Kumargazhanova S.K., Popova G.V. UNIVERSITY’S SCIENTIFIC RESOURCES PROCESSING IN KNOWLEDGE MANAGEMENT SYSTEMS. Herald of the Kazakh-British technical university. 2019;16(3):122-128. (In Russ.)