Herald of the Kazakh-British Technical University

A HYBRID APPROACH TO THE ANALYSIS OF CITATION TONALITY BASED ON LINGUISTIC FEATURES AND MACHINE LEARNING

https://doi.org/10.55452/1998-6688-2026-23-1-52-67

Abstract

The analysis of tonality (sentiment) in scientific texts, including citations, is an active research area that makes it possible to identify the emotional coloring of references and their effect on scientific discourse. This study develops and evaluates a hybrid approach to citation tonality classification that combines linguistic rules (analysis of parts of speech, syntactic dependencies, and negation) with machine learning algorithms (SVM, RF, NB, J48). Experiments were conducted on the ACL Anthology (8,700 sentences) and Clinical Trials (6,500 additional sentences) corpora using a stratified 70/15/15 train/validation/test split and 5-fold cross-validation. The proposed method achieved 90% macro-F1 and 95% F1-score on the Athar dataset and 85% macro-F1 on Clinical Trials, a 10–15% improvement over the baseline models (BERT, LSTM). Ablation studies confirmed the contribution of the linguistic rules: F1 decreased by 5–7% when they were excluded. McNemar's test (p < 0.05) indicated that the improvements are statistically significant. The approach proves effective for automated citation analysis and scientific impact assessment.
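The evaluation protocol named in the abstract (stratified 70/15/15 split, macro-F1, McNemar's test) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `stratified_split`, `macro_f1`, and `mcnemar_exact` are hypothetical, and the exact binomial form of McNemar's test is an assumption, since the abstract does not say which variant was used.

```python
import random
from collections import defaultdict
from math import comb


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    # Sentences that only model A (resp. only model B) classified correctly.
    only_a = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    only_b = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the two models never disagree on correctness
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Under these assumptions, a comparison of the hybrid classifier with a baseline would call `macro_f1` on each model's test-set predictions and then pass both prediction lists to `mcnemar_exact`; a p-value below 0.05 would correspond to the significance threshold reported in the abstract.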

About the Authors

A. T. Akhmediyarova
Satbayev University, Kazakh National Research Technical University named after K.I. Satpayev
Kazakhstan

PhD, Professor

Almaty



Zh. M. Alibiyeva
Satbayev University, Kazakh National Research Technical University named after K.I. Satpayev
Kazakhstan

PhD, Associate Professor

Almaty



D. O. Oralbekova
Institute of Information and Computational Technologies
Kazakhstan

PhD, Junior Researcher

Almaty



A. I. Nauryzbayeva
Satbayev University, Kazakh National Research Technical University named after K.I. Satpayev
Kazakhstan

Senior Lecturer

Almaty



D. T. Kassymova
ALT University named after M. Tynyshpayev
Kazakhstan

PhD, Head of the Department, Assistant Professor

Almaty



References

1. Ihsan, I., Qadir, M.A. CCRO: Citation’s context and reasons ontology. IEEE Access, 7, 30423–30436 (2019). https://doi.org/10.1109/ACCESS.2019.2903450.

2. Jha, R., Jbara, A.-A., Qazvinian, V., Radev, D.R. NLP-driven citation analysis for scientometrics. Natural Language Engineering, 23(1), 93–130 (2017). https://doi.org/10.1017/S1351324915000443.

3. Radev, D.R. et al. The ACL Anthology Network corpus. Language Resources and Evaluation, 47(4), 919–944 (2013). URL: https://aclanthology.org/W09-3607.

4. Garzone, M., Mercer, R.E. Towards an automated citation classifier. Canadian AI Conference: materials (Berlin: Springer, 2000), pp. 337–346. https://doi.org/10.1007/3-540-45486-1_28.

5. Athar, A., Teufel, S. Context-enhanced citation sentiment detection. Proceedings of the 2012 Conference of the North American Chapter of the ACL: Human Language Technologies: materials (Montréal: Association for Computational Linguistics, 2012), pp. 597–601. URL: https://aclanthology.org/N12-1073.

6. Parthasarathy, G., Tomar, D. Sentiment analyzer: Analysis of journal citations from citation databases. IEEE Confluence: materials (Noida: IEEE, 2014), pp. 923–928. https://doi.org/10.1109/CONFLUENCE.2014.6949321.

7. Hernández-Álvarez, M., Gómez, J.M. Citation impact categorization for scientific literature. IEEE International Conference on Computational Science and Engineering: materials (Porto: IEEE, 2015), pp. 307–313. https://doi.org/10.1109/CSE.2015.21.

8. Ikram, M.T., Afzal, M.T. Aspect based citation sentiment analysis using linguistic patterns. Scientometrics, 119(1), 73–95 (2019). https://doi.org/10.1007/s11192-019-03044-0.

9. Yousif, A. et al. Multi-task learning model for citation sentiment classification. Neurocomputing, 335, 195–205 (2019). https://doi.org/10.1016/j.neucom.2018.10.050.

10. Mercier, D. et al. ImpactCite: XLNet-based citation impact analysis. International Conference on Agents and Artificial Intelligence: materials (Vienna: SciTePress, 2021), pp. 159–168. https://doi.org/10.5220/0010235201590168.

11. Jiang, M., Lin, B.Y., Wang, S. et al. Knowledge-augmented Methods for Natural Language Processing. Singapore: Springer, 2024. https://doi.org/10.1007/978-981-97-0747-8.

12. Yang, Y., Zhou, J., Ding, X. et al. Recent Advances of Foundation Language Models-based Continual Learning: A Survey. arXiv preprint arXiv:2405.18653 (2024). https://doi.org/10.48550/arXiv.2405.18653.

13. Hu, Y., Lu, Y. Retrieval-augmented language models: A survey. arXiv preprint arXiv:2404.19543 (2024). https://doi.org/10.48550/arXiv.2404.19543.

14. Jovanovic, M., Voss, P. Trends and challenges of real-time learning in large language models: A critical review. arXiv preprint arXiv:2404.18311 (2024). https://doi.org/10.48550/arXiv.2404.18311.

15. Loureiro, M.V., Derby, S., Wijaya, T.K. Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks. Proceedings of LREC-COLING 2024: materials (Torino: ELRA and ICCL, 2024), pp. 16315–16330. URL: https://aclanthology.org/2024.lrec-main.1418.

16. Royesh, A., Oladeji, O. Information Extraction: An application to the domain of hyper-local financial data. arXiv preprint arXiv:2403.09077 (2024). https://doi.org/10.48550/arXiv.2403.09077.

17. Helaria, B., Kumar, A. Hybrid Lexicon and Transformer-Based Sentiment Analysis. International Conference on Advanced Computing and Communication Systems: materials (Singapore: Springer, 2024). https://doi.org/10.1007/978-981-10-4555-4_11.

18. Kumar, A. et al. Hybrid Evolutionary SVM-Based Approach in an Imbalanced Data Distribution. IEEE Access, 10, 21087–21100 (2022). https://doi.org/10.1109/ACCESS.2022.3149482.

19. Sula, C.A., Miller, M. Citations, contexts, and humanistic discourse. Literary and Linguistic Computing, 29, 452–464 (2014). https://doi.org/10.1093/llc/fqu019.

20. Dong, C., Schäfer, U. Ensemble-style self-training on citation classification. International Joint Conference on Natural Language Processing: materials (Chiang Mai: Asian Federation of Natural Language Processing, 2011), pp. 623–631. URL: https://aclanthology.org/I11-1070.


For citations:


Akhmediyarova A.T., Alibiyeva Zh.M., Oralbekova D.O., Nauryzbayeva A.I., Kassymova D.T. A HYBRID APPROACH TO THE ANALYSIS OF CITATION TONALITY BASED ON LINGUISTIC FEATURES AND MACHINE LEARNING. Herald of the Kazakh-British Technical University. 2026;23(1):52-67. (In Russ.) https://doi.org/10.55452/1998-6688-2026-23-1-52-67

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6688 (Print)
ISSN 2959-8109 (Online)