ENSEMBLE MACHINE LEARNING APPROACHES FOR IT PROJECT COST ESTIMATION UNDER DATA SCARCITY CONDITIONS
https://doi.org/10.55452/1998-6688-2026-23-1-107-116
Abstract
Accurate prediction of IT project costs is crucial for successful project planning, budgeting, and resource allocation. However, traditional cost estimation methods, such as Function Point Analysis or expert-based evaluation, frequently fail to produce reliable estimates, especially in developing countries like Kazakhstan, where historical project data are scarce or incomplete. This study investigates how ensemble machine learning algorithms, specifically Random Forest and Gradient Boosting, can be used to predict IT project costs when insufficient data are available. To address data scarcity, the study applies synthetic data generation techniques, producing extended datasets that model a variety of project scenarios while retaining the statistical properties observed in real-world cases. The presented models take key project variables, such as team size, project complexity, development process, and project size, as inputs for cost prediction. Experimental results show that ensemble approaches outperform standard estimation techniques in predictive accuracy. Random Forest achieved the lowest mean absolute error (MAE = 0.09) and the highest coefficient of determination (R² = 0.603). Furthermore, feature importance analysis shows that project size and development time are the most influential factors in cost estimation. The findings demonstrate the usefulness of ensemble learning for handling complex, nonlinear relationships among project variables and offer a feasible approach to improving cost estimation in the absence of high-quality historical data. This work contributes to the development of intelligent decision support systems and offers practical insights for IT project managers and policymakers in emerging economies who want to improve IT project budgeting and planning.
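The two accuracy measures reported in the abstract, MAE and R², can be illustrated with a minimal sketch. The cost values below are hypothetical and are not data from the study; the function names are chosen for illustration only, and the sketch does not reproduce the authors' pipeline.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted costs."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical project costs on an assumed normalized scale (illustration only).
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.4, 0.5, 0.9]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

A lower MAE means predictions sit closer to the actual costs on average, while an R² closer to 1 means the model explains a larger share of the variance in cost.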
About the Authors
A. K. Aitim
Kazakhstan
MSc.
Almaty
G. K. Sembina
Kazakhstan
Cand. Tech. Sc.
Almaty
For citations:
Aitim A.K., Sembina G.K. ENSEMBLE MACHINE LEARNING APPROACHES FOR IT PROJECT COST ESTIMATION UNDER DATA SCARCITY CONDITIONS. Herald of the Kazakh-British Technical University. 2026;23(1):107-116. https://doi.org/10.55452/1998-6688-2026-23-1-107-116