SEMANTIC COMPLETENESS IN KAZAKH-LANGUAGE EXTRACTIVE QA THROUGH ONTOLOGY AND RETRIEVAL MECHANISMS
DOI: https://doi.org/10.54309/IJICT.2026.25.1.005

Abstract
This study explores extractive question answering for the low-resource Kazakh language by combining ontology-based semantic enrichment with retrieval augmentation. We design a complete data preparation pipeline, including PDF text extraction, cleaning, chunking, Sentence-BERT vectorization, and FAISS indexing. Using GPT-4, we generate and manually validate a final dataset of 350 QA pairs. Four models are evaluated: mBERT-QA, XLM-RoBERTa-QA, XLM-RoBERTa-QA with ontology injection, and a hybrid Retrieval + XLM-RoBERTa-QA + Ontology system. Evaluation across EM, F1, BERTScore-F1, ROUGE-L, and SemSim metrics shows that the hybrid models substantially outperform the baselines. The best configuration achieves an F1 score of 52.6%, surpassing mBERT-QA by 21 percentage points. The results demonstrate that ontology-infused context and dense retrieval significantly improve answer span extraction, reducing noise and enhancing semantic alignment. The proposed approach provides an effective foundation for developing high-accuracy educational QA systems in the Kazakh language.
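The dense-retrieval step of the pipeline described above (Sentence-BERT vectorization followed by FAISS indexing) can be sketched in a few lines. This is a minimal stand-in, not the authors' code: the fixed toy vectors below take the place of real Sentence-BERT embeddings, and the exhaustive cosine-similarity search mirrors what a FAISS `IndexFlatIP` over L2-normalized vectors computes.

```python
import numpy as np

# Toy stand-in for Sentence-BERT chunk embeddings. In the paper's pipeline
# these would be produced by a multilingual Sentence-BERT model over the
# cleaned, chunked Kazakh text; fixed vectors keep the retrieval step clear.
chunk_embeddings = np.array([
    [0.9, 0.1, 0.0],   # chunk 0
    [0.1, 0.8, 0.1],   # chunk 1
    [0.0, 0.2, 0.9],   # chunk 2
], dtype=np.float32)

def top_k(query_vec, embeddings, k=2):
    """Return indices and scores of the k chunks most similar to the query.

    Cosine similarity via L2-normalized dot products -- equivalent to an
    exhaustive FAISS inner-product search on normalized vectors.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = normalize(embeddings) @ normalize(query_vec)
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# A query embedding pointing roughly in the direction of chunk 0.
query = np.array([1.0, 0.0, 0.1], dtype=np.float32)
ids, scores = top_k(query, chunk_embeddings, k=2)
# ids[0] == 0: chunk 0 is retrieved first and would be passed,
# together with ontology-injected context, to the XLM-RoBERTa-QA reader.
```

The retrieved chunks form the context window for the extractive reader; in the hybrid configuration, ontology terms are injected into this context before span extraction.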
Copyright (c) 2026 INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://creativecommons.org/licenses/by-nc-nd/3.0/deed.en