INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES

SEMANTIC COMPLETENESS IN KAZAKH-LANGUAGE EXTRACTIVE QA THROUGH ONTOLOGY AND RETRIEVAL MECHANISMS

Authors

  • Murat Aitimov
  • G.K. Muratova, Korkyt Ata Kyzylorda University
  • Zhadyra Bissenbayeva
  • Ideyat Bapiyev
  • Kassim Murizah

DOI:

https://doi.org/10.54309/IJICT.2026.25.1.005

Abstract

This study explores extractive question answering (QA) for Kazakh, a low-resource language, by combining ontology-based semantic enrichment with retrieval augmentation. We design a complete data preparation pipeline comprising PDF text extraction, cleaning, chunking, Sentence-BERT vectorization, and FAISS indexing. Using GPT-4, we generate QA pairs and manually validate them, yielding a final dataset of 350 pairs. Four models are evaluated: mBERT-QA, XLM-RoBERTa-QA, XLM-RoBERTa-QA with ontology injection, and a hybrid Retrieval + XLM-RoBERTa-QA + Ontology system. Evaluation on exact match (EM), F1, BERTScore-F1, ROUGE-L, and SemSim shows that the hybrid models substantially outperform the baselines. The best configuration achieves an F1 score of 52.6%, surpassing mBERT-QA by 21 percentage points. The results indicate that ontology-infused context and dense retrieval improve answer span extraction by reducing noise and strengthening semantic alignment. The proposed approach provides a foundation for developing high-accuracy educational QA systems in the Kazakh language.
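
The abstract reports EM and F1 without stating how they are computed. As a rough illustration only, the sketch below follows the common SQuAD-style convention (normalize both strings, then compare token overlap); this convention, the normalization rules, and the function names `normalize`, `exact_match`, and `token_f1` are assumptions for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split on whitespace.

    Assumed normalization; Python's \\w is Unicode-aware, so Kazakh
    Cyrillic letters are kept as word characters.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a predicted span that contains the gold answer plus extra tokens is scored 0 by EM but receives partial credit from F1, which is why the two metrics are usually reported together for extractive QA.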

Published

2026-03-30

How to Cite

Aitimov, M., Muratova, G., Bissenbayeva, Z., Bapiyev, I., & Murizah, K. (2026). SEMANTIC COMPLETENESS IN KAZAKH-LANGUAGE EXTRACTIVE QA THROUGH ONTOLOGY AND RETRIEVAL MECHANISMS. INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, 7(1), 76–88. https://doi.org/10.54309/IJICT.2026.25.1.005