INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES

INTELLIGENT CLUSTERING METHODS FOR PROCESSING AND ANALYZING SHORT TEXTS

Authors

  • Aigulim Baegizova
  • Gulden Myrzabekova, S. Seifullin Kazakh Agrotechnical Research University
  • Ainagul Alimagambetova
  • Galiya Mukhamedrakhimova
  • M. Kassim, MARA University of Technology, Malaysia

DOI:

https://doi.org/10.54309/IJICT.2025.22.2.002

Abstract

This study examines short text clustering using Bidirectional Encoder Representations from Transformers (BERT), Term Frequency-Inverse Document Frequency (TF-IDF), and a novel hybrid technique that combines Latent Dirichlet Allocation, BERT, and an Autoencoder (LDA+BERT+AE). The research begins with the theoretical foundations of each method, highlighting their advantages and limitations. BERT is evaluated for its ability to capture word dependencies within text, whereas TF-IDF is recognized for its efficiency in determining term significance. The experimental section systematically compares the effectiveness of these methods in clustering short texts, with particular emphasis on the hybrid LDA+BERT+AE approach. An analysis of the LDA-BERT model's training and validation loss across 200 epochs shows that the loss starts above 1.2, declines rapidly to approximately 0.8 within the first 25 epochs, and eventually stabilizes around 0.4. The training and validation curves track each other closely, which indicates that the model learns effectively and generalizes well, with minimal overfitting. The findings show that the LDA+BERT+AE hybrid method significantly improves text clustering performance compared to the standalone approaches. Based on these results, recommendations are provided for selecting and combining clustering techniques for different short text types and natural language processing (NLP) tasks. The study also explores practical applications of these methods in industrial and academic environments, where precise text processing and categorization are essential. The research concludes by underscoring the importance of an integrated approach to short text analysis, which facilitates deeper semantic comprehension and more effective information extraction.
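
To make the TF-IDF baseline mentioned in the abstract concrete, the following is a minimal sketch of how term weights and pairwise similarity between short texts might be computed. It is an illustration only, not the authors' implementation: the whitespace tokenizer, the smoothed IDF variant, the function names `tfidf_vectors` and `cosine`, and the sample texts are all assumptions introduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed TF-IDF, assumed variant)."""
    # Naive lowercase whitespace tokenization; a real pipeline would do more.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical short texts, not taken from the study's data.
docs = [
    "cheap flights to paris",
    "paris flights discount",
    "python list sort error",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # texts sharing terms: similarity above zero
print(cosine(vecs[0], vecs[2]))  # texts with no shared terms: similarity of zero
```

A clustering algorithm would then group texts by these similarities. Because short texts share few terms, many pairs score exactly zero, which is one motivation for combining TF-IDF-style features with contextual embeddings such as those from BERT.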

Published

2025-06-15

How to Cite

Baegizova, A., Myrzabekova, G., Alimagambetova, A., Mukhamedrakhimova, G., & Kassim, M. (2025). INTELLIGENT CLUSTERING METHODS FOR PROCESSING AND ANALYZING SHORT TEXTS. INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, 6(2), 23–36. https://doi.org/10.54309/IJICT.2025.22.2.002