CREATION OF AUTOMATIC DOCUMENT ANALYSIS MODEL USING DIFFERENT MACHINE LEARNING ALGORITHMS
DOI: https://doi.org/10.54309/IJICT.2025.21.1.011
Abstract
This article presents a model for automatic document analysis based on TF-IDF (Term Frequency-Inverse Document Frequency) features combined with several machine learning algorithms: SVM (Support Vector Machine), Random Forest, and Word2Vec+SVM. The aim of the study is to compare the effectiveness of these methods in text classification tasks and to identify the most efficient approach. In the experiments, the hybrid model combining TF-IDF and Word2Vec with SVM achieved the highest accuracy (90.2%) and F1-score (82.52%). TF-IDF weights each term by its importance in the text, while Word2Vec converts words into vector representations, improving semantic matching. SVM separates the data into classes using hyperplanes, while Random Forest improves classification quality through ensembles of decision trees. The study also highlights the importance of text preprocessing (tokenization, normalization, stopword removal, and lemmatization), which significantly improves classification performance. The proposed model can be applied in areas such as information retrieval, topic modeling, and automatic document summarization. Such hybrid approaches improve the accuracy and reliability of automatic text analysis and can be adapted to multilingual environments and to new data sources. The experimental results also confirmed the effectiveness of this approach for tasks such as sentiment analysis, document categorization, and topic modeling. This study is a significant step toward developing new solutions in the field of automatic text analysis.
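The preprocessing and TF-IDF weighting steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the stopword list, the sample documents, and the function names `preprocess` and `tfidf` are hypothetical, lemmatization is omitted, and the weighting variant (term count divided by document length, multiplied by the natural logarithm of N/df) is an assumption, since the abstract does not state which formula was used.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; real pipelines use larger lists.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}


def preprocess(text):
    """Normalize to lowercase, tokenize on alphabetic runs, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up sample documents.
docs = [
    "The contract is signed in the office",
    "The invoice and the contract",
    "An invoice on paper",
]
vectors = tfidf(docs)
```

In this sketch a term that occurs in only one document (such as "signed") receives a higher weight than a term shared by several documents (such as "contract"). The resulting sparse vectors are the kind of features that a classifier such as SVM or Random Forest would then be trained on.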
License
Copyright (c) 2025 INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://creativecommons.org/licenses/by-nc-nd/3.0/deed.en