A NAIVE BAYESIAN CLASSIFIER FOR NORMALIZATION OF TEXT: A CASE STUDY FOR KAZAKH LANGUAGE

Authors

  • Assylay Tolegenova, Suleyman Demirel University

DOI:

https://doi.org/10.54309/IJICT.2023.11.3.002

Keywords:

text normalization, Naive Bayes algorithm, natural language, text processing, classifier.

Abstract

The number of complex documents and texts has grown exponentially in recent years, and identifying texts effectively across numerous applications requires a deeper understanding of machine learning technologies. Text normalization is one of the most effective approaches: it reduces every word in a text to its base (normal) form.

This paper investigates a layered strategy for correcting errors in Kazakh-language texts downloaded from the Internet. Because social media is widely used as a source for linguistic study, error correction is a critical issue. The goal of this research was to study the existing Naive Bayes algorithm as applied to English, together with the normalization of words and sentences in natural languages, in order to develop a similar algorithm for the Kazakh language. Existing algorithms for extracting the stem of a word and possible ways of synthesizing its normal form were considered.

The morphology of Kazakh words and how it differs from that of English were examined with a view to processing words in a dictionary. The results of the normalization system demonstrated the effectiveness of this method for the Kazakh language.
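The abstract does not spell out the classifier itself, so the following is only a minimal sketch of how a Naive Bayes model could be used to reduce a word to its normal form. It assumes that normalization is framed as choosing which suffix to strip, that the features are word-final character n-grams, and that Laplace smoothing is applied. The training pairs, the `SuffixNaiveBayes` class, and the `features` helper are hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Toy (surface form, normal form) pairs; illustrative only, not the paper's data.
TRAIN = [
    ("кітаптар", "кітап"),   # plural suffix -тар
    ("балалар", "бала"),     # plural suffix -лар
    ("үйлер", "үй"),         # plural suffix -лер
    ("мектепте", "мектеп"),  # locative suffix -те
    ("қалада", "қала"),      # locative suffix -да
    ("кітап", "кітап"),      # already in normal form
    ("бала", "бала"),        # already in normal form
]


def features(word, max_n=3):
    """Return word-final character n-grams, e.g. 'балалар' -> ['р', 'ар', 'лар']."""
    return [word[-n:] for n in range(1, max_n + 1) if len(word) >= n]


class SuffixNaiveBayes:
    """Naive Bayes over suffix classes (assumed framing, not the paper's model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # suffix -> number of examples
        self.feat_counts = defaultdict(Counter) # suffix -> feature counts
        self.vocab = set()                      # all features seen in training

    def fit(self, pairs):
        for word, lemma in pairs:
            # Assumes the normal form is a prefix of the surface form.
            suffix = word[len(lemma):]
            self.class_counts[suffix] += 1
            for f in features(word):
                self.feat_counts[suffix][f] += 1
                self.vocab.add(f)
        return self

    def log_score(self, word, suffix):
        """log P(suffix) + sum of log P(feature | suffix), with smoothing."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[suffix] / total)
        denom = sum(self.feat_counts[suffix].values()) + self.alpha * len(self.vocab)
        for f in features(word):
            score += math.log((self.feat_counts[suffix][f] + self.alpha) / denom)
        return score

    def normalize(self, word):
        # Only consider known suffixes that the word actually ends with.
        candidates = [s for s in self.class_counts
                      if word.endswith(s) and len(s) < len(word)]
        if not candidates:
            return word
        best = max(candidates, key=lambda s: self.log_score(word, s))
        return word[:len(word) - len(best)]


model = SuffixNaiveBayes().fit(TRAIN)
print(model.normalize("қалалар"))  # қала
```

With so little training data the sketch only strips suffixes it has seen verbatim; a word such as "мектептер" is returned unchanged because "-тер" never appears in the toy pairs. A realistic system would need a much larger annotated dictionary and handling of stacked suffixes.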

Published

2023-02-22

How to Cite

Tolegenova, A. (2023). A NAIVE BAYESIAN CLASSIFIER FOR NORMALIZATION OF TEXT: A CASE STUDY FOR KAZAKH LANGUAGE. International Journal of Information and Communication Technologies, 3(3), 13–17. https://doi.org/10.54309/IJICT.2023.11.3.002

Section

SOFTWARE DEVELOPMENT AND KNOWLEDGE ENGINEERING