A NAIVE BAYESIAN CLASSIFIER FOR NORMALIZATION OF TEXT: A CASE STUDY FOR KAZAKH LANGUAGE
DOI:
https://doi.org/10.54309/IJICT.2022.11.3.002
Keywords:
naive bayes algorithm, text normalization, natural language, processing of text, classifier
Abstract
The number of complex documents and texts has grown exponentially in recent years, so a deeper understanding of machine learning technologies is needed to identify texts effectively in numerous applications. Text normalization is one of the best solutions: it reduces every word in a text to its original (base) form.
This paper investigates a layered strategy for correcting errors in Kazakh-language texts downloaded from the Internet. Because social media is widely used as a source for linguistic study, error correction is a critical issue. The goal of this research was to study the existing Naive Bayes algorithm as applied to English, together with the normalization of words and sentences in natural languages, in order to develop a similar algorithm for the Kazakh language. Existing algorithms for extracting the stem of a word and possible ways of synthesizing the normal form were considered.
The morphology of Kazakh words and its differences from English were examined with a view to processing words in a dictionary. The results of the normalization system demonstrated the efficiency of this method for the Kazakh language.
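The abstract does not say how the Naive Bayes classifier is applied to normalization, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the class to predict is the suffix to strip from a word form, that the features are the word-final character n-grams, and that probabilities use add-one (Laplace) smoothing. The training pairs, the class name `NaiveBayesNormalizer`, and the helper functions are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy (word form, base form) pairs; illustrative only, not the paper's data.
TRAIN = [
    ("кітаптар", "кітап"),
    ("мектептер", "мектеп"),
    ("балалар", "бала"),
    ("үйлер", "үй"),
    ("қалада", "қала"),
    ("мектепте", "мектеп"),
    ("кітап", "кітап"),
    ("бала", "бала"),
]


def features(word):
    # Assumed feature set: word-final character n-grams, n = 1..3.
    return [word[-n:] for n in range(1, 4) if len(word) >= n]


def suffix_label(word, base):
    # The class is the suffix that must be removed to reach the base form.
    return word[len(base):] if word.startswith(base) else ""


class NaiveBayesNormalizer:
    def fit(self, pairs):
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for word, base in pairs:
            label = suffix_label(word, base)
            self.class_counts[label] += 1
            for f in features(word):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict_suffix(self, word):
        total = sum(self.class_counts.values())
        best, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            if label and not word.endswith(label):
                continue  # a suffix that is absent cannot be stripped
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(count / total)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for f in features(word):
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

    def normalize(self, word):
        suffix = self.predict_suffix(word)
        return word[: len(word) - len(suffix)] if suffix else word


model = NaiveBayesNormalizer().fit(TRAIN)
print(model.normalize("дәптерлер"))  # a word form not seen in training
```

With this toy data the unseen plural "дәптерлер" is reduced to "дәптер", while "кітап" is returned unchanged. A realistic system would need a far larger annotated corpus and would have to handle stacked suffixes and vowel harmony, which this sketch ignores.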
License
Copyright (c) 2023 International Journal of Information and Communication Technologies
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://creativecommons.org/licenses/by-nc-nd/3.0/deed.en