INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES

A NAIVE BAYESIAN CLASSIFIER FOR NORMALIZATION OF TEXT: A CASE STUDY FOR KAZAKH LANGUAGE

Authors

  • Tolegenova A., Suleyman Demirel University

DOI:

https://doi.org/10.54309/IJICT.2022.11.3.002

Keywords:

Naive Bayes algorithm, text normalization, natural language, processing of text, classifier

Abstract

The number of complex documents and texts has increased exponentially in recent years, which calls for a deeper understanding of machine learning techniques in order to identify texts effectively in numerous applications. Text normalization is one of the most useful approaches to this problem: it reduces every word in a text to its base form.

This paper investigates a layered approach to correcting errors in Kazakh-language texts downloaded from the Internet. Because social media is widely used as a source for linguistic study, error correction is a critical issue. The goal of this research was to study the existing Naive Bayes algorithm as applied to English, together with the normalization of words and sentences in natural languages, in order to develop a similar algorithm for the Kazakh language. Existing algorithms for extracting the stem of a word and possible ways of synthesizing its normal form were considered.

The morphology of Kazakh words, and how it differs from that of English, was examined in order to find a method suitable for processing words in a dictionary. The results obtained with the normalization system demonstrate the effectiveness of this method for the Kazakh language.
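
As an illustration only, the idea of using a Naive Bayes classifier to reduce a word to its normal form can be sketched as follows. This is not the paper's implementation: the toy word list, the `SuffixNB` class, and the choice of word-ending features are all assumptions made for this example. The classifier treats "which suffix to strip" as the class label and scores only those suffixes that the word actually ends with.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training pairs (surface form, normal form); not from the paper.
TRAIN = [
    ("балалар", "бала"), ("кітаптар", "кітап"),
    ("үйлер", "үй"), ("мектептер", "мектеп"),
    ("қалада", "қала"), ("үйде", "үй"), ("мектепте", "мектеп"),
    ("бала", "бала"), ("кітап", "кітап"), ("қала", "қала"),
]


def features(word):
    """Word-ending features: the last 1, 2 and 3 characters (an assumed choice)."""
    return [f"end{n}={word[-n:]}" for n in (1, 2, 3) if len(word) >= n]


class SuffixNB:
    """Naive Bayes over suffix labels with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()            # how often each suffix label occurs
        self.feat_counts = defaultdict(Counter)  # feature counts per suffix label
        self.vocab = set()                       # all features seen in training

    def fit(self, pairs):
        for word, stem in pairs:
            # Assumes the normal form is a prefix of the surface form.
            label = word[len(stem):]  # "" means the word is already normal
            self.label_counts[label] += 1
            for f in features(word):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_score(self, word, label):
        """log P(label) + sum of log P(feature | label)."""
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        denom = sum(self.feat_counts[label].values()) + self.alpha * len(self.vocab)
        for f in features(word):
            score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
        return score

    def normalize(self, word):
        # Only suffixes seen in training that the word really ends with compete.
        candidates = [lab for lab in self.label_counts if word.endswith(lab)]
        if not candidates:
            return word
        best = max(candidates, key=lambda lab: self.log_score(word, lab))
        return word[: len(word) - len(best)]


model = SuffixNB().fit(TRAIN)
print(model.normalize("қалалар"))  # unseen plural form; prints "қала"
```

A word whose suffix never appeared in the training pairs is returned unchanged, so a sketch like this depends entirely on how well the training data covers the suffixes of the language.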

Published

2022-09-15

How to Cite

Tolegenova, A. (2022). A NAIVE BAYESIAN CLASSIFIER FOR NORMALIZATION OF TEXT: A CASE STUDY FOR KAZAKH LANGUAGE. INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, 3(3), 17–23. https://doi.org/10.54309/IJICT.2022.11.3.002

Section

SOFTWARE DEVELOPMENT AND KNOWLEDGE ENGINEERING