Study of the transformation of Kazakh language speech into text data

Authors

  • A. Kursabayeva, Satbayev University, Kazakhstan

DOI:

https://doi.org/10.51301/ce.2023.i3.06

Keywords:

speech recognition, Kazakh language, VOSK, audio

Abstract

This article investigates the use of VOSK-based speech recognition models for the Kazakh language. In particular, it provides a comparative analysis of two variants of the VOSK speech recognition model: VOSK big and VOSK small. The assessment is carried out for the Kazakh language using the KazakhTTS dataset, prepared in 2021 by the ISSAI team. The experimental results, reported as Word Error Rate (WER, where lower is better), show that VOSK big performs better (51%) than VOSK small (55%). However, the models have limitations in recognizing word endings, and some recognition errors remain. The discussion of the results highlights the potential of the model and points to the need for further refinement and training on more diverse data. The conclusion summarizes the key findings and outlines potential directions for further study in Kazakh speech recognition.
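
The abstract reports its results as WER but does not say how the metric was computed. The sketch below assumes the standard definition: the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognized text, divided by the number of words in the reference. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the article or from the KazakhTTS dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumes whitespace tokenization; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a dropped case ending counts as one substitution.
reference = "мен мектепке бардым"
hypothesis = "мен мектеп бардым"
print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition a single wrong word ending makes the whole word count as an error, which is one reason an agglutinative language such as Kazakh can yield a high WER even when most of each word is recognized correctly.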

Published

2023-09-30

How to Cite

Kursabayeva, A. (2023). Study of the transformation of Kazakh language speech into text data. Computing & Engineering, 1(3), 29–35. https://doi.org/10.51301/ce.2023.i3.06

Section

Innovative Computing Systems and Engineering Solutions