Study of the transformation of Kazakh language speech into text data
DOI:
https://doi.org/10.51301/ce.2023.i3.06
Keywords:
speech recognition, Kazakh language, VOSK, audio
Abstract
This article investigates the use of VOSK-based speech recognition models for the Kazakh language. It compares two variants, VOSK big and VOSK small, on the KazakhTTS dataset prepared in 2021 by the ISSAI team. Measured by Word Error Rate (WER), where lower is better, VOSK big outperformed VOSK small (51% versus 55%). The experiment also revealed limitations in recognizing word endings, along with other recognition errors. The discussion highlights the potential of the models and the need for further refinement and training on more diverse data. The conclusion summarizes the key findings and outlines directions for further study in Kazakh speech recognition.
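Since the results are reported as WER, a minimal sketch of how that metric is commonly computed may help in reading the figures: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the article, and the article's own text normalization and scoring tool are not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a dropped word ending counts as one substitution.
reference = "мен мектепке бардым"
hypothesis = "мен мектеп бардым"
print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Because WER counts whole-word mismatches, a single wrong suffix makes the entire word an error. In an agglutinative language such as Kazakh this is one plausible reason why errors in word endings weigh heavily on the score.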
License
Copyright (c) 2023 Computing & Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.