A comparative analysis of machine learning methods for personal information recognition (PII) in unstructured texts

A. Makhambet; A. Moldagulova

doi:10.51301/ce.2025.i1.07

Authors

A. Makhambet Satbayev University, Kazakhstan
A. Moldagulova Satbayev University, Kazakhstan

DOI:

https://doi.org/10.51301/ce.2025.i1.07

Keywords:

PII detection, machine learning, unstructured text, data privacy, neural networks, transformers (BERT), named entity recognition (NER), information security

Abstract

With the rapid growth of unstructured data and increased attention to the privacy of personally identifiable information (PII), automatic PII recognition and data protection are becoming increasingly relevant. This paper provides a comparative analysis of machine learning methods for recognizing PII in unstructured texts. The study considers rule-based methods, classification algorithms (SVM, random forests), and deep learning models (neural networks, transformers). The effectiveness of the models is assessed using accuracy, recall, and F1 score. The experimental results show that deep learning models such as BERT achieve high accuracy and recall, outperforming traditional methods. However, they require significant computing resources and a large amount of training data. The article discusses the advantages and disadvantages of each approach and offers recommendations for choosing a model depending on the specifics of the task and the available resources. Beyond technical advances, the study highlights the value created by effective PII recognition, including improved data security, automated compliance, and operational efficiency.
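To make the comparison setup concrete, the following is a minimal sketch of a rule-based baseline and span-level scoring; it is not the authors' implementation. The regular expressions, the function names `find_pii` and `precision_recall_f1`, and the sample sentence are all hypothetical. The abstract names accuracy, recall, and F1; precision is computed here as an assumption, because F1 is defined as the harmonic mean of precision and recall.

```python
import re

# Hypothetical patterns for two PII types; a real system needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(text):
    """Return the set of (label, start, end) spans matched by the patterns."""
    spans = set()
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((label, match.start(), match.end()))
    return spans


def precision_recall_f1(predicted, gold):
    """Score predictions against gold annotations using exact matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example (invented data): the rules find the email and the phone number
# but have no way to recognize the person's name.
text = "Contact Aigerim at aigerim@example.com or +7 701 123 4567."


def gold_span(label, substring):
    """Build a gold (label, start, end) span from a substring of the toy text."""
    start = text.index(substring)
    return (label, start, start + len(substring))


gold = {
    gold_span("PERSON", "Aigerim"),
    gold_span("EMAIL", "aigerim@example.com"),
    gold_span("PHONE", "+7 701 123 4567"),
}
predicted = find_pii(text)
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=1.00 recall=0.67 f1=0.80
```

In this toy case the rules are precise but miss the name, which lowers recall. Closing gaps of this kind on context-dependent entities is where learned models such as BERT-based NER are usually expected to help, at the cost of the training data and compute noted in the abstract.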

Published

2025-03-31

How to Cite

Makhambet, A., & Moldagulova, A. (2025). A comparative analysis of machine learning methods for personal information recognition (PII) in unstructured texts. Computing & Engineering, 3(1), 41–52. https://doi.org/10.51301/ce.2025.i1.07

Issue

Vol. 3 No. 1 (2025): Computing & Engineering

Section

Innovative Computing Systems and Engineering Solutions

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
