A comparative analysis of machine learning methods for personal information recognition (PII) in unstructured texts

Authors

  • A. Makhambet, Satbayev University, Kazakhstan
  • A. Moldagulova, Satbayev University, Kazakhstan

DOI:

https://doi.org/10.51301/ce.2025.i1.07

Keywords:

PII detection, machine learning, unstructured text, data privacy, neural networks, transformers (BERT), named entity recognition (NER), information security

Abstract

With the rapid growth of unstructured data and increasing attention to the privacy of personally identifiable information (PII), automatic PII recognition and protection are becoming increasingly important. This paper provides a comparative analysis of machine learning methods for recognizing PII in unstructured texts. The study considers rule-based methods, classification algorithms (SVM, random forests), and deep learning models (neural networks, transformers). Model effectiveness is assessed using accuracy, recall, and the F1-measure. The experimental results show that deep learning models such as BERT achieve high accuracy and recall, outperforming traditional methods; however, they require significant computing resources and a large amount of training data. The article discusses the advantages and disadvantages of each approach and offers recommendations for choosing a model based on the specifics of the task and the available resources. Beyond its technical findings, the study highlights the value that effective PII recognition creates, including improved data security, automated compliance, and operational efficiency.
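As a rough illustration of the rule-based end of the spectrum described above, and of how recall and F1 can be computed over detected spans, here is a minimal Python sketch. It assumes two regular expressions (e-mail addresses and phone numbers) and exact-match span scoring; the patterns, the labels `EMAIL`/`PHONE`/`PERSON`, the helper names, and the scoring scheme are all illustrative assumptions, not the rule set or evaluation protocol used in the paper.

```python
import re
from typing import Set, Tuple

# (start, end, label) with an exclusive end offset, as in Python slicing.
Span = Tuple[int, int, str]

# Illustrative patterns only (an assumption, not the authors' rule set);
# real PII rule sets are much larger and locale-specific.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}


def detect_pii(text: str) -> Set[Span]:
    """Rule-based baseline: every regex match becomes a labelled span."""
    spans: Set[Span] = set()
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), label))
    return spans


def span_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 (harmonic mean of the two)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def span_of(text: str, surface: str, label: str) -> Span:
    """Hypothetical helper: gold span for the first occurrence of `surface`."""
    start = text.index(surface)
    return (start, start + len(surface), label)


if __name__ == "__main__":
    text = "Contact Aigerim at aigerim@example.kz or +7 701 123 45 67."
    gold = {
        span_of(text, "Aigerim", "PERSON"),
        span_of(text, "aigerim@example.kz", "EMAIL"),
        span_of(text, "+7 701 123 45 67", "PHONE"),
    }
    p, r, f1 = span_prf(detect_pii(text), gold)
    print(round(p, 2), round(r, 2), round(f1, 2))  # 1.0 0.67 0.8
```

In this toy example the rules find the e-mail address and phone number but have no way to recognize the personal name, so precision stays at 1.0 while recall drops to about 0.67. A missed entity type like this is one plausible source of the recall gap between hand-written rules and learned NER models such as BERT.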

Published

2025-03-31

How to Cite

Makhambet, A., & Moldagulova, A. (2025). A comparative analysis of machine learning methods for personal information recognition (PII) in unstructured texts. Computing & Engineering, 3(1), 41–52. https://doi.org/10.51301/ce.2025.i1.07

Section

Innovative Computing Systems and Engineering Solutions