Comparison of IndoBERT and Bi-LSTM Models for Indonesian Law Violation Text Classification

Made Wahyu Adwitya Pramana; Desy Purnami Singgih Putri; I Ketut Adi Purnawan

doi:10.30591/jpit.v10i4.8795

Abstract

Legal violations in Indonesia, particularly those under the Criminal Code (KUHP) and the Information and Electronic Transactions Law (UU ITE), are often difficult for the general public to interpret because of the complexity of legal language and article structures. This research aims to build a multi-label classification model that automatically identifies the relevant legal articles from user-provided case descriptions. Two models were developed and compared: Bidirectional Long Short-Term Memory (Bi-LSTM) and IndoBERT. Both models were trained on a manually labeled dataset and evaluated with accuracy, F1-score, and Hamming Loss, validated through 5-fold cross-validation. The results showed that IndoBERT outperformed Bi-LSTM, with an average accuracy of 97% and a Hamming Loss of 0.027. However, a t-test revealed no statistically significant difference in F1-scores, indicating that both models are comparably effective at capturing multiple labels. A confusion matrix analysis further identified patterns of misclassification among semantically similar articles. This study demonstrates the potential of NLP and deep learning to support legal awareness and give the public easier access to legal information.
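
As an illustration of the multi-label metrics named in the abstract, the following minimal Python sketch computes Hamming Loss, exact-match accuracy, and micro-averaged F1 on a toy label matrix. The data, the function names, and the choice of exact-match accuracy and micro averaging are assumptions made for illustration only; the abstract does not state which accuracy or F1 variant the authors used, and the values below are not taken from the study.

```python
from typing import List

# One row per case description, one 0/1 column per legal article (hypothetical layout).
Labels = List[List[int]]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of cases whose full label set is predicted exactly."""
    exact = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: 4 cases, 3 candidate articles (made-up values).
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]

print(f"Hamming Loss:    {hamming_loss(y_true, y_pred):.3f}")
print(f"Subset accuracy: {subset_accuracy(y_true, y_pred):.3f}")
print(f"Micro F1:        {micro_f1(y_true, y_pred):.3f}")
```

Hamming Loss counts every wrong label slot, so a prediction that misses one of several applicable articles is penalized only partially, whereas exact-match accuracy counts that whole case as wrong. This is one reason the two figures can diverge in a multi-label setting.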

Keywords

Bi-LSTM; IndoBERT; KUHP; Text Mining; UU ITE.
