LSTM Class Weight Optimization for Phishing URL Detection on Imbalanced Datasets (Optimasi Bobot Kelas LSTM untuk Deteksi URL Phishing pada Dataset Tidak Berimbang)

Tri Ferli Handoyo; Muhammad Pajar Kharisma Putra

doi:10.30591/jpit.v10i1.8128

Abstract

Phishing URL detection is a central challenge in cybersecurity, given the steadily growing threats facing internet users worldwide. This research develops a Long Short-Term Memory (LSTM) based deep learning model to detect phishing URLs with high accuracy. The dataset consists of 651,191 URLs in four categories: benign, defacement, phishing, and malware. It is preprocessed through URL cleaning and feature extraction, and the LSTM model is trained with an optimized hyperparameter configuration to learn patterns from the data. The model achieved high accuracy during training and validation. Evaluation on external datasets shows that it performs well on the benign and defacement categories, with relatively high precision and recall. In the malware and phishing categories, however, recall was low, which is attributed to dataset imbalance and insufficient feature representation. Further analysis showed a bias toward the majority class and difficulty detecting URLs in the minority classes. The research shows the potential of LSTM-based deep learning for phishing URL detection, while underlining the need for further optimization, such as adjusting class weights, oversampling, or adding features. The resulting model is intended as an initial step toward improving cybersecurity, particularly the real-time detection of phishing threats.
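To make the class-weight adjustment mentioned in the abstract concrete, the following is a minimal sketch of inverse-frequency ("balanced") weighting, where each class weight is n_samples / (n_classes * class_count). The function name `balanced_class_weights` and the toy label list are assumptions made for illustration; they are not the authors' code, and the toy counts are not the dataset's actual class distribution.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight}, with weight = n_samples / (n_classes * count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so errors on minority-class URLs cost more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels, chosen only to show the imbalance effect.
toy_labels = (
    ["benign"] * 8
    + ["defacement"] * 4
    + ["phishing"] * 2
    + ["malware"] * 2
)

weights = balanced_class_weights(toy_labels)
for label, weight in weights.items():
    print(f"{label}: {weight}")
```

In a Keras-style training loop, a dictionary of this kind (keyed by integer class index rather than by label string) is what the `class_weight` argument of `Model.fit` accepts. Whether the paper used this exact weighting scheme is not stated in the abstract.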

Keywords

cybersecurity, deep learning, LSTM, threat detection, phishing URL.
