Nearest Neighbor-Based Data Mining and Feature Selection for Breast Cancer Detection

Yohanes Setiawan

Abstract


Detecting breast cancer at an early stage is not straightforward, because a biopsy takes time to determine whether a tumor is benign or malignant. Data mining algorithms have been widely used to automate disease diagnosis. Nearest neighbor based algorithms are among the most popular because of their simplicity and low computational cost. However, too many features can reduce the accuracy of nearest neighbor based models. In this research, nearest neighbor based models with feature selection are developed to detect breast cancer. Conventional k-Nearest Neighbor (KNN) and Multi Local Means k-Harmonic Nearest Neighbor (MLM-KHNN) were chosen as the models for the experiments. The feature selection methods used in this study are filter based, namely Correlation-based, Information Gain, and ReliefF. The experimental results show that the highest recall obtained with MLM-KHNN and Information Gain is 94%, using 5 features. In brief, MLM-KHNN with Information Gain increases the recall of breast cancer prediction compared with the conventional KNN algorithm. The model has been deployed as a website using Streamlit, so that it can be used to detect breast cancer from the chosen Wisconsin dataset features.
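To make the classifier named in the abstract more concrete, the following is a minimal sketch of the MLM-KHNN decision rule as it is commonly described: for each class, take the k nearest training points to the query, form k local means (the mean of the first r neighbors, for r = 1..k), and assign the class whose local means have the smallest harmonic mean distance to the query. It also shows how recall is computed. This is an illustration only, not the study's implementation; the function names, the choice of Euclidean distance, and the two-feature toy data are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mlm_khnn_predict(train_X, train_y, query, k=3):
    """Return the class whose local means are closest to `query`
    in terms of harmonic mean distance (hypothetical helper)."""
    best_label, best_score = None, math.inf
    for label in sorted(set(train_y)):
        # k nearest neighbors of the query, searched within this class only
        members = [x for x, y in zip(train_X, train_y) if y == label]
        members.sort(key=lambda x: euclidean(x, query))
        neighbors = members[:k]

        # r-th local mean = mean of the r nearest neighbors, r = 1..k
        dists = []
        for r in range(1, len(neighbors) + 1):
            local_mean = [sum(col) / r for col in zip(*neighbors[:r])]
            dists.append(euclidean(query, local_mean))

        # Harmonic mean of the distances; a zero distance is an exact match
        if any(d == 0 for d in dists):
            score = 0.0
        else:
            score = len(dists) / sum(1.0 / d for d in dists)

        if score < best_score:
            best_label, best_score = label, score
    return best_label


def recall(y_true, y_pred, positive):
    """Recall = TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (made up for illustration, not taken from the Wisconsin dataset)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["benign"] * 3 + ["malignant"] * 3

print(mlm_khnn_predict(train_X, train_y, (1.1, 1.0), k=3))
print(mlm_khnn_predict(train_X, train_y, (4.8, 5.0), k=3))
```

Recall is the metric emphasized here because a false negative (a malignant case predicted as benign) is the costlier error in screening. In a fuller pipeline, the feature vectors passed to `mlm_khnn_predict` would contain only the features retained by the chosen filter method.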

Keywords


Breast Cancer, Data Mining, Nearest Neighbor, Feature Selection






DOI: https://doi.org/10.30591/jpit.v8i2.4994



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
