Comparison of Cosine Similarity and Weighted Jaccard Similarity in the Development of a Digital Library Search Engine

Jessicha Putrianingsih Pamput, Aindri Rizky Muthmainnah, Dewi Fatmarani Surianto, Nur Fadilah

Abstract


This study addressed the problem of low relevance in search results within the digital library system of the Department of Informatics and Computer Engineering (JTIK), Universitas Negeri Makassar. The purpose of this research was to improve the accuracy and relevance of search results so that users, particularly students, could access academic materials and research references more efficiently. A search engine system was developed using a term-weighting method based on term frequency and document distribution. The system used similarity measures to evaluate the degree of match between user queries and document content. An experimental approach was applied, involving observation, data collection, text preprocessing, implementation of term weighting, and a comparison of Cosine Similarity and Weighted Jaccard Similarity for ranking search results. The evaluation was conducted using the Precision@K metric and a paired t-test to measure the significance of the performance difference between the two methods. The test results showed that Weighted Jaccard obtained an average Precision@K value of 0.933, slightly higher than Cosine Similarity with an average of 0.9. However, Cosine Similarity produced a higher average similarity value. In addition, system testing was conducted in two stages: assessing user satisfaction with the search results and assessing system performance. These findings confirmed that the combination of term weighting and cosine similarity effectively enhanced the relevance and performance of digital library search systems.
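The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact weighting formula, so the sketch assumes a common TF-IDF variant (relative term frequency multiplied by log(N/df) + 1), and all function names (`idf_table`, `weigh`, `cosine`, `weighted_jaccard`, `rank`, `precision_at_k`, `paired_t_statistic`) are hypothetical. Documents and queries are assumed to be already preprocessed into token lists.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency per term (assumed variant: log(N/df) + 1)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def weigh(tokens, idf):
    """Map a token list to a sparse {term: tf * idf} vector; unknown terms are skipped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_jaccard(a, b):
    """Weighted Jaccard: sum of per-term minima over sum of per-term maxima."""
    terms = set(a) | set(b)
    num = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    den = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def rank(query_tokens, docs, similarity):
    """Return document indices sorted by descending similarity to the query."""
    idf = idf_table(docs)
    query_vec = weigh(query_tokens, idf)
    scored = [(similarity(query_vec, weigh(doc, idf)), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def paired_t_statistic(x, y):
    """t statistic of a paired t-test; assumes the differences are not all equal."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)
```

Under these assumptions, calling `rank(query, docs, cosine)` and `rank(query, docs, weighted_jaccard)` on the same query gives two rankings whose `precision_at_k` scores can be collected per query and passed to `paired_t_statistic` to test whether the difference between the methods is significant.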

Keywords


Comparison; Cosine Similarity; Digital Library; Search Engine; Weighted Jaccard Similarity.

DOI: https://doi.org/10.30591/jpit.v10i4.8773


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
