Analisis Efektivitas Fine-Tuning dan Prompt Engineering Berbasis Llama 3.1 pada Deteksi Depresi di Media Sosial

Muhammad Ikhsan Asagaf; Junta Zeniarja

doi:10.30591/jpit.v11i2.10170

Analisis Efektivitas Fine-Tuning dan Prompt Engineering Berbasis Llama 3.1 pada Deteksi Depresi di Media Sosial

Muhammad Ikhsan Asagaf, Junta Zeniarja

Abstract

Detecting depression has become an important concern in addressing mental health issues. According to WHO, more than 300 million people suffer from depression. Large Language Models offer great potential to address this issue, however the full fine-tuning process is often hampered by heavy computational requirements, and LLMs that are not specifically configured for a particular context can result in biased and inaccurate outcomes. This study aims to analyze the effectiveness of Prompt Engineering and Fine-Tuning using QLoRA in improving the accuracy of depression detection. Utilizing the Llama-3.1-8B-Instruct model on social media datasets, this research compares model performance in two scenarios consist of the application of direct prompting strategies on the base model and the application of QLoRA fine-tuning. Evaluation results demonstrate that the Chain-of-Thought strategy improved baseline accuracy from 81.4% to 84.4%, but still exhibited significant bias towards the 'Severe' class. In contrast, the QLoRA Fine-Tuning approach proved superior, achieving 92.4% accuracy with balanced F1-Scores across classes, effectively eliminating detection bias in the 'Minimum' class. These findings confirm that while prompting techniques can enhance baseline performance, QLoRA provides a more accurate, stable, and objective solution for depression detection tasks.

Keywords

Depression; Fine-Tuning; LLM; Prompt Engineering; QLoRA.

Full Text:

References

N. Aisyaroh, I. Hudaya, and R. Supradewi, “Trend Penelitian Kesehatan Mental Remaja di Indonesia dan Faktor yang Mempengaruhi: Literature Review,” Scientific Proceedings of Islamic and Complementary Medicine, vol. 1, no. 1, pp. 41–51, Aug. 2022, doi: 10.55116/spicm.v1i1.6.

D. Ridha Dwiki Putri, M. Reza Fahlevi, M. Sadikin, R. Utami, and Rizki Fajar Utomo, “Prediksi Tingkat Depresi Remaja Menggunakan Metode Naïve Bayes Classifier: Analisis Faktor Psikologis Dan Lingkungan,” KESATRIA: Jurnal Penerapan Sistem Informasi (Komputer & Manajemen), vol. 5, no. 4, pp. 2034–2043, Oct. 2024.

World Health Organization (WHO), “Depressive disorder (depression),” www.who.int. Accessed: Dec. 12, 2025. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/depression#

S. N. Salsabila and T. Ardi Ardani, “Analisis Dampak dan Konsekuensi Penggunaan Media Sosial Terhadap Tingkat Bunuh Diri pada Usia Dewasa Awal: A Systematic Literature Review,” Jurnal Ilmu Psikologi dan Kesehatan (SIKONTAN), vol. 3, no. 2, pp. 45–52, Oct. 2024, doi: 10.47353/sikontan.v3i2.1921.

Peldi, Syahruddin, and Asmurti, “Penggunaan Media Sosial Sebagai Representase Gaya Hidup Mahasiswa,” Jurnal Ilmiah Ilmu Sosial dan Pendidikan, vol. 2, no. 2, pp. 78–83, May 2024.

L. Ilias, S. Mouzakitis, and D. Askounis, “Calibration of Transformer-Based Models for Identifying Stress and Depression in Social Media,” IEEE Trans. Comput. Soc. Syst., vol. 11, no. 2, pp. 1979–1990, Apr. 2024, doi: 10.1109/TCSS.2023.3283009.

R. Salas-Zárate, G. Alor-Hernández, M. D. P. Salas-Zárate, M. A. Paredes-Valverde, M. Bustos-López, and J. L. Sánchez-Cervantes, “Detecting Depression Signs on Social Media: A Systematic Literature Review,” Healthcare (Switzerland), vol. 10, no. 2, Feb. 2022, doi: 10.3390/healthcare10020291.

Y. Tolla and Kusrini, “Deteksi Stres dan Depresi Unggahan Media Sosial dengan Machine Learning,” Jurnal Fasilkom, vol. 15, no. 1, pp. 84–92, Apr. 2025, doi: 10.37859/jf.v15i1.9067.

K. Setyo Nugroho, I. Akbar, A. Nizar Suksmawati, and Istiadi, “Deteksi Depresi dan Kecemasan Pengguna Twitter menggunakan Bidirectional Lstm,” in The 4th Conference on Innovation and Application of Science and Technology (CIASTECH 2021), 2021, pp. 287–296. doi: 10.31328/ciastech.v0i0.3321.

A. A. Pangestu and P. Akhmad Rezki, “Perbandingan Performa Arsitektur Machine Learning untuk Deteksi Dini Depresi Berbasis Natural Language Processing dalam Bahasa Indonesia,” Journal of Informatics, Information System, and Artificial Intelligence), vol. 3, no. 2, pp. 93–104, 2025, doi: 10.24815/j-sign.v3i2.49873.

H. R. Lawrence, R. A. Schneider, S. B. Rubin, M. J. Matarić, D. J. McDuff, and M. Jones Bell, “The Opportunities and Risks of Large Language Models in Mental Health.,” JMIR Ment. Health, vol. 11, p. e59479, Jul. 2024, doi: 10.2196/59479.

M. A. K. Raiaan et al., “A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges,” IEEE Access, vol. 12, pp. 26839–26874, 2024, doi: 10.1109/ACCESS.2024.3365742.

T. Kallstenius, A. J. Capusan, G. Andersson, and A. Williamson, “Comparing traditional natural language processing and large language models for mental health status classification: a multi-model evaluation,” Sci. Rep., vol. 15, no. 1, Dec. 2025, doi: 10.1038/s41598-025-08031-0.

S. S. Alahmari, L. O. Hall, P. R. Mouton, and D. B. Goldgof, “Repeatability of Fine-Tuning Large Language Models Illustrated Using QLoRA,” IEEE Access, vol. 12, pp. 153221–153231, 2024, doi: 10.1109/ACCESS.2024.3470850.

A. Mahendra and Styawati, “Implementasi Lowk-Rank Adaptation of Large Languange Model (LORA) Untuk Effisiensi Large Language Model,” JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika), vol. 9, no. 4, pp. 1881–1890, Nov. 2024, doi: 10.29100/jipi.v9i4.5519.

G. Il Kim, S. Hwang, and B. Jang, “Efficient Compressing and Tuning Methods for Large Language Models: A Systematic Literature Review,” ACM Comput. Surv., vol. 57, no. 10, pp. 1–39, Oct. 2025, doi: 10.1145/3728636.

A. Shen et al., “Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation,” Trans. Assoc. Comput. Linguist., vol. 13, pp. 861–877, Jul. 2025, doi: 10.1162/TACL.a.23.

Z. Tan, X. Xiong, and D. Xu, “Efficient Differentially Private Fine-Tuning with QLoRA and Prefix Tuning for Large Language Models,” Journal of Computer Science and Artificial Intelligence, vol. 2, no. 3, pp. 50–54, Mar. 2025, doi: 10.54097/we271q84.

G. Phillips-Wren and A. Håkansson, “Towards Using Prompt Engineering in Large Language Models to Assist Decision Making,” Procedia Comput. Sci., vol. 270, pp. 5225–5238, 2025, doi: 10.1016/j.procs.2025.09.650.

N. Esmi, A. Shahbahrami, Y. Nabati, B. Rezaei, G. Gaydadjiev, and P. de Jonge, “Stress detection through prompt engineering with a general-purpose LLM,” Acta Psychol. (Amst)., vol. 260, p. 105462, Oct. 2025, doi: 10.1016/j.actpsy.2025.105462.

Y. H. P. P. Priyadarshana, Z. Liang, and I. Piumarta, “HelaDepDet: A Novel Multi-class Classification Model for Detecting the Severity of Human Depression,” in Collaboration Technologies and Social Computing: 29th International Conference, CollabTech 2023, Osaka, Japan, August 29–September 1, 2023, Proceedings, Berlin, Heidelberg: Springer-Verlag, 2023, pp. 3–18. doi: 10.1007/978-3-031-42141-9_1.

M. Cavus and P. Biecek, “Investigating the impact of balancing, filtering, and complexity on predictive multiplicity: A data-centric perspective,” Information Fusion, vol. 123, p. 103243, Nov. 2025, doi: 10.1016/j.inffus.2025.103243.

A. M. Sharifnia, D. E. Kpormegbey, D. K. Thapa, and M. Cleary, “A Primer of Data Cleaning in Quantitative Research: Handling Missing Values and Outliers,” J. Adv. Nurs., Mar. 2025, doi: 10.1111/jan.16908.

S. Wu, X. Zhu, and H. Wang, “Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources,” Stat. Sin., 2023, doi: 10.5705/ss.202021.0257.

Meta, “Llama 3.1 Model Cards and Prompt Formats,” llama.com. Accessed: Dec. 12, 2025. [Online]. Available: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/

S. Sivarajkumar, M. Kelley, A. Samolyk-Mazzanti, S. Visweswaran, and Y. Wang, “An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study,” JMIR Med. Inform., vol. 12, p. e55318, Apr. 2024, doi: 10.2196/55318.

A. Kong et al., “Better Zero-Shot Reasoning with Role-Play Prompting,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 4099–4113. doi: 10.18653/v1/2024.naacl-long.228.

R. Vinay, G. Spitale, N. Biller-Andorno, and F. Germani, “Emotional prompting amplifies disinformation generation in AI large language models,” Front. Artif. Intell., vol. 8, May 2025, doi: 10.3389/frai.2025.1543603.

J. Li, G. Li, Y. Li, and Z. Jin, “Structured Chain-of-Thought Prompting for Code Generation,” ACM Transactions on Software Engineering and Methodology, vol. 34, no. 2, pp. 1–23, Feb. 2025, doi: 10.1145/3690635.

G. Hermawan and E. Rainarli, “Evaluasi Gemini Flash pada Ekstraksi Jadwal Skripsi Terstruktur dan Tidak Terstruktur,” Jurnal Informatika: Jurnal Pengembangan IT, vol. 10, no. 4, pp. 1080–1091, Nov. 2025, doi: 10.30591/jpit.v10i4.9047.

DOI: https://doi.org/10.30591/jpit.v11i2.10170