Design of a Hoax News Detection System Using IndoBERT on a Balanced Scraped Dataset (2020–2025)
Keywords

IndoBERT
hoax
text classification
scraping
model evaluation

How to Cite

Rancangan Sistem Deteksi Berita Hoaks dengan IndoBERT Berbasis Dataset Scraping Seimbang (2020–2025). (2025). Prosiding SEMNAS INOTEK (Seminar Nasional Inovasi Teknologi), 9(1), 001-007. https://doi.org/10.29407/k4y5pw15

Abstract

The spread of hoax news continues to grow as information technology develops. This study designs a classification system for Indonesian-language hoax news using the IndoBERT model. The dataset was built by web scraping TurnBackHoax.id (hoax news) and CNN Indonesia, Detik.com, and Kompas.com (non-hoax news), covering a range of news categories from 2020 to 2025 for a total of 25,296 records. All hoax records were used, while the non-hoax records were adjusted so that the two classes are balanced. IndoBERT was fine-tuned with layers 1–8 frozen and trained for five epochs. Evaluation used a confusion matrix, a classification report, ROC AUC, and the precision-recall curve. The results show that the model classifies hoax and non-hoax news accurately. The study contributes the application of IndoBERT to recent, balanced data and the use of a comprehensive set of evaluation methods.
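
The abstract does not say how the non-hoax class was adjusted or which tooling produced the evaluation reports. As a minimal sketch, assuming random undersampling of the larger class and a binary labelling with "hoax" as the positive class, the balancing step and the counts behind a confusion matrix could look like the following. The function names, the `label` field, and the label strings are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def balance_by_undersampling(records, label_key="label", seed=42):
    """Keep every record of the smallest class and randomly
    undersample each larger class down to the same size.
    (Assumption: the source only says the classes were balanced.)"""
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        if len(group) == target:
            balanced.extend(group)
        else:
            balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def confusion_counts(y_true, y_pred, positive="hoax"):
    """Return (tp, fp, fn, tn) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Per-class scores of the kind a classification report lists."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data only; these are not records from the paper's dataset.
toy = [{"text": f"h{i}", "label": "hoax"} for i in range(3)]
toy += [{"text": f"n{i}", "label": "non_hoax"} for i in range(7)]
balanced = balance_by_undersampling(toy)
print(Counter(rec["label"] for rec in balanced))
```

Fixing the random seed keeps the sampled non-hoax subset reproducible across runs, which matters when the same split is reused to compare fine-tuning configurations.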

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright (c) 2025 Mochamad Abdul Azis, Aditya Arya Respati, Erna Daniati