Comparison of the Naive Bayes and Random Forest Algorithms for Identifying Potential Spam Words
DOI:
https://doi.org/10.29407/5v86qx08
Keywords:
NLP, Machine Learning, Text Classification, Email Spam, Naive Bayes
Abstract
This study explores the application of machine learning to detect potential spam words in the Ling-Spam Dataset, which poses the challenge of an imbalanced class distribution in electronic message filtering. Naive Bayes and Random Forest are used to address the inefficiency of static rule-based filters, which often fail to recognize dynamic promotional patterns. Naive Bayes serves as a probabilistic baseline, while Random Forest applies an ensemble mechanism to improve prediction stability on complex lexical features. The dataset consists of 2,893 emails processed through TF-IDF feature extraction, with a stratified split (80% training, 20% testing) to preserve the class proportions. The experimental results show that Naive Bayes achieved the highest overall accuracy, 99.47%, while Random Forest achieved an accuracy of 98.42% together with a perfect precision of 100%. These findings indicate that the ensemble approach is more effective at minimizing false positives, providing an objective and reliable filtering solution that improves email security without blocking legitimate messages.
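The abstract reports that one model can have the higher accuracy while the other has the perfect precision. The following minimal sketch illustrates how that can happen on a stratified 80/20 split. It is not the authors' code and does not use the Ling-Spam data: the toy labels, the two hypothetical models ("A" and "B"), and the helper names `stratified_split` and `accuracy_and_precision` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so both parts keep roughly the original class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy_and_precision(y_true, y_pred, positive="spam"):
    """Accuracy over all messages, precision for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical toy labels (NOT the Ling-Spam data): 80 ham, 20 spam.
labels = ["ham"] * 80 + ["spam"] * 20
train_idx, test_idx = stratified_split(labels)
y_true = [labels[i] for i in test_idx]  # 16 ham and 4 spam in the test part

# Hypothetical model A: catches all spam but flags one legitimate message.
pred_a = list(y_true)
pred_a[y_true.index("ham")] = "spam"

# Hypothetical model B: never flags legitimate mail but misses two spam messages.
pred_b = list(y_true)
spam_positions = [i for i, t in enumerate(y_true) if t == "spam"]
for i in spam_positions[:2]:
    pred_b[i] = "ham"

acc_a, prec_a = accuracy_and_precision(y_true, pred_a)
acc_b, prec_b = accuracy_and_precision(y_true, pred_b)
print(f"A: accuracy={acc_a:.2f}, precision={prec_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, precision={prec_b:.2f}")
```

In this toy setting, model A makes fewer errors overall and so has the higher accuracy, while model B produces no false positives and so reaches a precision of 100% despite its lower accuracy. Which trade-off is preferable depends on how costly it is to block a legitimate message.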
License
Copyright (c) 2026 Muhammad Fajrin Aswad Ad-Duali, Dwika Putra Adinata

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Copyright on any article is retained by the author(s).
- The author grants the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The article and any associated published material are distributed under the Creative Commons Attribution-ShareAlike 4.0 International License.





