Comparison of the Random Forest and Naïve Bayes Methods in Email Spam Filtering

Authors

  • Maria Anita, Universitas Kristen Satya Wacana, Salatiga
  • Bambang Susanto, Universitas Kristen Satya Wacana, Salatiga
  • Lenox Larwuy, Universitas Kristen Satya Wacana, Salatiga

DOI:

https://doi.org/10.15575/kubik.v7i2.18933

Keywords:

Classification, Email, Naïve Bayes, Random Forest, Spam

Abstract

Email is an important tool not only for communicating and transferring files but also as an advertising medium on the Internet. As the number of email users has grown, many senders distribute viruses, fraudulent messages, and even pornographic content by email. Such unsolicited emails sent in bulk are called spam. Many email users are annoyed by the amount of time they spend deleting individual spam messages. This study compares the Random Forest and Naïve Bayes classification methods for predicting email spam, with the aim of identifying the more accurate method. The data used in this study is an email dataset of 2607 records with two variables: body, which contains the content of the email, and label, where 1 indicates spam and 0 indicates not spam. Test results based on the confusion matrix show that Random Forest has the higher accuracy, 98%, compared with 73% for Naïve Bayes.
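
As an illustration of how an accuracy figure can be read off a confusion matrix, the following minimal sketch counts the four outcomes for binary labels (1 = spam, 0 = not spam) and divides the correct predictions by the total. The function names and the toy labels are assumptions made for this example; they are not the study's data or code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count outcomes for binary labels: 1 = spam, 0 = not spam."""
    counts = Counter(zip(y_true, y_pred))  # keys are (actual, predicted) pairs
    return {
        "tp": counts[(1, 1)],  # spam correctly flagged as spam
        "tn": counts[(0, 0)],  # non-spam correctly passed
        "fp": counts[(0, 1)],  # non-spam wrongly flagged as spam
        "fn": counts[(1, 0)],  # spam wrongly passed as non-spam
    }


def accuracy(cm):
    """Share of all predictions that were correct."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total


# Hypothetical toy labels, not taken from the study's dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
print(cm, acc)
```

The same two functions can be applied to the predictions of any classifier, so two methods can be compared on identical test labels.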

Published

2023-06-05