Comparison of the Random Forest and Naïve Bayes Methods in Email Spam Filtering


Maria Anita(1*), Bambang Susanto(2), Lenox Larwuy(3)

(1) , Indonesia
(2) Universitas Kristen Satya Wacana Salatiga
(3) Universitas Kristen Satya Wacana Salatiga
(*) Corresponding Author

Abstract


Email is an important tool not only for communicating and transferring files but also as an advertising medium on the Internet. As the number of email users has grown, many users have begun sending emails that contain viruses, fraud, and even pornography. Such emails are called spam: unsolicited emails sent in bulk. Many email users are annoyed by the amount of time spent deleting individual spam messages. This study compares the Random Forest and Naïve Bayes classification methods for predicting email spam, with the aim of identifying the more accurate method. The data used in this study is an email dataset of 2607 records with two variables: the body variable, which contains the content of the email, and the label variable, where 1 indicates spam and 0 indicates not spam. Test results based on the confusion matrix show that the Random Forest method has the higher accuracy, at 98%, compared with 73% for Naïve Bayes.
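
As an illustration of how an accuracy value can be read off a confusion matrix, the following is a minimal sketch in plain Python. It assumes the same label convention as the dataset (1 = spam, 0 = not spam); the function names and the example labels are hypothetical and are not taken from the study's data or code.

```python
def confusion_matrix(y_true, y_pred):
    """Count outcomes for binary labels where 1 = spam and 0 = not spam."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1  # spam correctly flagged as spam
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1  # legitimate email correctly passed
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1  # legitimate email wrongly flagged
        else:
            counts["fn"] += 1  # spam that slipped through
    return counts


def accuracy(counts):
    """Accuracy = (TP + TN) / total number of predictions."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Made-up labels for illustration only; NOT the study's 2607-email dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
```

The same calculation would be applied once to each classifier's predictions on the test set, and the two accuracy values compared.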


Keywords


Classification, Email, Naïve Bayes, Random Forest, Spam

DOI: https://doi.org/10.15575/kubik.v7i2.18933


Copyright (c) 2023 Maria Anita

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


KUBIK: Jurnal Publikasi Ilmiah Matematika is indexed by:

SINTA, DOAJ, Dimensions, Google Scholar, Garuda, Moraref, DOI Crossref
