Perbandingan Metode Random Forest dan Naïve Bayes dalam Email Spam Filtering (Comparison of the Random Forest and Naïve Bayes Methods in Email Spam Filtering)
DOI:
https://doi.org/10.15575/kubik.v7i2.18933
Keywords:
Classification, Email, Naïve Bayes, Random Forest, Spam
Abstract
Email is an important tool not only for communicating and transferring files but also as an advertising medium on the Internet. As the number of email users has grown, many users have begun sending emails that contain viruses, fraud, and even pornography. Such unsolicited emails, sent in bulk, are called spam. Many email users are annoyed by the amount of time spent deleting individual spam messages. This study compares the Random Forest and Naïve Bayes classification methods for predicting email spam, with the aim of finding the more accurate method. The data used in this study is an email dataset of 2607 records with two variables: the body variable (the contents of the email) and the label variable, where 1 indicates spam and 0 indicates not spam. Test results based on the confusion matrix show that Random Forest has the higher accuracy, at 98%, compared with 73% for Naïve Bayes.
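The accuracy figures above are read off a confusion matrix. As a minimal sketch of that calculation, assuming the same label encoding (1 for spam, 0 for not spam), the following Python counts the four outcomes and derives accuracy from them. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs for binary labels 0/1."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],  # spam predicted as spam
        "fn": counts[(1, 0)],  # spam predicted as not spam
        "fp": counts[(0, 1)],  # not spam predicted as spam
        "tn": counts[(0, 0)],  # not spam predicted as not spam
    }


def accuracy(cm):
    """Share of all predictions that match the true label."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Toy labels invented for this example (NOT the study's dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)            # counts of tp, fn, fp, tn
print(accuracy(cm))  # (tp + tn) / total
```

With these toy labels, six of the eight predictions match, so the accuracy is 0.75. Comparing two classifiers amounts to running the same calculation on each model's predictions for the same test set.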
License
Authors who publish in KUBIK: Jurnal Publikasi Ilmiah Matematika agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under an Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).