Efficient Ensemble-based Phishing Website Classification Models using Feature Importance Attribute Selection and Hyper parameter Tuning Approaches

https://doi.org/10.48185/jitc.v4i2.891

Authors

  • Jimoh R. G.
  • Oyelakin A. M. (Department of Computer Science, Crescent University, Abeokuta, Nigeria)
  • Abikoye O. C.
  • Akanbi M. B.
  • Gbolagade M. D.
  • Akanni A. O.
  • Jibrin M. A.
  • Ogundele T. S.

Keywords:

Phishing, Cyber Security, Classification Models, Hyperparameter Tuning

Abstract

The internet is now a common venue for business, scientific and educational activities. However, malicious actors in the internet space keep using different attack techniques to cause harm. Among them are attackers who use phishing techniques to launch attacks on enterprise networks and the wider internet. The use of machine learning (ML) for phishing attack classification is an active research area in cyber security, because phishing attack detection is a good example of an intrusion identification task. These ML techniques can be categorized as single and ensemble learners, and ensemble learners have been identified as more promising than single classifiers. Detection models can be improved further through feature selection/dimensionality reduction and hyperparameter tuning. This study focuses on the classification of phishing websites using ensemble learning algorithms: Random Forest (RF) and Extra Trees ensembles were used for the phishing classification. The models built from these algorithms were optimized by applying feature importance attribute selection and hyperparameter tuning. The RF-based phishing classification model achieved 99.3% accuracy, 0.996 recall, 0.983 F1-score, 0.996 precision and an AUC score of 1.000. Similarly, the Extra Trees-based model attained 99.1% accuracy, 0.990 recall, 0.981 F1-score, 0.990 precision and an AUC score of 1.000. Thus, the RF-based model achieved slightly better classification results than the Extra Trees-based model. The study concludes that the attribute selection and hyperparameter tuning approaches employed are very promising.
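The abstract names three technical ingredients: importance-based attribute selection, hyperparameter tuning, and evaluation by accuracy, precision, recall, F1-score and AUC. The sketch below illustrates each one in dependency-free Python. It is not the authors' implementation and rests on stated assumptions. The importance scores would normally come from a fitted RF or Extra Trees model (for example scikit-learn's `feature_importances_` attribute). The mean-importance threshold and the exhaustive grid search are illustrative choices that the abstract does not confirm. The feature names, importance values and score function in the demo are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def select_by_importance(importances: Dict[str, float]) -> List[str]:
    """Keep features whose importance is at least the mean importance.

    Assumption: a mean-importance threshold; the abstract does not say
    which threshold the study used.
    """
    mean = sum(importances.values()) / len(importances)
    return [name for name, score in importances.items() if score >= mean]


def grid_search(
    param_grid: Dict[str, Sequence],
    score: Callable[[Dict], float],
) -> Tuple[Dict, float]:
    """Try every combination in param_grid and return the best-scoring one.

    In practice, score(params) would train an RF or Extra Trees model with
    those hyperparameters on the selected features and return a validation
    metric such as cross-validated accuracy.
    """
    keys = list(param_grid)
    best_params: Dict = {}
    best_score = float("-inf")
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], y_score: Sequence[float]
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and ROC AUC.

    Assumed label encoding: 1 = phishing (positive class), 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC = probability that a random positive is scored above a random
    # negative, with ties counted as one half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": wins / (len(pos) * len(neg)),
    }


if __name__ == "__main__":
    # Hypothetical feature names and importance scores, for illustration only.
    importances = {"url_length": 0.40, "ssl_state": 0.30,
                   "prefix_suffix": 0.20, "has_ip_address": 0.10}
    print(select_by_importance(importances))  # ['url_length', 'ssl_state']

    # Stand-in score function; a real one would fit a model and return
    # a validation metric.
    def toy_score(params: Dict) -> float:
        return params["n_estimators"] / 1000 + (0.05 if params["max_depth"] is None else 0.0)

    print(grid_search({"n_estimators": [100, 200], "max_depth": [10, None]}, toy_score))

    # Toy labels, hard predictions and predicted phishing probabilities.
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1],
                         [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]))
```

Exhaustive grid search is the simplest tuning strategy. Its cost grows with the product of the grid sizes, so random or Bayesian search is often preferred when the grid is large.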



Published

2023-12-30

How to Cite

Jimoh, R. G., Oyelakin, A. M., Abikoye, O. C., Akanbi, M. B., Gbolagade, M. D., Akanni, A. O., Jibrin, M. A., & Ogundele, T. S. (2023). Efficient Ensemble-based Phishing Website Classification Models using Feature Importance Attribute Selection and Hyper parameter Tuning Approaches. Journal of Information Technology and Computing, 4(2), 1–10. https://doi.org/10.48185/jitc.v4i2.891

Issue

Vol. 4 No. 2 (2023)

Section

Articles