IMPLEMENTATION OF HYPERPARAMETER OPTIMISATION AND OVER-SAMPLING IN DETECTING CYBERBULLYING USING MACHINE LEARNING APPROACH

Wan Noor Hamiza Wan Ali; Masnizah Mohd; Fariza Fauzi; Kiyoaki Shirai; Muhammad Junaidi Mahamad Noor

doi:10.22452/mjcs.sp2021no2.6

Published: Dec 31, 2021

Keywords:

Classification; Cyberbullying; Feature Extraction; Hyperparameter Optimisation; Machine Learning; SMOTE; TF-IDF; Word Embedding

Wan Noor Hamiza Wan Ali

Center for Cyber Security, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor Malaysia

Masnizah Mohd

Center for Cyber Security, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor Malaysia

Fariza Fauzi

Center for Cyber Security, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor Malaysia

Kiyoaki Shirai

School of Advanced Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa Japan

Muhammad Junaidi Mahamad Noor

INVOKE Solutions Sdn Bhd, Sungai Besi, Kuala Lumpur, Malaysia

Abstract

Online social networks have become a necessity for people around the world. In particular, they allow us to connect with one another at any time: social media serve as platforms for broadcasting information, and social networking sites serve as platforms for communication. However, this development has also given people opportunities to commit various cybercrimes, such as cyberbullying. Machine learning can be used to counter cyberbullying in online social networks. This study therefore proposed a framework with a feature set consisting of word- and character-level term frequency–inverse document frequency (TF-IDF), word embeddings generated with Word2vec, and six types of term lists: profane words, proper nouns, negation words, ‘allness’ terms, diminisher words and intensifier words. These features were divided into four groups and fed into a linear support vector classifier. The model was trained on the ASKfm data set with hyperparameter tuning and over-sampling. The results indicate that the proposed framework performs strongly: the highest area under the curve (AUC) and F-measure achieved by the trained model were 99.24% and 97.38%, respectively.
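
The abstract names several concrete building blocks. Below is a minimal, dependency-free Python sketch of some of them, written for illustration and not taken from the authors' code. It covers word- and character-level TF-IDF weighting, counts of term-list features, SMOTE-style over-sampling, and an F-measure for the positive class.

Several details are assumptions rather than anything stated in the abstract:
- the tokenizer;
- the character n-gram range (2 to 4);
- the smoothed IDF formula;
- the F1 averaging.

The word lists in the hypothetical `TERM_LISTS` are tiny placeholders, and the proper-noun list is omitted. Word2vec embeddings, the linear support vector classifier and hyperparameter tuning are not reproduced here. They would normally come from libraries such as gensim (`Word2Vec`) and scikit-learn (`LinearSVC`, `GridSearchCV`).

```python
import math
import random
from collections import Counter


def word_tokens(text):
    """Lower-cased whitespace tokens (a deliberately simple tokenizer)."""
    return text.lower().split()


def char_ngrams(text, n_min=2, n_max=4):
    """Lower-cased character n-grams of length n_min..n_max (spaces included)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf(docs, analyzer):
    """Return one sparse {term: weight} dict per document.

    Weight = raw count * (ln((1 + N) / (1 + df)) + 1), then each document
    vector is L2-normalised. This smoothed IDF is an assumed choice."""
    tokenised = [analyzer(d) for d in docs]
    n_docs = len(docs)
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                   for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Hypothetical placeholder lists; the paper's real lists are not reproduced,
# and the proper-noun list is omitted because it needs a tagger or lexicon.
TERM_LISTS = {
    "profane": {"idiot", "stupid"},
    "negation": {"not", "never", "no"},
    "allness": {"always", "everyone", "nobody"},
    "diminisher": {"slightly", "barely"},
    "intensifier": {"very", "so", "really"},
}


def list_term_counts(text):
    """Count how many tokens of `text` fall into each term list."""
    toks = word_tokens(text)
    return {name: sum(t in words for t in toks)
            for name, words in TERM_LISTS.items()}


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style over-sampling of dense minority-class vectors.

    Each synthetic point is x + u * (z - x), where x is a random minority
    sample, z is one of its k nearest minority neighbours and u ~ U(0, 1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((z for z in minority if z is not x),
                            key=lambda z: math.dist(x, z))[:k]
        z = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


def f_measure(y_true, y_pred, positive=1):
    """F1 score for the positive (bullying) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Over-sampling is normally applied only to the training portion of the data, or inside each cross-validation fold during tuning. This keeps synthetic samples derived from evaluation data out of the test set. Library implementations such as imbalanced-learn's `SMOTE` follow the same interpolation idea.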

How to Cite

Wan Ali, W. N. H., Mohd, M., Fauzi, F., Shirai, K., & Mahamad Noor, M. J. (2021). IMPLEMENTATION OF HYPERPARAMETER OPTIMISATION AND OVER-SAMPLING IN DETECTING CYBERBULLYING USING MACHINE LEARNING APPROACH. Malaysian Journal of Computer Science, 78–100. https://doi.org/10.22452/mjcs.sp2021no2.6

Issue

2021: Special Issue 2/2021: "Information Retrieval and Knowledge Management"

Section

Articles