On Identifying Phishing Emails and Websites using Content Analysis, URL Analysis, and Sender Behavior

Authors

  • Azah Anir Norman, Faculty of Computer Science and Information Technology, Universiti Malaya

Keywords:

Phishing; Email; NLP; Detection System.

Abstract

As phishing attacks grow more sophisticated, traditional detection methods struggle to keep pace, leaving significant cybersecurity vulnerabilities. This paper addresses the urgent need for more adaptable and accurate phishing detection mechanisms. It highlights the limitations of current systems, particularly their inability to parse and understand the nuanced language of phishing emails. By placing the problem within the broader landscape of escalating cyber threats, the paper motivates a solution that combines Natural Language Processing (NLP) and advanced machine learning techniques.

Published

2024-03-14

How to Cite

Norman, A. A. (2024). On Identifying Phishing Emails and Websites using Content Analysis, URL Analysis, and Sender Behavior. Journal of Information Systems Research and Practice, 2(1), 94–105. Retrieved from https://vmis.um.edu.my/index.php/JISRP/article/view/52200