Created on 22nd July 2024
Phishing is a type of cyber attack where malicious actors attempt to trick individuals into providing sensitive information, such as usernames, passwords, or financial details, often by masquerading as a trustworthy entity in electronic communications.
By combining labelled datasets with features extracted from URLs, the model aims to identify and flag potential phishing attempts. This involves analyzing various attributes of URLs and other data to distinguish between legitimate and malicious sites or messages. The goal is to enhance security by preventing users from falling victim to phishing schemes, thereby protecting their personal and financial information.
Collect a dataset containing phishing and legitimate websites from open-source platforms.
Write code to extract the required features from the URL database:
Address Bar based Features
Domain based Features
HTML & JavaScript based Features
Use web scraping and browser automation with Selenium and BeautifulSoup to extract URLs directly from the browser.
Analyze and preprocess the dataset by using EDA techniques.
Divide the dataset into training and testing sets.
Run selected machine learning and deep neural network algorithms, such as SVM, Random Forest, XGBoost, and an autoencoder, on the dataset.
Write code to display the evaluation results, reporting accuracy metrics with respect to each feature.
This project can be used in the real world by deploying it as a website.
I had trouble working out how to report the percentage likelihood that a URL is phishing, and how to combine the two models into a final judgment on whether a URL is phishing or not. To solve this, I built a Django backend that connects the two models and assigns weights to the features.
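One way to picture the combination step is a weighted average of the two models' phishing probabilities, reported as a percentage plus a yes/no verdict. This is only a sketch: the function name, the 0.6/0.4 weights, and the 0.5 threshold are hypothetical placeholders, not the values used in the project.

```python
def combine_scores(prob_a, prob_b, weight_a=0.6, weight_b=0.4, threshold=0.5):
    """Blend two phishing probabilities (each in [0, 1]) into one verdict.

    The weights and threshold are illustrative assumptions only.
    """
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average keeps the result in [0, 1] when both inputs are.
    score = (weight_a * prob_a + weight_b * prob_b) / total
    return {
        "phishing_percent": round(score * 100, 1),
        "is_phishing": score >= threshold,
    }
```

A Django view could call such a function with the outputs of both models and return the resulting dictionary as JSON.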
FEATURE EXTRACTION
Address Bar based Features considered are:
• Domain of URL
• IP Address in URL
• ‘@’ Symbol in URL
• Length of URL
• Depth of URL
• Redirection ‘//’ in URL
• ‘http/https’ in Domain name
• Using URL Shortening Service
• Prefix or Suffix "-" in Domain
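The address-bar features above can be sketched with the Python standard library alone. The shortener list, the length threshold of 54 characters, and the feature names below are assumptions made for illustration; the project's actual rules may differ.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical values: a small shortener list and a length cut-off.
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly", "is.gd"}
MAX_SAFE_LENGTH = 54


def _is_ip(host):
    """Return 1 if the host is a literal IPv4/IPv6 address, else 0."""
    try:
        ipaddress.ip_address(host)
        return 1
    except ValueError:
        return 0


def address_bar_features(url):
    """Extract the nine address-bar features as a dict (1 = suspicious)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    domain = host[4:] if host.startswith("www.") else host
    # Depth = number of non-empty path segments.
    depth = len([seg for seg in parsed.path.split("/") if seg])
    return {
        "domain": domain,
        "have_ip": _is_ip(host),
        "have_at": int("@" in url),
        "url_length": int(len(url) >= MAX_SAFE_LENGTH),
        "url_depth": depth,
        # A '//' after the scheme separator hints at an embedded redirect.
        "redirection": int(url.rfind("//") > 7),
        "https_in_domain": int("http" in host),
        "tiny_url": int(host in SHORTENERS),
        "prefix_suffix": int("-" in host),
    }
```

For example, `address_bar_features("http://192.168.0.1/login")` flags `have_ip`, while a hyphenated host such as `secure-login.example.com` flags `prefix_suffix`.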
Domain based Features considered are:
• DNS Record
• Website Traffic
• Age of Domain
• End Period of Domain
HTML and JavaScript based Features considered are:
• Iframe Redirection
• Status Bar Customization
• Disabling Right Click
• Website Forwarding
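As a rough illustration, the four HTML and JavaScript features can be approximated with regular expressions over the page source. The patterns, the function name, and the redirect threshold of 2 are assumptions; `redirect_count` is assumed to be supplied by whatever fetches the page (for example the scraping step).

```python
import re


def html_js_features(html, redirect_count=0):
    """Flag the four HTML/JavaScript features (1 = suspicious).

    html: page source as a string.
    redirect_count: number of HTTP redirects followed while fetching
    (an assumed input, supplied by the caller).
    """
    text = html or ""
    flags = re.IGNORECASE
    return {
        # Presence of an <iframe> element.
        "iframe": int(bool(re.search(r"<iframe\b", text, flags))),
        # onmouseover handler that rewrites the status bar.
        "mouse_over": int(bool(
            re.search(r"onmouseover\s*=[^>]*window\.status", text, flags))),
        # Scripts that intercept the right mouse button or context menu.
        "right_click": int(bool(
            re.search(r"event\.button\s*={2,3}\s*2|oncontextmenu\s*=", text, flags))),
        # Many redirects before the final page loads.
        "web_forwards": int(redirect_count > 2),
    }
```

Regex matching is a heuristic here; parsing the page with BeautifulSoup, as mentioned earlier, would be more robust against unusual markup.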
Altogether, 17 features are extracted from the dataset.
I analyzed the data to measure how important each feature is and to decide how much weight each feature should get based on that importance.
In the views of my Django project, I used some data structures and logic to overcome some of these problems.
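A simple way to turn importances into weights is to normalize them so they sum to 1, then score a URL by summing the weights of the features that fired. The sketch below assumes binary features (1 = suspicious); the function names and any importance values passed in are hypothetical, not results from the project.

```python
def importance_to_weights(importances):
    """Normalize raw feature importances so the weights sum to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    return {name: value / total for name, value in importances.items()}


def weighted_feature_score(features, weights):
    """Sum the weights of the features that are set; result is in [0, 1]
    when every feature value is 0 or 1."""
    return sum(weights[name] * features.get(name, 0) for name in weights)
```

With made-up importances of 3.0 for `have_ip` and 1.0 for `have_at`, a URL that only triggers `have_ip` would score 0.75.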
Tracks Applied (2)
Polygon
ETHIndia