CyberGuard
Your Security is Our Priority
The problem CyberGuard solves
CyberGuard is a proactive cyber threat monitoring system that addresses the growing concern of sensitive user data being leaked or misused on the dark web. In today’s digital world, data breaches and leaks are rampant, exposing personal information such as email addresses, credit card numbers, and phone numbers. CyberGuard aims to solve this problem by continuously scanning dark web sources and marketplaces for traces of this compromised data.
The core problem CyberGuard addresses is the lack of early detection and awareness when individuals’ or organizations’ sensitive data appears on illicit platforms. Most victims are unaware of breaches until the damage is already done — unauthorized transactions, identity theft, or phishing attacks. CyberGuard provides a safeguard by letting users register specific identifiers (such as an email address, credit card number, or phone number) and then scanning .onion sites with Tor-based scraping techniques to check whether any of that information has been exposed.
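The Tor-based scanning described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: it assumes a local Tor daemon listening on the standard SOCKS port 9050, and the function names (`identifier_exposed`, `scan_onion_site`) are hypothetical.

```python
import re

# Assumption: a local Tor daemon is running and exposing a SOCKS5 proxy
# on the default port 9050 (the `socks5h` scheme resolves .onion hostnames
# through the proxy itself).
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def identifier_exposed(page_text: str, identifier: str) -> bool:
    """Check whether a monitored identifier (email, phone, or card number)
    appears in scraped page text, ignoring case, spaces, and dashes so that
    '4111-1111-...' still matches '4111 1111 ...'."""
    def normalize(s: str) -> str:
        return re.sub(r"[\s\-]", "", s.lower())
    return normalize(identifier) in normalize(page_text)

def scan_onion_site(url: str, identifier: str) -> bool:
    """Fetch a .onion page through the Tor proxy and check it for the
    identifier. Requires the third-party `requests` package with SOCKS
    support (`pip install requests[socks]`); network access is assumed."""
    import requests
    resp = requests.get(url, proxies=TOR_PROXIES, timeout=30)
    return identifier_exposed(resp.text, identifier)
```

The matching step is kept as a separate pure function so it can be reused unchanged against any text source, whether fetched over Tor or from a clearnet paste site.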
By automating the discovery and analysis of leaked data, CyberGuard not only detects threats early but also empowers users and organizations to take immediate corrective action. This project is especially valuable for cybersecurity teams, journalists, and privacy-conscious users who want to stay informed about their digital footprint. In essence, CyberGuard acts as a digital watchdog, helping reduce the risk of exploitation by shining a light into the hidden corners of the internet where data leaks often go unnoticed.
Challenges we ran into
While developing the scraper tool for CyberGuard, our initial plan was to build a system that could crawl and extract data from dark web .onion sites using the Tor network. However, after working on it for several hours, we discovered a major hurdle — scraping the deep and dark web reliably is far more complex than anticipated. Many .onion sites are highly volatile, often go offline, and are protected by anti-bot mechanisms or CAPTCHA challenges, making consistent scraping nearly impossible with limited resources.
Realizing this limitation, we decided to pivot our approach. Instead of targeting .onion sites directly, we focused on publicly available paste sites like Pastebin, which frequently host leaked data dumps containing sensitive information such as emails, phone numbers, and credit card details.
We implemented keyword-based search queries such as "Indian leak dataset", "credit card dump", and similar terms to identify relevant data. The scraper then extracts this information and stores it in a MongoDB database. To keep the data up to date, we set up the scraper to run at regular intervals and fetch fresh dumps.
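The extract-and-store loop described above can be sketched as below. This is a simplified illustration under stated assumptions: the regex patterns are deliberately loose sketches, the `cyberguard`/`leaks` database and collection names are hypothetical, and `fetch_dump` stands in for whatever callable retrieves raw text for a keyword query such as "credit card dump".

```python
import re
import time

# Simplified patterns for the data types CyberGuard monitors; a production
# scraper would use stricter validation (e.g. a Luhn check for card numbers).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d{10,13}"),
    "card":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def extract_leaked_data(text: str) -> dict:
    """Pull candidate emails, phone numbers, and card numbers out of a dump."""
    return {kind: pat.findall(text) for kind, pat in PATTERNS.items()}

def store_findings(findings: dict, mongo_uri: str = "mongodb://localhost:27017") -> None:
    """Persist extracted records to MongoDB. Requires the third-party
    `pymongo` package; database/collection names here are assumptions."""
    from pymongo import MongoClient
    db = MongoClient(mongo_uri)["cyberguard"]
    for kind, values in findings.items():
        if values:
            db["leaks"].insert_many([{"type": kind, "value": v} for v in values])

def run_forever(fetch_dump, interval_seconds: int = 3600) -> None:
    """Re-scan at a fixed interval, mirroring the scheduled runs described
    above. `fetch_dump` is a hypothetical callable returning raw dump text."""
    while True:
        store_findings(extract_leaked_data(fetch_dump()))
        time.sleep(interval_seconds)
```

Keeping extraction separate from storage means the same regex layer works for any source the scraper is later pointed at.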
This pivot not only helped us overcome the technical roadblocks but also allowed us to build a more stable and scalable solution for monitoring leaked data.