As more and more people become aware of the dark web, illegal activity is rising sharply because of the anonymity it provides. Bad actors use the dark web for things like trafficking, hitmen for hire, child abuse material, leaking sensitive information, and more.
Regulatory authorities need to monitor the dark web efficiently and effectively to keep illegal activity in check and make it a safer place for the public. Doing this manually, however, is not only time-consuming but also largely ineffective in most cases, given the number of new websites that come online every day.
The project is an automated bot (a crawler) which, once started, recursively scrapes through dark web links and checks each website for specific keywords that may indicate illegal activity. It does the following (a minimal sketch of this loop follows the list):
1.) Scrapes the websites for more websites and important subdirectories
2.) Goes through the whole website (main index and all subdirectories) looking for hit words
3.) If any website appears to be used for illegal purposes, flags it in the alerts
4.) The whole process is automatic; the user only needs to start it once, after which it stops only when aborted or once it has scanned every website it could find
5.) The entire process is well logged, and the user can stop it at any point without worrying about losing the scan data (including all the domains it has crawled through)
Crawling through dark web links was a real challenge, as the links are constantly changing and a continuous search is needed to keep track of them.
Making the data scalable and easy to access was the second challenge we faced in this project.
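Because onion addresses appear and disappear constantly, one workable approach (a sketch, not necessarily what the project does) is to periodically re-check every domain recorded in the crawl state and separate the ones that still respond from the ones that have gone offline:

```python
import json
import requests

PROXIES = {"http": "socks5h://127.0.0.1:9050",
           "https": "socks5h://127.0.0.1:9050"}


def recheck_known_domains(state_file="state.json"):
    """Re-visit previously seen onion URLs and report which still respond."""
    with open(state_file) as fh:
        visited = json.load(fh)["visited"]
    alive, dead = [], []
    for url in visited:
        try:
            requests.head(url, proxies=PROXIES, timeout=30)
            alive.append(url)
        except requests.RequestException:
            dead.append(url)
    return alive, dead
```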
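For keeping the scan data scalable and queryable, one option (an assumption for illustration, not the project's confirmed storage layer) is a small SQLite schema with one table for crawled pages and one for keyword alerts:

```python
import sqlite3


def init_db(path="scan.db"):
    """Create tables for crawled pages and keyword alerts if they don't exist."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS pages (
            url     TEXT PRIMARY KEY,
            fetched TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS alerts (
            url     TEXT REFERENCES pages(url),
            keyword TEXT
        );
    """)
    conn.commit()
    return conn


def record_alert(conn, url, keywords):
    """Insert the page and one row per matched keyword."""
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    conn.executemany("INSERT INTO alerts (url, keyword) VALUES (?, ?)",
                     [(url, k) for k in keywords])
    conn.commit()
```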
Technologies used
Discussion