HIDE

Data Anonymization for Law Enforcement

The problem HIDE solves

Problem: Balancing data collection and utilization for effective law enforcement with public privacy concerns requires innovative solutions.

Data privacy matters in police departments, but the data also needs to be analyzed with various third-party tools. The mandate for this exercise is therefore to anonymize identity details without compromising data quality.

Expected solution:
- In any dataset of crime records (FIRs, chargesheets, statements, etc.), all personal identifiers such as names, addresses, places, and times should be identified across the various documents and formats and replaced with random placeholders, in such a way that analysis of the anonymized data still gives valid, actionable results.
- The solution should also include a tool to anonymize any information fed in through a web application. In both cases, the key requirement is that the logic and reliability of the data remain valid.
- OCR tools need not be developed, but identifying the personal information in the dataset or file format has to be completed.
- The option to pick and choose which parts of the private data are anonymized should be offered clearly as a drop-down in the tool's interface.
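The requirement that anonymized data still supports valid analysis suggests that the same identifier should always map to the same placeholder, so that links between records survive. The sketch below is one possible way to do that and is not taken from the HIDE codebase. The class name `Pseudonymizer`, the entity labels, and the `(start, end, entity_type)` span format are all assumptions made for illustration; detection of the spans is assumed to happen elsewhere.

```python
import secrets


class Pseudonymizer:
    """Maps each distinct identifier to one stable random placeholder (illustrative)."""

    def __init__(self, enabled_types):
        # Entity types the user chose to anonymize, e.g. from the drop-down.
        self.enabled_types = set(enabled_types)
        # (entity_type, normalized value) -> placeholder
        self._mapping = {}

    def placeholder_for(self, entity_type, value):
        key = (entity_type, value.strip().lower())
        if key not in self._mapping:
            # Random token; collisions are unlikely but not ruled out in this sketch.
            self._mapping[key] = f"<{entity_type}_{secrets.token_hex(4)}>"
        return self._mapping[key]

    def anonymize(self, text, entities):
        """entities: non-overlapping (start, end, entity_type) spans into text."""
        out = text
        # Replace from right to left so earlier offsets stay valid.
        for start, end, entity_type in sorted(entities, reverse=True):
            if entity_type in self.enabled_types:
                token = self.placeholder_for(entity_type, text[start:end])
                out = out[:start] + token + out[end:]
        return out


if __name__ == "__main__":
    p = Pseudonymizer({"PERSON"})  # LOCATION left unselected on purpose
    spans = [(0, 4, "PERSON"), (9, 13, "PERSON"), (17, 21, "LOCATION")]
    print(p.anonymize("Ravi met Ravi at Pune", spans))
```

Both occurrences of the name receive the same placeholder, while the location is left as-is because its type was not selected. Keeping the mapping in one object per dataset means counts and co-occurrences computed on the anonymized text match those of the original.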

Challenges we ran into

Recognizer accuracy: Improving the accuracy of the recognizers is a key challenge. The default models include entities such as Aadhaar number, PAN number, and driving license, but they are often inaccurate. To address this, we need to analyze the surrounding context of each match, especially for these specific entities.
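One common way to use context is to give a pattern match a low base score and raise it only when a related keyword appears nearby. The sketch below illustrates that idea with the standard library only; it is not the recognizer used in HIDE. The scores, window size, keyword lists, and the function name `find_ids` are assumptions chosen for illustration. The patterns follow the usual formats (12 digits for Aadhaar, five letters, four digits, one letter for PAN) and do not validate checksums.

```python
import re

# Pattern and base score per entity type (values are illustrative assumptions).
PATTERNS = {
    "AADHAAR": (re.compile(r"\b[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}\b"), 0.4),
    "PAN": (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), 0.5),
}

# Context keywords, matched as whole words on lowercased text.
CONTEXT = {
    "AADHAAR": re.compile(r"\b(?:aadhaar|aadhar|uidai|uid)\b"),
    "PAN": re.compile(r"\b(?:pan|income tax|permanent account)\b"),
}

WINDOW = 40   # characters inspected on each side of a match
BOOST = 0.4   # score added when a context keyword is found


def find_ids(text, threshold=0.6):
    """Return sorted (start, end, entity_type, score) for matches at or above threshold."""
    lowered = text.lower()
    results = []
    for entity_type, (pattern, base_score) in PATTERNS.items():
        for m in pattern.finditer(text):
            # Look at the text around the match, excluding the match itself.
            before = lowered[max(0, m.start() - WINDOW):m.start()]
            after = lowered[m.end():m.end() + WINDOW]
            score = base_score
            if CONTEXT[entity_type].search(before + " " + after):
                score = min(1.0, score + BOOST)
            if score >= threshold:
                results.append((m.start(), m.end(), entity_type, round(score, 2)))
    return sorted(results)


if __name__ == "__main__":
    print(find_ids("Accused Aadhaar: 2345 6789 0123, PAN ABCPE1234F"))
    print(find_ids("Ref 2345 6789 0123 noted"))  # no context keyword nearby
```

With these settings a bare 12-digit number stays below the threshold, while the same number next to the word "Aadhaar" is reported. The threshold trades missed identifiers against false positives and would need tuning on real documents.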

Side-face recognition: Initially, we relied on OpenCV's Haar Cascade model, which worked well for front-facing faces but struggled with faces seen from the side. To overcome this limitation, we switched to YOLOv8, which handled side faces more robustly.

Frame drops during video face recognition: To tackle frame drops during video face anonymization, we used YOLOv8. This helped us maintain performance and keep processing smooth while anonymizing faces in videos.
