Automated File Organization:
Users upload folders containing various documents to be organized.
The system automates the organization process to save user time and effort.
OCR (Optical Character Recognition):
OCR is used to extract text from images and scanned documents.
Zonal OCR is performed to accurately extract text from specific areas of documents.
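As a rough illustration of zonal OCR, a zone can be described as fractions of the page and converted to a pixel box before recognition. This is a minimal sketch, not the system's actual implementation: the `Zone` class, the `zone_to_pixels` helper, and the title-block coordinates are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular region given as fractions (0.0 to 1.0) of the page."""
    name: str
    left: float
    top: float
    right: float
    bottom: float


def zone_to_pixels(zone: Zone, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a fractional zone into a (left, top, right, bottom) pixel box."""
    valid_x = 0.0 <= zone.left < zone.right <= 1.0
    valid_y = 0.0 <= zone.top < zone.bottom <= 1.0
    if not (valid_x and valid_y):
        raise ValueError(f"invalid zone: {zone}")
    return (
        round(zone.left * width),
        round(zone.top * height),
        round(zone.right * width),
        round(zone.bottom * height),
    )


# Hypothetical zone: a title block in the bottom-right corner of a drawing.
TITLE_BLOCK = Zone("title_block", left=0.70, top=0.85, right=1.00, bottom=1.00)
```

The resulting box uses the (left, upper, right, lower) order that image libraries such as Pillow accept for cropping, so the cropped region alone could then be handed to whichever OCR engine is in use.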
Handling Different File Formats:
The supported file formats are PDF, DWG, PNG, and JPEG (.jpg and .jpeg).
The system identifies the file type to determine the appropriate processing method.
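One simple way to pick a processing method is to look at the file extension. The sketch below assumes extension-based detection; the route names (`pdf`, `cad`, `image`) and the `detect_route` helper are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Assumed mapping from extension to a processing route (names are hypothetical).
PROCESSORS = {
    ".pdf": "pdf",
    ".dwg": "cad",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def detect_route(filename: str) -> str:
    """Return the processing route for a file, or 'unsupported'."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    return PROCESSORS.get(suffix, "unsupported")
```

Extensions can be wrong or missing, so a more careful version might also inspect the first bytes of the file before trusting the route.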
Page Size Analysis:
PDF page dimensions are analyzed to determine whether a document is A4 or a larger format (e.g., A1).
Different OCR methods are used based on the page size and file type to optimize text extraction.
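The A4-versus-larger check can be sketched from page dimensions alone. PDF pages are measured in points (1/72 inch), so the sketch converts points to millimetres and compares against the ISO 216 A-series sizes. The function names and the 5 mm tolerance are assumptions chosen for illustration.

```python
# ISO 216 A-series sizes in millimetres, as (short side, long side).
ISO_A_SIZES_MM = {
    "A0": (841, 1189),
    "A1": (594, 841),
    "A2": (420, 594),
    "A3": (297, 420),
    "A4": (210, 297),
}


def points_to_mm(points: float) -> float:
    """Convert PDF points (1/72 inch) to millimetres."""
    return points * 25.4 / 72


def classify_page(width_pt: float, height_pt: float, tolerance_mm: float = 5.0) -> str:
    """Return the matching A-series name, or 'other' if none is close enough."""
    # Sorting the sides makes the check independent of page orientation.
    short, long = sorted((points_to_mm(width_pt), points_to_mm(height_pt)))
    for name, (iso_short, iso_long) in ISO_A_SIZES_MM.items():
        if abs(short - iso_short) <= tolerance_mm and abs(long - iso_long) <= tolerance_mm:
            return name
    return "other"


def is_larger_than_a4(width_pt: float, height_pt: float, tolerance_mm: float = 5.0) -> bool:
    """True when either side clearly exceeds A4 (210 x 297 mm)."""
    short, long = sorted((points_to_mm(width_pt), points_to_mm(height_pt)))
    return short > 210 + tolerance_mm or long > 297 + tolerance_mm
```

For example, a 595 x 842 point page classifies as A4, while a 2384 x 1684 point landscape page classifies as A1 and would be routed to whichever OCR method handles large drawings.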
Categorization Based on Keywords:
Keywords extracted from the documents are used for categorization.
Documents are sorted into folders based on predefined conditions and keyword matching.
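Keyword matching of this kind can be sketched as a lookup of extracted text against per-folder keyword lists. The folder names, keywords, and the "most hits wins" rule below are hypothetical; the real predefined conditions are not specified here.

```python
import re

# Hypothetical rules: folder name -> keywords that suggest that folder.
RULES = {
    "Invoices": ["invoice", "amount due"],
    "Drawings": ["floor plan", "elevation", "scale 1:"],
    "Contracts": ["agreement", "contract"],
}


def categorize(text: str, rules: dict = RULES, default: str = "Uncategorized") -> str:
    """Return the folder whose keywords match the text most often."""
    # Collapse whitespace and lowercase so OCR line breaks do not hide phrases.
    normalized = re.sub(r"\s+", " ", text).lower()
    best_folder, best_hits = default, 0
    for folder, keywords in rules.items():
        hits = sum(1 for keyword in keywords if keyword in normalized)
        if hits > best_hits:  # ties keep the earlier rule
            best_folder, best_hits = folder, hits
    return best_folder
```

Documents with no matching keyword fall into the default folder, which gives users an obvious place to look for files that need manual sorting.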
Output Folder Creation:
A new output folder is created, with the files sorted into subfolders according to the categorization results.
Users can download the organized folder containing their sorted documents.
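Building the downloadable folder can be sketched with the standard library: copy each file into a subfolder named after its category, then zip the result. The `build_output` helper and the shape of its `assignments` argument (source path mapped to category name) are assumptions for illustration.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def build_output(assignments: dict[str, str], output_dir: str) -> str:
    """Copy files into output_dir/<category>/ and return the zip archive path."""
    root = Path(output_dir)
    root.mkdir(parents=True, exist_ok=True)
    for source, category in assignments.items():
        target_dir = root / category
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps timestamps; the original upload is left untouched.
        shutil.copy2(source, target_dir / Path(source).name)
    # Creates "<output_dir>.zip" next to the folder and returns its path.
    return shutil.make_archive(str(root), "zip", root_dir=str(root))
```

In this sketch, two files with the same name in the same category would overwrite each other, so a fuller version would need a renaming rule for collisions.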
Manual Organization:
Users can instead organize files manually by specifying their own sorting criteria.
User Interface:
The website provides a user-friendly interface for uploading folders, viewing results, and downloading organized files.
Extraction of relevant text while performing optical character recognition
Technologies used
Discussion