OCR enables the digitization of a wide range of paper-based documents in a variety of languages and formats. A large amount of data still exists only on paper, and the quality of the uploaded document is critical to text extraction: when a document is scanned, the quality of the scan makes a significant difference in the results. OCR operations can also be time consuming, which matters when users are short on time. This is where ocrzilla comes in. It analyses the uploaded document and assigns it to one of four categories: very good, good, bad, or very bad.
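As a rough illustration of the four-category output, the final step of such a classifier might look like the sketch below. It assumes the model returns one score per category; the label order, the `QUALITY_LABELS` constant, and the `label_from_scores` function are hypothetical names chosen for this example and are not taken from the project's code.

```python
from typing import Sequence

# Hypothetical label order; the write-up only names the four categories.
QUALITY_LABELS = ("very bad", "bad", "good", "very good")


def label_from_scores(scores: Sequence[float]) -> str:
    """Return the quality label with the highest score.

    Assumes `scores` holds one value per category, in the same
    order as QUALITY_LABELS.
    """
    if len(scores) != len(QUALITY_LABELS):
        raise ValueError(
            f"expected {len(QUALITY_LABELS)} scores, got {len(scores)}"
        )
    # Index of the largest score (first one wins on ties).
    best = max(range(len(scores)), key=lambda i: scores[i])
    return QUALITY_LABELS[best]


# Example: the third score is the largest, so the label is "good".
print(label_from_scores([0.05, 0.10, 0.60, 0.25]))
```

A document labelled "bad" or "very bad" could then be flagged for rescanning before any OCR is attempted, which is one plausible way to use the label.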
Because the dataset used to train the model was too small, the model was prone to overfitting. The machine learning work was our biggest problem: we could not improve the model's accuracy, and the process was taking too long. Building the website's backend was also difficult for us.