"Textual Tune"

This project aims to facilitate efficient information retrieval from PDFs containing a mix of English, Telugu, and Urdu text, catering to diverse linguistic and font requirements.

Built at Reckon 5.0

Created on 31st January 2024

The problem "Textual Tune" solves

1) Addresses the needs of users across different domains and professions. Its versatility makes it applicable in research, legal, educational, and business contexts, where multilingual documents are common.

2) Effortless Information Retrieval: Users can search and retrieve information from PDF documents containing a mix of English, Telugu, and Urdu text. The app supports Unicode-encoded text and uses specialized parsing for Telugu and Urdu fonts to keep results accurate and complete.

3) Support for multilingual documents: Recognizes and processes English, Telugu (Shreelipi font), and Urdu (Noori Nastaliq font), covering a wide range of linguistic needs.

4) Expands the scope of searchable content to text embedded in images. OCR is used to extract and process text from images, so users can search both conventional and image-based PDFs.

5) Overall, the Multilingual PDF App simplifies searching, retrieving, and working with mixed-language content in PDFs, which can improve productivity and efficiency across industries and tasks.

6) A voice assistant accepts spoken input and responds with both text and speech, helping users with a range of tasks.
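Items 2 and 3 depend on knowing which language each word is in. As a minimal sketch, assuming the extracted text is Unicode-encoded, each word can be tagged by the Unicode block its letters fall in: Telugu (U+0C00 to U+0C7F), the Arabic-script blocks that Urdu is written in, and ASCII Latin letters for English. This is Python for illustration only; the project itself is built with Flutter and Dart, and the function names and range table below are hypothetical, not the project's code. Because only three languages are in scope, any Arabic-script word is assumed to be Urdu. Text set in a legacy, non-Unicode font encoding would show up as Latin code points and would need a font-specific mapping first, which this sketch does not attempt.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical range table (inclusive code point ranges) for the three languages.
# Urdu is written in the Arabic script, so it maps to the Arabic-script blocks.
SCRIPT_RANGES: Dict[str, List[Tuple[int, int]]] = {
    "telugu": [(0x0C00, 0x0C7F)],                      # Telugu block
    "urdu": [
        (0x0600, 0x06FF),                              # Arabic
        (0x0750, 0x077F),                              # Arabic Supplement
        (0xFB50, 0xFDFF),                              # Arabic Presentation Forms-A
        (0xFE70, 0xFEFC),                              # Presentation Forms-B, minus U+FEFF (BOM)
    ],
    "english": [(0x0041, 0x005A), (0x0061, 0x007A)],   # ASCII A-Z, a-z
}


def char_script(ch: str) -> Optional[str]:
    """Return the language whose block contains this character, or None."""
    cp = ord(ch)
    for lang, ranges in SCRIPT_RANGES.items():
        for lo, hi in ranges:
            if lo <= cp <= hi:
                return lang
    return None


def word_script(word: str) -> str:
    """Label a word with the script that most of its characters belong to."""
    counts: Dict[str, int] = {}
    for ch in word:
        lang = char_script(ch)
        if lang is not None:
            counts[lang] = counts.get(lang, 0) + 1
    if not counts:
        return "unknown"  # nothing from a supported block (e.g. ASCII digits, punctuation)
    return max(counts, key=counts.get)


def tag_words(text: str) -> List[Tuple[str, str]]:
    """Split on whitespace and tag every word with its detected language."""
    return [(word, word_script(word)) for word in text.split()]
```

For example, `tag_words("Hello తెలుగు اردو")` returns `[("Hello", "english"), ("తెలుగు", "telugu"), ("اردو", "urdu")]`. Counting characters per script and taking the majority means a stray punctuation mark or digit does not decide the label.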

Challenges we ran into

1) One significant hurdle was accurately extracting text from images embedded in PDFs, especially with varying fonts, sizes, and orientations.
2) Testing the search algorithm on diverse datasets containing English, Telugu, and Urdu content to identify and fix language-specific issues.
3) Implementing language-specific indexing strategies to optimize search performance.
4) Detecting the language of individual words, which was complicated by each language using a different font.
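Challenge 3 mentions language-specific indexing. One possible approach, sketched here under stated assumptions rather than taken from the project, is to keep a separate inverted index per language and normalize terms differently for each: NFC normalization for all text, lowercasing for English, and removal of optional Arabic diacritics (U+064B to U+065F) for Urdu, since written Urdu usually omits them. A query term is then compared only against terms in its own script. The class and function names are hypothetical, and the script check is a simplified version in which the first recognized letter decides.

```python
import unicodedata
from collections import defaultdict
from typing import Optional, Set, Tuple


def strip_punct(word: str) -> str:
    """Trim leading/trailing punctuation (Unicode category P*), e.g. ',' or '۔'."""
    start, end = 0, len(word)
    while start < end and unicodedata.category(word[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[start:end]


def detect_script(word: str) -> Optional[str]:
    """Simplified check: the first recognized letter decides the language."""
    for ch in word:
        cp = ord(ch)
        if 0x0C00 <= cp <= 0x0C7F:
            return "telugu"
        if 0x0600 <= cp <= 0x06FF:
            return "urdu"  # Arabic block; assumed to be Urdu in this project
        if ch.isascii() and ch.isalpha():
            return "english"
    return None


def normalize(word: str, lang: str) -> str:
    """Language-specific normalization, applied at both index and query time."""
    word = unicodedata.normalize("NFC", word)
    if lang == "english":
        return word.lower()
    if lang == "urdu":
        # Drop optional diacritics (harakat, U+064B..U+065F).
        return "".join(ch for ch in word if not 0x064B <= ord(ch) <= 0x065F)
    return word  # Telugu: NFC only


class MultilingualIndex:
    """One inverted index per language: lang -> term -> {(doc_id, page)}."""

    def __init__(self) -> None:
        self.index = defaultdict(lambda: defaultdict(set))

    def add_page(self, doc_id: str, page: int, text: str) -> None:
        for raw in text.split():
            word = strip_punct(raw)
            lang = detect_script(word)
            if lang is None:
                continue  # no supported letters (e.g. ASCII numbers)
            self.index[lang][normalize(word, lang)].add((doc_id, page))

    def search(self, term: str) -> Set[Tuple[str, int]]:
        word = strip_punct(term)
        lang = detect_script(word)
        if lang is None:
            return set()
        # Only the index for the query's own language is consulted.
        return set(self.index[lang].get(normalize(word, lang), set()))
```

With this design, `search("CONTRACT")` and `search("contract")` hit the same English entry, and an Urdu query written with a kasra matches the same word indexed without it. Keeping the indexes separate also means a Telugu query never scans English or Urdu postings.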

Tracks Applied (1)

Software

The Multilingual PDF Search Engine for Telugu and Urdu is positioned within the software track as a comprehensive langua...

Technologies used

Flutter

OCR

Dart Programming Language

Google ML Kit

PDF parsing libraries

Image processing libraries

User interface components
