DocMan,,,

The Document Search Engine SuperHero

Built at Lean In Hacks 4.0

Created on 12th February 2023

The problem DocMan,,, solves

DocMan quickly finds relevant information within a large collection of documents, including searches for specific keywords or phrases. Traditional approaches, such as manually reading through each document or relying on a file system's search function, are time-consuming and inefficient.

Challenges we ran into

The first challenge was setting up the local environment for the project. We were not able to run apache/tika on our local system, so we switched to a Docker-based approach: instead of running apache/tika and the database locally, we run them in Docker containers.

The second challenge was connecting the Apache Tika server to our Node API, since the server accepts PUT requests and the headers were essential for parsing the correct information.
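As a rough illustration of the PUT request and headers described above, here is a minimal sketch. It assumes a Tika server listening on its default port 9998 and Node.js 18+ for the global fetch; the names TIKA_URL, buildTikaRequest, and extractText are hypothetical and are not taken from the project's code.

```javascript
// Sketch only: TIKA_URL and both helper names are assumptions for illustration.
const TIKA_URL = "http://localhost:9998/tika"; // default Tika server port

// Build the options for a PUT request that asks Tika for plain text.
function buildTikaRequest(fileBuffer, mimeType) {
  return {
    method: "PUT",
    headers: {
      "Content-Type": mimeType, // tells Tika what kind of file is being sent
      "Accept": "text/plain",   // ask for extracted text instead of XHTML
    },
    body: fileBuffer,
  };
}

// Send the document to Tika and return the extracted text.
// Relies on the global fetch available in Node.js 18+.
async function extractText(fileBuffer, mimeType, url = TIKA_URL) {
  const response = await fetch(url, buildTikaRequest(fileBuffer, mimeType));
  if (!response.ok) {
    throw new Error(`Tika returned HTTP ${response.status}`);
  }
  return response.text();
}
```

Keeping the request builder separate from the network call makes it easy to check the headers without a running Tika server.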

The third was a very important one. The PDFs contain images, and we had to extract the data inside those images. To solve this, we parse the PDF, extract the images, and then pass them to the Apache Tika server.
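The flow above (PDF to images, images to Tika) can be sketched roughly as follows. This is not the project's actual code: convertPdfToImages and sendImageToTika are hypothetical callbacks standing in for whatever PDF-to-image library and Tika client are used, and they are passed in as arguments so the sketch does not assume any particular library API.

```javascript
// Sketch only: both callbacks are hypothetical stand-ins.
// convertPdfToImages(pdfPath) -> Promise<Array<Uint8Array>>, one image per page
// sendImageToTika(imageBytes) -> Promise<string>, text recognized in that image
async function extractTextFromPdfImages(pdfPath, convertPdfToImages, sendImageToTika) {
  const pageImages = await convertPdfToImages(pdfPath);
  const pages = [];
  // Process pages one at a time so the text stays in page order.
  for (let i = 0; i < pageImages.length; i++) {
    const text = await sendImageToTika(pageImages[i]);
    pages.push({ page: i + 1, text: text.trim() });
  }
  return pages;
}
```

Processing pages sequentially keeps the load on the Tika container low; a batch of Promise.all calls would be faster but could overwhelm a small container.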

The fourth was identifying a flexible database that could store the document data and also provide a search option. After a lot of research, we came across Elasticsearch.
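To show how Elasticsearch can cover both storage and search, here is a minimal sketch that builds the request bodies for indexing a document and for a keyword search. The index name "documents" and the field names filename and content are assumptions for illustration; the project's real mapping is not described here.

```javascript
// Sketch only: the index name and the field names are assumptions.
const INDEX_NAME = "documents";

// Describe the REST call that stores one parsed document.
function buildIndexRequest(filename, content) {
  return {
    method: "POST",
    path: `/${INDEX_NAME}/_doc`,
    body: { filename, content },
  };
}

// Describe the REST call that searches the extracted text.
function buildSearchRequest(keywords, size = 10) {
  return {
    method: "POST",
    path: `/${INDEX_NAME}/_search`,
    body: {
      size,
      // "and" requires every keyword to appear in the document text.
      query: { match: { content: { query: keywords, operator: "and" } } },
      // Ask Elasticsearch to return matching fragments for display.
      highlight: { fields: { content: {} } },
    },
  };
}
```

Each returned object can be sent with any HTTP client by prefixing the path with the Elasticsearch host (for example http://localhost:9200 in a default local setup) and serializing the body as JSON.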

Finally, we had to implement the UI along with the specific information to be displayed. We used array logic and some methods to solve this problem. Overall, working through these challenges was awesome.
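One way such array logic might look is sketched below: it reshapes an Elasticsearch search response into a flat array that a template can loop over. The field names filename and content and the function name toDisplayRows are hypothetical; only the hits.hits, _id, _score, _source, and highlight keys come from the standard Elasticsearch response format.

```javascript
// Sketch only: toDisplayRows and the _source field names are assumptions.
function toDisplayRows(searchResponse) {
  const hits = searchResponse?.hits?.hits ?? [];
  return hits.map((hit) => {
    const content = hit._source?.content ?? "";
    return {
      id: hit._id,
      filename: hit._source?.filename ?? "(unknown)",
      score: hit._score,
      // Prefer the highlighted fragment; fall back to the start of the text.
      snippet: hit.highlight?.content?.[0] ?? content.slice(0, 160),
    };
  });
}
```

The resulting array can be passed straight to an EJS view and rendered with a simple loop.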

Tracks Applied (2)

Social Cause

It can directly help a lot of people if this idea is offered to government agencies.

Open Innovation

It is a newly built product and is not bound to a particular problem statement.

Technologies used

Node.js

Embedded JavaScript (EJS)

Docker

Microsoft Azure

Tesseract OCR

Kibana

Elasticsearch

Apache Tika

pdf-poppler

pdf-img-convert
