With the increasing popularity of video-based learning, students frequently use videos as a primary source of information. However, searching for specific information within a video is time-consuming and frustrating, as traditional video search methods are often inefficient. Moreover, traditional content search methods often rely on manual tagging or transcription, which is prone to errors. There is therefore a need for a more efficient and user-friendly way to search inside videos using natural language, one that returns accurate and relevant results. To this end, we propose a web application that takes a video file (as a link or as an upload) and applies machine learning algorithms to return the timestamps matching a searched query. We have divided the architecture into two parts: I-Seekr and A-Seekr. I-Seekr searches over image frames and is suited to dynamic videos, i.e. videos in which things are moving. A-Seekr searches over audio chunks and is suited to static videos, i.e. videos in which little is moving on screen. For example, to search a lecture in which a teacher is speaking we would use A-Seekr, but to search footage for an accident we would opt for I-Seekr.
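To make the split concrete, here is a minimal sketch of the preprocessing step, assuming OpenCV and the ffmpeg CLI are available. The function names, the one-second frame sampling rate, and the 16 kHz mono audio format are illustrative assumptions, not details from our implementation.

```python
import subprocess
import cv2  # pip install opencv-python

def extract_frames(video_path, every_n_seconds=1.0):
    """Sample one frame every `every_n_seconds` for I-Seekr."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * every_n_seconds))
    frames = []  # list of (timestamp_in_seconds, frame) pairs
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append((idx / fps, frame))
        idx += 1
    cap.release()
    return frames

def extract_audio(video_path, audio_path="audio.wav"):
    """Strip the audio track for A-Seekr (16 kHz mono WAV)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    return audio_path
```

Each extracted frame keeps its timestamp, so whichever branch (I-Seekr or A-Seekr) finds a match can report a position in the original video directly.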
Use cases:
a) I-Seekr: Police authorities -> scanning CCTV footage for crimes;
Sports -> referees checking for possible fouls
b) A-Seekr: Students -> searching for specific topics in long videos;
Podcasts -> searching for a topic in a podcast video
The main challenges we faced were the following:
Although OpenAI Whisper produced accurate results, it took a long time to process each audio clip. We therefore switched to the Picovoice API, which reduced transcription time by up to 5x compared to the Whisper model.
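To illustrate how A-Seekr turns a transcript into timestamps, here is a minimal sketch using the open-source Whisper Python package. The model size, the helper name, and the simple substring match are our assumptions; the production path swaps Whisper for the Picovoice engine for speed.

```python
import whisper  # pip install openai-whisper

def find_query_timestamps(audio_path, query):
    """Transcribe the audio and return segments containing the query."""
    model = whisper.load_model("base")  # model size chosen for illustration
    result = model.transcribe(audio_path)
    hits = []
    for seg in result["segments"]:
        # Each Whisper segment carries start/end times in seconds.
        if query.lower() in seg["text"].lower():
            hits.append((seg["start"], seg["end"], seg["text"].strip()))
    return hits
```

Because Picovoice's Leopard engine also reports per-word timestamps from its transcription call, moving from Whisper to Picovoice is largely a drop-in change to this lookup step.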
We ran into many errors satisfying the ML models' requirements during deployment. To simplify deployment, we used a Microsoft Azure based cloud architecture for MongoDB. Stack Overflow was a great help throughout the deployment process.
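For reference, wiring the app to an Azure-hosted MongoDB instance reduces to a standard connection string. Everything below (the environment-variable name, the database and collection names, and the cached-result schema) is illustrative rather than taken from our deployment.

```python
import os
from pymongo import MongoClient  # pip install pymongo

# The connection string comes from the Azure portal; keep it out of source.
client = MongoClient(os.environ["MONGODB_URI"])
db = client["seekr"]
results = db["search_results"]

# Cache a finished search so repeated queries can skip reprocessing.
results.insert_one({
    "video_id": "demo-video",
    "query": "accident",
    "timestamps": [12.5, 97.0],
})
```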
Discussion