AI Snipping Tool

A Chrome extension that takes custom screenshots, extracts text, and allows users to ask questions based on the extracted content.

Built at Social Summer Of Code Season 3

Created on 1st August 2024

The problem AI Snipping Tool solves

Typing out text from videos, images, or thumbnails on websites is often a tedious and error-prone task. This issue becomes particularly evident on platforms like YouTube, where valuable information is frequently presented in video content or thumbnail images. Users face challenges when they need to manually extract this text, especially if it includes links, technical terms, code examples, or mathematical equations.

Scenarios where this problem arises:

Manually Typing Displayed Links: Typing out links shown in videos or thumbnails is inefficient and prone to errors since they cannot be copied directly.
Copying References in Presentations: Extracting references or citations displayed in the footer of presentations is difficult and time-consuming.
Copying Code to Editors: Code snippets shown in videos or images cannot be copied directly, and retyping them into a text editor is error-prone.
Extracting Text from Images: Capturing text from documents shared as images on social media or other platforms is challenging.
Feeding Text to LLMs: Users may need to extract text to input into language models for summarization or further processing.
Addressing these challenges would significantly improve efficiency and accuracy in text extraction from multimedia sources.

Challenges I ran into

I ran into issues integrating the Gemini API with the official quickstart guide: it relies on import maps, which are only supported through inline JS, and Chrome doesn't allow extensions to run inline scripts. I worked around this by making a direct fetch request instead, based on the details in the quickstart's cURL method.
The original code structure of the project, especially the injection code, was confusing and poorly documented: it contained many unused functions and a lot of redundant code. I did a significant amount of refactoring and documenting as I made my pull requests.
We had to migrate from localStorage to the ChromeStorage API due to security concerns.
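
The direct fetch workaround for the Gemini API mentioned above might look roughly like the sketch below. This is not the project's actual code: the endpoint shape follows the public Gemini REST (cURL) quickstart, while the model name, the function names (buildGeminiRequest, extractAnswer, askGemini), and the prompt wording are assumptions made for illustration.

```javascript
// Sketch only. The endpoint shape follows the public Gemini REST quickstart;
// the model name, function names, and prompt wording are assumptions.
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the URL and fetch options for a generateContent call.
function buildGeminiRequest(apiKey, extractedText, question, model = "gemini-1.5-flash") {
  const url = `${GEMINI_BASE}/${model}:generateContent?key=${encodeURIComponent(apiKey)}`;
  const prompt = `Text extracted from a screenshot:\n${extractedText}\n\nQuestion: ${question}`;
  return {
    url,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Join the text parts of the first candidate; return "" if none are present.
function extractAnswer(responseJson) {
  const parts = responseJson?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("");
}

// Send the request with plain fetch, so no import map or inline script is needed.
// fetchImpl is injectable so the function can be exercised without network access.
async function askGemini(apiKey, extractedText, question, fetchImpl = fetch) {
  const { url, options } = buildGeminiRequest(apiKey, extractedText, question);
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`Gemini request failed: HTTP ${res.status}`);
  return extractAnswer(await res.json());
}
```

In the extension, askGemini would presumably be called with the text returned by the OCR step and the user's question. Keeping request construction and response parsing in separate functions makes both easy to check without calling the API.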

Technologies used

HTML

CSS

JavaScript

Tesseract OCR

Google Gemini

ChromeStorage API
