To understand the problem, consider why people read such long texts in the first place: researching a topic in depth means repeatedly reading and skimming scholarly material to get a fuller view of the topic in question. This is often a tedious task, in the following ways:
Information overload: Reviewing excessive amounts of text can lead to mental exhaustion and frustration. SummarEase gets past this by providing crisp video summaries, eliminating the tedious process of absorbing text.
Time-consuming review process: Hours of focused reading can be overwhelming for a user with limited time and a reduced attention span. With SummarEase, the user can absorb the highlights within a matter of minutes, saving time for important tasks.
We selected this problem statement because we were intrigued by text-to-video conversion, and our curiosity carried us through the project! However, there's a reason there aren't many existing solutions to this problem: it's tough to implement. We faced numerous challenges while building SummarEase; here are the ones that tested our mettle the most:
Sync and concatenation of final output: Each split sentence has a frame associated with it. The first challenge we faced was to sync the resulting video clips in the correct contextual order according to the text summary. To achieve this, we ensured that video clips were created in order and named after the index of their respective split sentence in the list. The final step was to concatenate these indexed videos in order.
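The index-based naming scheme can be illustrated with a minimal sketch. It assumes clips are named `0.mp4`, `1.mp4`, and so on (the exact naming pattern is an assumption), and the helper names `clip_index`, `ordered_clips`, and `concat_list` are hypothetical. The key point is that clips must be sorted by their numeric index, since a plain string sort would place `10.mp4` before `2.mp4`. The output format matches what ffmpeg's concat demuxer reads, which is one possible way to join the clips and not necessarily the tool used in SummarEase.

```python
from pathlib import Path


def clip_index(path: str) -> int:
    """Return the sentence index encoded in a clip name such as '12.mp4'."""
    return int(Path(path).stem)


def ordered_clips(paths: list[str]) -> list[str]:
    """Sort clips numerically by index so that 2.mp4 comes before 10.mp4."""
    return sorted(paths, key=clip_index)


def concat_list(paths: list[str]) -> str:
    """Build the text of a list file, one "file '<path>'" line per clip.

    Assumption: the clips are later joined with something like
    `ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4`.
    """
    return "".join(f"file '{p}'\n" for p in ordered_clips(paths))


if __name__ == "__main__":
    print(concat_list(["10.mp4", "2.mp4", "0.mp4", "1.mp4"]))
```

Because the order is derived from the file names, clips can be rendered in any order and still be concatenated correctly.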
Complexities in text chunking and tokenization: PDF documents vary in length, from small PDFs with 3,000 tokens to those with 90,000 tokens or more. Deciding the chunk size for these variable-sized inputs was a challenge. Eventually, we settled on a chunk size of 15,000 tokens for large documents and kept the standard token length for smaller documents, thereby improving summarization accuracy for all types of documents.
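A rough sketch of size-dependent chunking is shown here. It assumes the document has already been tokenized into a list, and it treats any document at or below the chunk size as a single chunk; that threshold rule and the function name `chunk_tokens` are illustrative assumptions, since only the 15,000-token chunk size for large documents is stated above.

```python
def chunk_tokens(tokens: list, chunk_size: int = 15000) -> list[list]:
    """Split a token list into consecutive chunks of at most chunk_size.

    Assumption: documents that already fit within chunk_size are kept
    whole as a single chunk instead of being split further.
    """
    if len(tokens) <= chunk_size:
        return [tokens]
    return [
        tokens[start:start + chunk_size]
        for start in range(0, len(tokens), chunk_size)
    ]


if __name__ == "__main__":
    small = list(range(3000))    # stand-in for a 3,000-token PDF
    large = list(range(90000))   # stand-in for a 90,000-token PDF
    print(len(chunk_tokens(small)), len(chunk_tokens(large)))
```

Each chunk can then be summarized separately, with the partial summaries combined in chunk order.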
Reducing computational time for the output.mp4 file: Each of the three steps took its own share of computation time, which added up and increased the wait time for the user. To overcome this, we implemented our solution using asynchronous programming and ran multiple processes in parallel.
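As a minimal sketch of the asynchronous approach, the example below runs per-sentence work concurrently with Python's `asyncio`. The coroutine `render_clip` is a hypothetical placeholder that only sleeps; the real per-sentence steps are not shown in this write-up. `asyncio.gather` returns results in the order the tasks were passed in, so the clip order still matches the sentence order even when tasks finish at different times.

```python
import asyncio


async def render_clip(index: int, sentence: str) -> str:
    """Hypothetical stand-in for producing one clip from one sentence."""
    await asyncio.sleep(0.01)  # placeholder for I/O-bound work (API calls, file writes)
    return f"{index}.mp4"


async def render_all(sentences: list[str]) -> list[str]:
    """Start all clip jobs concurrently and collect results in input order."""
    jobs = [render_clip(i, s) for i, s in enumerate(sentences)]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    clips = asyncio.run(render_all(["First point.", "Second point.", "Third point."]))
    print(clips)
```

This pattern helps most when the steps spend their time waiting on network or disk; CPU-heavy steps such as video encoding would more likely need separate processes.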
Tracks Applied (3)
Vonage (Part of Ericsson)
Neurelo at Hack This Fall
Neurelo at Hack This Fall