The Problem DevFoolio Solves
Hackathons are designed to foster innovation, but with the sheer volume of project submissions, ensuring each entry’s originality has become a significant challenge. Copying or subtly repurposing existing projects undermines the spirit of creativity and fair competition. Manually verifying projects for originality is not only time-consuming but also susceptible to human error, making it difficult to maintain a level playing field.
How This Project Helps
Our platform offers automated and reliable plagiarism detection specifically designed for Devfolio hackathons. By scanning and analyzing new project submissions and comparing them against a database of past projects, our tool efficiently identifies similarities and potential duplicates.
Key Benefits:
- Ensures Integrity: Helps hackathon organizers verify originality, upholding the integrity of the event.
- Supports Authentic Work: Allows participants to showcase their unique ideas with confidence, knowing their work will stand out.
- Reduces Manual Effort: Eliminates the need for manual checks, streamlining the review process while minimizing human error.
With our system, hackathons can remain true to their mission of encouraging fresh, authentic ideas and fostering an environment of fair competition.
Challenges We Ran Into
1. Handling Large-Scale Data
- Hurdle: With over 180,000 projects on Devfolio, comparing every new submission against the full corpus risked slow processing and heavy resource usage.
- Solution: We combined optimized data structures, efficient database indexing, and preprocessing to streamline the corpus, enabling fast similarity checks without sacrificing accuracy; a sketch of the search step follows below.
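As a rough illustration of the search step, here is a minimal sketch using FAISS for nearest-neighbor lookup over normalized embeddings. The library choice, embedding dimensionality, and placeholder data are assumptions for the example, not a description of our exact production stack.

```python
# Sketch: fast similarity search over ~180,000 project vectors.
# FAISS and the 384-dim embeddings are illustrative assumptions.
import numpy as np
import faiss

DIM = 384  # assumed embedding size (typical for small sentence encoders)

# `embeddings` stands in for one vector per past Devfolio project.
embeddings = np.random.rand(180_000, DIM).astype("float32")
faiss.normalize_L2(embeddings)  # so inner product == cosine similarity

index = faiss.IndexFlatIP(DIM)
index.add(embeddings)

# Compare a new submission against the whole corpus in one call.
query = np.random.rand(1, DIM).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k=10)  # top-10 closest past projects
```

An exact flat index already answers a single query over ~180k vectors in milliseconds; approximate indexes only become worthwhile at much larger scales.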
2. Dynamic Content Parsing and Scraping
- Hurdle: Many Devfolio project pages render their content dynamically, and class names change between builds, so a naive scraper would break from one page load to the next.
- Solution: We built a resilient scraper around robust tooling and selector fallback methods, letting us reliably extract the required data even when the page structure shifts (see the sketch below).
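The core of the fallback idea fits in a few lines. This is a hedged sketch: the CSS selectors below are hypothetical placeholders, not Devfolio's actual markup, and a real scraper would also need to handle JavaScript-rendered pages.

```python
# Sketch of selector fallbacks: try several CSS selectors in order so that
# changing class names don't break extraction. Selectors are hypothetical.
from bs4 import BeautifulSoup

FALLBACK_SELECTORS = [
    "div.project-description",      # a stable semantic class, if one exists
    "[data-testid='description']",  # test-id attributes often outlive classes
    "main section p",               # loose structural last resort
]

def extract_description(html: str) -> str | None:
    soup = BeautifulSoup(html, "html.parser")
    for selector in FALLBACK_SELECTORS:
        node = soup.select_one(selector)
        if node and node.get_text(strip=True):
            return node.get_text(" ", strip=True)
    return None  # signal the caller to retry, e.g. with a headless browser
```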
3. Textual Crux Extraction and Vectorization
- Hurdle: Summarizing each project’s description into a concise "crux" for efficient comparison was challenging, especially with the variation in text length and structure across projects.
- Solution: We used an NLP summarization model to distill each description into a concise crux capturing the project's essence, then vectorized these summaries to enable high-speed similarity calculations (sketched below).
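The summarize-then-embed step looks roughly like the sketch below. The model names are common public defaults used purely for illustration; we are not asserting these are the exact models in our pipeline.

```python
# Sketch: distill a description to its "crux", then embed it.
# Model names are illustrative defaults, not necessarily ours.
from transformers import pipeline
from sentence_transformers import SentenceTransformer

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim vectors

def crux_vector(description: str):
    # Condense a long, free-form description into a short summary...
    summary = summarizer(
        description, max_length=60, min_length=15, do_sample=False
    )[0]["summary_text"]
    # ...then embed it; normalized vectors make cosine similarity a dot product.
    return summary, encoder.encode(summary, normalize_embeddings=True)
```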
4. Ensuring Accuracy in Similarity Detection
- Hurdle: Balancing accuracy and speed in similarity detection was complex: set the threshold too high and real duplicates slip through; set it too low and false positives pile up.
- Solution: We fine-tuned the similarity threshold against labeled test data, refining the model’s parameters to strike a balance between precision and recall (see the sketch below).
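The tuning itself can be reproduced with a standard precision-recall sweep. The numbers below are made-up stand-ins for labeled test pairs; the point is the procedure, not the data.

```python
# Sketch: pick the similarity cutoff that best balances precision and recall
# on a labeled set of (similarity score, is_duplicate) pairs. Data is fake.
import numpy as np
from sklearn.metrics import precision_recall_curve

similarities = np.array([0.92, 0.88, 0.75, 0.62, 0.55, 0.40, 0.31])
is_duplicate = np.array([1, 1, 1, 0, 1, 0, 0])

precision, recall, thresholds = precision_recall_curve(is_duplicate, similarities)
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-9, None)
best = thresholds[np.argmax(f1[:-1])]  # final P/R point has no threshold
print(f"chosen similarity threshold: {best:.2f}")
```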