Managing scrap materials and finding creative, sustainable uses for them can be difficult for individuals and organizations. While many people wish to repurpose waste into valuable items, they often lack inspiration or the technical know-how to start.
This project streamlines the upcycling process by enabling users to upload images of the scrap items they have. An image-to-text pipeline identifies the objects in each image and generates DIY project ideas from them. Using the Gemini APIs together with an image-generation model (Flux), the project provides clear, step-by-step instructions and visual suggestions for repurposing the scrap materials.
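At a high level, the flow is: photo of scrap → Gemini image-to-text to identify the items → Gemini text-to-text to turn those items into project instructions. The sketch below shows that flow using the google-generativeai Python SDK; the model name, prompts, and helper function are illustrative assumptions, not the project's exact code.

```python
# Minimal end-to-end sketch of the pipeline described above.
# Assumes the google-generativeai SDK and Pillow; model and prompts are illustrative.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

def scrap_to_diy_ideas(image_path: str) -> str:
    """Identify scrap items in an image and generate DIY project ideas."""
    image = Image.open(image_path)

    # Stage 1: image-to-text -- list the scrap objects visible in the photo.
    objects = model.generate_content(
        ["List every scrap item visible in this photo, one per line.", image]
    ).text

    # Stage 2: text-to-text -- turn the detected items into DIY instructions.
    ideas = model.generate_content(
        "Suggest three upcycling projects using only these materials, "
        "with step-by-step instructions:\n" + objects
    ).text
    return ideas

print(scrap_to_diy_ideas("scrap_photo.jpg"))
```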
Benefits:
Inspiration and Guidance: Offers practical DIY ideas tailored to users’ available materials.
Eco-Friendly Solutions: Promotes upcycling and reduces waste, encouraging sustainable practices.
Accessibility: Makes it easier for users without crafting experience to create valuable items from scraps.
Visual Support: Provides multiple images of potential finished products, helping users visualize their projects.
Time-Saving: Automates the brainstorming and research process for upcycling projects.
This tool empowers users to turn scrap into creative, functional, and eco-friendly items, enhancing sustainability efforts and inspiring new ideas with ease.
Challenges We Ran Into
One of the major challenges we faced was building a seamless pipeline to handle multiple processes involving advanced AI models and Gemini APIs. Specifically, integrating object detection, generating text from images, and creating visual outputs required a coordinated flow across several components. Here are the key hurdles and how we overcame them:
Object Detection from Images: Developing an accurate object detection system was a challenge due to variability in the quality and type of scrap images uploaded. To address this, we experimented with different pre-trained models and fine-tuned them with a curated dataset to improve accuracy and reliability.
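As an illustration of the kind of pre-trained detector we iterated on, the sketch below uses torchvision's COCO-pretrained Faster R-CNN; the actual models we tried and the curated fine-tuning dataset are not shown, so treat the model choice and confidence threshold as assumptions.

```python
# Sketch of the object-detection stage with an off-the-shelf pre-trained model.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a detector pre-trained on COCO; fine-tuning on a curated scrap dataset
# would replace the classification head before training.
weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights)
detector.eval()
labels = weights.meta["categories"]

def detect_scrap_items(image_path: str, threshold: float = 0.6) -> list[str]:
    """Return the class names of objects detected above a confidence threshold."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = detector([image])[0]
    return [
        labels[label]
        for label, score in zip(output["labels"], output["scores"])
        if score >= threshold
    ]

print(detect_scrap_items("scrap_photo.jpg"))
```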
Data Flow to Gemini API: Passing data efficiently between the image-to-text and text-to-text components of the Gemini API was complex. The main challenge was ensuring that the output of the image-to-text stage was formatted correctly and enriched to provide contextually relevant inputs for the text-to-text stage. We implemented middleware logic to validate and preprocess the data, creating a seamless connection between these two stages.
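A minimal sketch of that middleware layer is below: it validates and normalizes the image-to-text output, then enriches it into a contextual prompt for the text-to-text stage. The function names and validation rules are hypothetical, shown only to illustrate the idea.

```python
# Middleware between the image-to-text and text-to-text Gemini stages.
def preprocess_detections(raw_output: str) -> list[str]:
    """Normalize the image-to-text output into a clean list of materials."""
    items = []
    for line in raw_output.splitlines():
        item = line.strip(" -*\t").lower()
        if item and item not in items:  # drop blanks and duplicates
            items.append(item)
    if not items:
        raise ValueError("No scrap items detected; ask the user for a clearer photo.")
    return items

def build_diy_prompt(items: list[str]) -> str:
    """Enrich the validated item list into a contextual text-to-text prompt."""
    materials = ", ".join(items)
    return (
        f"I have the following scrap materials: {materials}. "
        "Suggest one upcycling project that uses only these materials. "
        "List the tools needed and give numbered step-by-step instructions."
    )

prompt = build_diy_prompt(
    preprocess_detections("- plastic bottle\n- Plastic Bottle\n- jute rope")
)
print(prompt)
```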
Generating DIY Project Outputs: The text-to-text generation of DIY projects, including steps and materials, needed refinement to ensure it provided practical and comprehensive results. This required iterative testing and feedback loops to fine-tune the prompts sent to the API, improving the relevance and usability of the generated instructions.
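The sketch below shows one way such a feedback loop can work: the response is checked for the sections a practical DIY guide needs, and any missing sections are fed back into a stricter follow-up prompt. The required-section list and the `generate` callable are assumptions standing in for our actual prompts and the Gemini text-to-text call.

```python
# Iterative prompt refinement for the DIY-instruction stage.
REQUIRED_SECTIONS = ("Materials", "Tools", "Steps")

def generate_diy_instructions(generate, base_prompt: str, max_attempts: int = 3) -> str:
    """Retry with a stricter prompt until the response has all required sections."""
    prompt = base_prompt
    for _ in range(max_attempts):
        response = generate(prompt)
        missing = [s for s in REQUIRED_SECTIONS if s.lower() not in response.lower()]
        if not missing:
            return response
        # Feed the gap back into the next prompt instead of accepting a vague answer.
        prompt = (
            base_prompt
            + " Your previous answer was missing these sections: "
            + ", ".join(missing)
            + ". Rewrite it with headed Materials, Tools, and Steps sections."
        )
    return response

# Example usage with any text-to-text callable, e.g. a Gemini model wrapper:
# final = generate_diy_instructions(lambda p: model.generate_content(p).text, prompt)
```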
Visual Output Creation with Flux: Integrating the final text output with Flux to generate images was challenging due to alignment issues and variability in the generated visuals. We solved this by refining the text descriptors and adding metadata adjustments to ensure consistent and meaningful image outputs.
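The sketch below assumes Flux is driven locally through the diffusers `FluxPipeline`; the project could equally call a hosted Flux endpoint, and the descriptor template with fixed style metadata is an illustrative stand-in for the refinements described above.

```python
# Sketch of the visual-output stage with Flux via diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

def render_project_preview(project_title: str, materials: list[str]) -> None:
    """Turn the refined text descriptors into a consistent image prompt."""
    # Fixed style metadata in the prompt keeps the generated visuals consistent.
    prompt = (
        f"{project_title}, handmade from {', '.join(materials)}, "
        "finished DIY upcycling project, photographed on a workbench, "
        "natural lighting, product photo"
    )
    image = pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
    image.save("project_preview.png")

render_project_preview("Bird feeder", ["plastic bottle", "jute rope"])
```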
Tracks Applied (4)
Major League Hacking
Google For Developers