Describe your project
In-Scope:
- Product Catalogue: Product data is scraped from various websites, such as Amazon, Blinkit, and BigBasket, and aggregated in a Firebase datastore by an automated workflow pipeline that runs at regular intervals.
- Health Analysis bot: Consumers can upload images of a packaged food product's label and receive a detailed health analysis covering, among other things, compatibility with dietary restrictions, alignment with ICMR guidelines, and assessment of processed or harmful ingredients.
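The dietary-restriction part of the analysis can be illustrated with a small sketch. This is not the project's implementation. It assumes the ingredient text has already been extracted from the label image. The keyword lists and function names are hypothetical, and ICMR guideline checks are not shown.

```python
import re

# Hypothetical keyword lists for illustration only; a real system would need a
# curated, much larger vocabulary and should not rely on keywords alone.
RESTRICTION_KEYWORDS: dict[str, set[str]] = {
    "vegetarian": {"gelatin", "egg", "chicken", "fish"},
    "lactose-free": {"milk", "whey", "butter", "cheese"},
    "gluten-free": {"wheat", "barley", "rye", "maida"},
}


def parse_ingredients(text: str) -> list[str]:
    """Split a label's ingredient text into lowercase ingredient names."""
    # Drop a leading "Ingredients:" prefix, then split on commas, semicolons
    # and brackets so bracketed sub-ingredients become separate entries.
    text = re.sub(r"^\s*ingredients\s*:\s*", "", text, flags=re.IGNORECASE)
    parts = re.split(r"[,;()\[\]]", text)
    return [p.strip().lower().rstrip(".") for p in parts if p.strip()]


def check_restrictions(ingredients: list[str], restrictions: list[str]) -> dict[str, list[str]]:
    """Return, per restriction, the ingredients that appear to violate it."""
    conflicts: dict[str, list[str]] = {}
    for restriction in restrictions:
        keywords = RESTRICTION_KEYWORDS.get(restriction, set())
        # Whole-word match so that e.g. "egg" does not match "vegetable".
        hits = [
            ing for ing in ingredients
            if any(re.search(rf"\b{re.escape(k)}\b", ing) for k in keywords)
        ]
        if hits:
            conflicts[restriction] = hits
    return conflicts


ings = parse_ingredients("Ingredients: Wheat flour (maida), sugar, milk solids, edible vegetable oil.")
print(check_restrictions(ings, ["gluten-free", "lactose-free", "vegetarian"]))
```

For the sample label, the call flags `wheat flour` and `maida` under gluten-free and `milk solids` under lactose-free. Restrictions with no conflicts are left out of the result.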
Out-of-Scope:
- Multi-Lingual Support: While we recognize that multi-lingual support is crucial to making this initiative accessible to the masses in India, we have chosen to leave it to other developers.
- Customized diet planning features
Future-Scope:
There is much more we wanted to accomplish but could not because of time constraints:
- Allowing fuzzy matching against products already stored in the database.
- Providing more detailed analysis using a RAG-based pipeline.
- Making the application available through a Google Lens-like user interface.
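Fuzzy matching of a product name against the stored catalogue could be approached along the lines below. This is only a sketch using Python's standard-library `difflib`, which is a swapped-in technique and not something the project has built. The normalization rules, the 0.8 cutoff, and the function names are illustrative assumptions.

```python
import difflib
import re
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def find_matching_product(query: str, catalogue: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the stored name closest to `query`, or None if none is close enough."""
    # Map normalized names back to the original stored names.
    normalized = {normalize(n): n for n in catalogue}
    matches = difflib.get_close_matches(normalize(query), list(normalized), n=1, cutoff=cutoff)
    return normalized[matches[0]] if matches else None


catalogue = ["Maggi 2-Minute Noodles Masala 70 g", "Parle-G Gold Biscuits 1 kg", "Amul Butter 100 g"]
print(find_matching_product("maggi 2 minute noodles masala 70g", catalogue))
```

Comparing against every stored name is linear in catalogue size. A search index or a dedicated fuzzy-matching library would likely scale better, and `difflib` is used here only because it needs no dependencies.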
Challenges we ran into
Data Challenges
- Amazon and BigBasket had major issues with product information: for many products, the information was incomplete and the images were not clear enough to extract any data from. After a lot of research, we found that Blinkit provides clear images of the ingredients and nutrition tables for all products.
- There is no API available, so the data had to be scraped, which comes with its own challenges.
- Automatically fetching new products is a challenge; the scraper handles it by checking the top-category pages on each website for new product links.
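The new-product check can be sketched roughly as follows. This is a minimal illustration, not the project's scraper. It assumes the category page's HTML has already been fetched, and both the `/product/` URL marker and the helper names are hypothetical. Pages that render their links client-side would first need a browser-automation tool to obtain the HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the category page URL.
                self.links.add(urljoin(self.base_url, href))


def find_new_product_links(html: str, base_url: str, seen: set[str],
                           product_marker: str = "/product/") -> set[str]:
    """Return product links on a category page that are not already in `seen`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    # Hypothetical heuristic: product pages contain `product_marker` in their URL.
    product_links = {link for link in parser.links if product_marker in link}
    return product_links - seen
```

Each run would then add the returned links to the stored `seen` set, so only genuinely new products are scraped in detail.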
Health Analysis from Images Challenges
- Setting up billing on the Google Cloud account to access Google Cloud Document AI.
- Plain OCR was not effective: the images are usually not good enough for OCR processing, yet the nutrition values and ingredients must be read very accurately for the downstream health analysis. We therefore had to look for more sophisticated solutions.
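Because the nutrition values must be read accurately, a validation step after text extraction helps. Such a step can report problems rather than silently guessing. The sketch below is hypothetical and not part of the project as described. It assumes the label text has already been extracted, for example by Document AI, into "name value unit" lines. The row names, field names, and units in `LABEL_ROWS` are illustrative assumptions covering one common layout.

```python
import re

# Hypothetical mapping from label row names to (output field, expected unit).
LABEL_ROWS: dict[str, tuple[str, str]] = {
    "energy": ("energy_kcal", "kcal"),
    "protein": ("protein_g", "g"),
    "carbohydrate": ("carbohydrate_g", "g"),
    "total sugars": ("sugars_g", "g"),
    "total fat": ("fat_g", "g"),
    "sodium": ("sodium_mg", "mg"),
}

# "<name> [:] <number> <unit>", e.g. "Total Sugars 12.5 g" or "Sodium: 410 mg".
ROW_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:?\s*(\d+(?:\.\d+)?)\s*(kcal|mg|g)\b", re.IGNORECASE)


def parse_nutrition(text: str) -> tuple[dict[str, float], list[str]]:
    """Parse extracted label text into values plus a list of detected problems."""
    values: dict[str, float] = {}
    problems: list[str] = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue
        name = match.group(1).strip().lower()
        if name not in LABEL_ROWS:
            continue  # a row the analysis does not use, e.g. "Saturated Fat"
        field, expected_unit = LABEL_ROWS[name]
        unit = match.group(3).lower()
        if unit != expected_unit:
            # Flag the mismatch instead of converting, so a misread is not hidden.
            problems.append(f"{name}: expected {expected_unit}, got {unit}")
            continue
        values[field] = float(match.group(2))
    for name, (field, _) in LABEL_ROWS.items():
        if field not in values:
            problems.append(f"missing: {name}")
    return values, problems
```

A non-empty `problems` list could trigger a retry with a different extraction method or a request for a clearer photo. Analyzing incomplete values could produce a misleading result.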