Pulmo Carcinalyzer
Time is priceless. Don't let cancer steal it away.
Created on 2nd October 2024
Describe your project
This project develops an AI-powered solution for early lung cancer detection. It combines genomic data analysis and medical image processing to assess an individual's risk of developing lung cancer. The application takes patient information, including symptoms, family history, and genomic test results, and analyzes it using Google's Gemini advanced language model. Additionally, it allows users to upload PET/CT scan images, which are processed by a TensorFlow model.
However, in the first part of the problem, where we estimate which patients should be considered 'high-risk', few patients will have records for gene mutation tests and tumor markers, since these tests are rarely performed. We therefore also use the patient's smoking habit as a variable to improve accuracy, because 80-85% of lung cancers (averaged across males and females) originate from the patient's smoking habit or exposure to passive smoking.
With more genomic data, we can fine-tune our generative AI model to find and analyze trends, giving more accurate predictions with less variance and bias.
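As a rough illustration of how records with missing genomic tests might still be passed to the language model, here is a minimal sketch. The `PatientRecord` fields, the `build_prompt` helper, and the wording of the prompt are all assumptions made for this example and are not taken from the project's code. The idea is simply that absent test results are stated explicitly, while smoking status is always included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    # Hypothetical field names, chosen only for illustration.
    age: int
    symptoms: list[str]
    family_history: bool
    smoking_status: str  # e.g. "never", "former", "current", "passive"
    gene_mutation_result: Optional[str] = None  # rarely available
    tumor_marker_result: Optional[str] = None   # rarely available


def build_prompt(p: PatientRecord) -> str:
    """Assemble the patient summary sent to the model.

    Missing genomic results are reported as "not available" rather than
    dropped, so the model is told the data is absent instead of guessing.
    """
    symptoms = ", ".join(p.symptoms) if p.symptoms else "none reported"
    lines = [
        f"Age: {p.age}",
        f"Symptoms: {symptoms}",
        f"Family history of lung cancer: {'yes' if p.family_history else 'no'}",
        f"Smoking status: {p.smoking_status}",
        f"Gene mutation test: {p.gene_mutation_result or 'not available'}",
        f"Tumor markers: {p.tumor_marker_result or 'not available'}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    record = PatientRecord(
        age=58,
        symptoms=["persistent cough"],
        family_history=True,
        smoking_status="current",
    )
    print(build_prompt(record))
```

The resulting text would then be sent to the model together with instructions on the expected response format.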
Challenges we ran into
One challenge was extracting the risk level and instructions from the Gemini model's response. The initial output format varied, making it difficult to parse the information accurately. We overcame this by refining the prompts to enforce a consistent response structure and adjusting the parsing logic in our backend code to match.
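The parsing step could look something like the sketch below. It assumes the prompt asks the model to reply with a `Risk Level:` line followed by an `Instructions:` section. That two-field layout, the `Low`/`Medium`/`High` labels, and the `parse_assessment` name are assumptions for illustration, not the project's actual format. Failing loudly on an unexpected structure lets the backend retry or report an error instead of showing a half-parsed result.

```python
import re
from typing import NamedTuple


class Assessment(NamedTuple):
    risk_level: str
    instructions: str


# Assumed response layout (hypothetical):
#   Risk Level: <Low|Medium|High>
#   Instructions: <free text, possibly spanning several lines>
_PATTERN = re.compile(
    r"Risk Level:\s*(?P<risk>Low|Medium|High)\s*\n+"
    r"Instructions:\s*(?P<instr>.+)",
    re.IGNORECASE | re.DOTALL,
)


def parse_assessment(text: str) -> Assessment:
    """Extract the risk level and instructions from a model response."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError("response does not follow the expected structure")
    return Assessment(
        risk_level=match.group("risk").capitalize(),
        instructions=match.group("instr").strip(),
    )


if __name__ == "__main__":
    reply = "Risk Level: high\nInstructions: Consult a pulmonologist.\n"
    print(parse_assessment(reply))
```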
We ran into many UI bugs and eventually resorted to breaking changes that distort the web app's layout on any device other than a desktop. We chose to focus our attention on the functionality of the application instead.
The biggest hurdle was collecting data to train our model. After extensive research, time, and effort, we found open-source data from GCD, WCRF, and Kaggle.
Technologies used
