Green Reaction predictor

AI Green Reaction Optimizer — revolutionizing chem

Created on 2nd November 2025


The problem Green Reaction predictor solves

The AI Green Reaction Optimizer is an intelligent tool designed to help chemists, researchers, and students optimize chemical reactions efficiently and sustainably. By entering parameters such as temperature, catalyst concentration, and solvent structure (as a SMILES string), users can instantly predict the expected reaction yield through an interactive dashboard. This eliminates repeated trial-and-error experiments, letting users explore different reaction conditions digitally before performing them in the lab.
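A minimal sketch of how such inputs could be turned into a single feature vector for the model. The trigram-hash fingerprint below is only a lightweight stand-in for a real molecular fingerprint (the actual project presumably uses a cheminformatics library such as RDKit for Morgan/ECFP bits); the bit width and feature layout here are assumptions:

```python
import hashlib
import numpy as np

N_BITS = 128  # assumed fingerprint length; the real project may differ

def smiles_fingerprint(smiles: str, n_bits: int = N_BITS) -> np.ndarray:
    """Toy bit-vector fingerprint: hash overlapping character trigrams of a
    SMILES string into a fixed-length array. A real pipeline would compute
    proper molecular fingerprints (e.g. Morgan/ECFP) instead."""
    bits = np.zeros(n_bits, dtype=np.float64)
    for i in range(len(smiles) - 2):
        h = int(hashlib.md5(smiles[i:i + 3].encode()).hexdigest(), 16)
        bits[h % n_bits] = 1.0
    return bits

def build_features(temperature_c: float, catalyst_conc: float,
                   solvent_smiles: str) -> np.ndarray:
    """Concatenate the numeric reaction parameters with the solvent fingerprint."""
    return np.concatenate([[temperature_c, catalyst_conc],
                           smiles_fingerprint(solvent_smiles)])

x = build_features(80.0, 0.05, "CCO")  # ethanol as solvent
print(x.shape)  # (130,): 2 numeric parameters + 128 fingerprint bits
```

With 128 fingerprint bits plus two numeric parameters, every input has the same fixed length, which is exactly the invariant the scaler and model depend on.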

The platform makes the process of reaction optimization significantly easier, safer, and faster. It simplifies complex chemical data analysis and provides visual insights into how changes in temperature, catalysts, and solvents affect reaction efficiency. By reducing the number of physical experiments, it ensures safer laboratory practices, minimizing exposure to hazardous chemicals and preventing waste from failed reactions. Moreover, it drastically accelerates research by using machine learning to simulate thousands of reaction possibilities within minutes, helping scientists identify optimal conditions for maximum yield and minimal environmental impact.
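The "simulate thousands of conditions in minutes" idea amounts to scanning a grid of candidate conditions through the trained model and keeping the best predicted yield. The sketch below uses a synthetic yield surface in place of the real model, and the parameter ranges are illustrative assumptions:

```python
from itertools import product
import numpy as np

def predicted_yield(temperature_c: float, catalyst_conc: float) -> float:
    """Stand-in for the trained model: a smooth synthetic surface with an
    optimum near 75 degrees C and 0.08 mol/L (illustrative only)."""
    return 90.0 * np.exp(-((temperature_c - 75) / 40) ** 2
                         - ((catalyst_conc - 0.08) / 0.06) ** 2)

# Candidate grid: 26 temperatures x 20 concentrations = 520 virtual experiments
temperatures = np.linspace(25, 150, 26)      # degrees C, step 5
concentrations = np.linspace(0.01, 0.20, 20) # mol/L, step 0.01

best = max(product(temperatures, concentrations),
           key=lambda tc: predicted_yield(*tc))
print(f"best T={best[0]:.0f} C, c={best[1]:.3f} mol/L, "
      f"predicted yield={predicted_yield(*best):.1f}%")
```

Each grid point is a "digital experiment"; scaling the grid to thousands of points costs only model evaluations, not bench time or reagents.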

In essence, the AI Green Reaction Optimizer acts as a digital chemist, offering data-driven predictions that save time, reduce costs, and promote eco-friendly, high-yield chemical processes—paving the way for a more sustainable and efficient future in chemical research and industrial applications.

Challenges we ran into

One major challenge faced during development was a data mismatch between the model and the scaler used at prediction time. The model was trained on one version of the dataset while the prediction script used another, producing a feature-count mismatch and errors such as "ValueError: X has 2050 features, but StandardScaler is expecting 130 features as input." The root cause was that the preprocessing steps and the number of molecular fingerprint bits were not consistent across the different scripts.

To fix this, we standardized the entire data pipeline so that training and prediction used the same preprocessing logic. We also ensured that the scaler fitted during training was saved (scaler.pkl) and reloaded during prediction for consistent feature scaling, and we added debugging print statements and shape checks to validate the feature count throughout the workflow. After these fixes, the model and preprocessing were fully synchronized, yielding stable and realistic predictions.

This experience highlighted the importance of a unified, reproducible data-processing pipeline in machine learning projects, particularly in scientific applications where precision and consistency are critical.
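The fix described above can be sketched as follows: the training step persists the fitted scaler, and the prediction step reloads that same scaler and validates the feature count before transforming. The data here is synthetic and the model choice is an assumption; only the scaler.pkl filename comes from the project itself:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

N_FEATURES = 130  # must be identical in training and prediction

# --- training script ---
rng = np.random.default_rng(0)
X_train = rng.random((200, N_FEATURES))
y_train = rng.random(200) * 100  # synthetic yields in percent

scaler = StandardScaler().fit(X_train)
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(scaler.transform(X_train), y_train)
joblib.dump(scaler, "scaler.pkl")  # persist the *fitted* scaler
joblib.dump(model, "model.pkl")

# --- prediction script ---
scaler = joblib.load("scaler.pkl")  # reload the same scaler, never refit
model = joblib.load("model.pkl")
x_new = rng.random((1, N_FEATURES))

# shape check guarding against the feature-count mismatch described above
assert x_new.shape[1] == scaler.n_features_in_, (
    f"expected {scaler.n_features_in_} features, got {x_new.shape[1]}")
pred = model.predict(scaler.transform(x_new))
print(f"predicted yield: {pred[0]:.1f}%")
```

The key invariant is that `scaler.n_features_in_` is fixed at training time, so any prediction input with a different width fails loudly at the assert rather than producing a silent downstream error.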
