SQL-GPT
Chat with your database
Created on 19th August 2025
The problem SQL-GPT solves
Brief
In fast-moving businesses, waiting on data kills momentum. Business leaders, product managers, and marketing teams constantly need access to data to make time-critical decisions. However, this data is often locked away in databases, accessible only to analysts fluent in SQL. The typical workflow, submitting a data request and waiting for a query to be written, creates frustrating delays.
This product, SQL-GPT, is designed to solve exactly this problem. It provides an intuitive conversational interface that lets users interact with their database in plain English, effectively bridging the gap between business questions and data-driven answers.
This web application is built with Streamlit and powered by the LangChain framework, and acts as an intelligent, on-demand data analyst. The tool empowers business users to self-serve their data needs and frees up analysts to focus on more complex, high-impact work, like predictive modelling or deep-dive investigations.
How it works
The central system is powered by create_sql_agent, which uses a powerful and fast model, llama-3.3-70b-versatile, via the Groq API. The agent is primed with a deeper understanding of the database by feeding it a schema summary and data context (for instance, the possible values in a categorical column).
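As a concrete sketch of this priming step: the helper name build_agent_prefix and the sample schema text below are illustrative assumptions, not the project's actual code, but LangChain's create_sql_agent does accept a custom prompt prefix, which is roughly how this context would reach the model.

```python
# Illustrative sketch: combine schema summary and data context into one
# prompt prefix for the SQL agent. Names here are assumptions, not the
# project's exact code.

SCHEMA_SUMMARY = (
    "Table 'users' stores one row per user; table 'engagements' "
    "links to it via user_id and records per-user engagement counts."
)
CHEAT_SHEET = "Column 'engagement_type' has values: Likes, Views, Impressions"

def build_agent_prefix(schema_summary: str, cheat_sheet: str) -> str:
    """Assemble the system prefix that primes the SQL agent."""
    return (
        "You are a careful data analyst. Use only the tables described below.\n\n"
        f"Schema summary:\n{schema_summary}\n\n"
        f"Known categorical values:\n{cheat_sheet}\n"
    )

prefix = build_agent_prefix(SCHEMA_SUMMARY, CHEAT_SHEET)

# The app would then wire this into LangChain, roughly:
#   from langchain_groq import ChatGroq
#   from langchain_community.utilities import SQLDatabase
#   from langchain_community.agent_toolkits import create_sql_agent
#   llm = ChatGroq(model="llama-3.3-70b-versatile")
#   db = SQLDatabase.from_uri("sqlite:///uploaded.sqlite")
#   agent = create_sql_agent(llm=llm, db=db, prefix=prefix, verbose=True)
print(prefix)
```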
The workflow:
- Initialization: The user provides an API key and a database. A default key is available, though heavy usage may exhaust its credits. If the user does not want to supply a database, there is an option to load a default one. The database must be in .sqlite format.
- User query: The user asks questions in English, for example "Split number of impressions by gender" or "Which country has the highest sales?", depending on the data being queried.
- Display: The application displays the final answer almost instantaneously and makes the agent's entire thought process available for review.
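For the initialization step, a small validation helper is useful before handing an uploaded file to the agent. The function below (list_tables is a hypothetical name, not part of the app) checks that a file is a readable SQLite database and lists its user tables, using only the standard library:

```python
import os
import sqlite3
import tempfile

def list_tables(db_path: str) -> list[str]:
    """Return user table names, or raise sqlite3.Error if unreadable."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        con.close()

# Demo with a throwaway .sqlite file
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, country TEXT)")
con.commit()
con.close()
print(list_tables(path))  # ['ads']
```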
Challenges I ran into
- Challenge: Overcoming Schema Blindness for Query Accuracy
Problem: My initial attempts involved letting the agent interact with the database without any preliminary context. The results were highly unreliable. The agent was essentially hallucinating the table and column names and misunderstanding table relationships, leading to syntactically incorrect or logically flawed queries.
Solution: To solve this, I developed the get_schema_summary function. This function uses the LLM itself to first perform a meta-analysis of the entire database schema. It generates a concise, natural-language summary that explains the purpose of each table and the relationships between them. Providing this summary as part of the agent's core prompt gave it the foundational knowledge it needed to construct accurate and relevant queries from the very first turn.
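The raw material for such a summary can be pulled with plain SQLite introspection before the LLM is involved. A rough sketch, with the internals assumed and the LLM call stubbed out as a comment:

```python
import sqlite3

def describe_schema(con: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements as raw context for a summarising LLM."""
    rows = con.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender TEXT)")
con.execute(
    "CREATE TABLE engagements (user_id INTEGER REFERENCES users(id), "
    "engagement_type TEXT, count INTEGER)"
)

raw_schema = describe_schema(con)
# The real get_schema_summary would now ask the LLM something like:
#   "Summarise the purpose of each table and their relationships:\n" + raw_schema
print(raw_schema)
```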
- Challenge: Grounding the Agent in Real-World Data for Aggregations
Problem: Even with the schema summary, the agent struggled with queries involving specific categorical values. For example, my dataset had an engagement_type column with values like "Likes", "Views", and "Impressions". When I asked the agent to "group the likes by gender", it would generate a syntactically correct query (... WHERE engagement_type = 'likes') but would return no results, because it didn't know the exact, case-sensitive value was "Likes". This created a critical "semantic gap" between the schema and the data itself.
Solution: I engineered the create_dynamic_cheat_sheet function to bridge this gap. This function intelligently scans the database for text-based columns with low cardinality (a small number of unique values). It then extracts these unique values and compiles them into a "cheat sheet" for the agent. By including this context in the prompt (e.g., Column 'engagement_type' has values: Likes, Views, Impressions), the agent is "grounded" in the actual data, ensuring it uses the correct values in its WHERE clauses and drastically improving the reliability of filtering and aggregation queries.
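A minimal sketch of this cardinality scan, assuming an SQLite backend (the function name and the max_unique threshold here are illustrative, not the app's actual code):

```python
import sqlite3

def dynamic_cheat_sheet(con: sqlite3.Connection, max_unique: int = 10) -> str:
    """List the exact values of low-cardinality TEXT columns."""
    lines = []
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    )]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, col, col_type, *_ in con.execute(f"PRAGMA table_info({table})"):
            if col_type.upper() != "TEXT":
                continue
            vals = [r[0] for r in con.execute(
                f"SELECT DISTINCT {col} FROM {table} LIMIT {max_unique + 1}"
            ) if r[0] is not None]
            if 0 < len(vals) <= max_unique:
                lines.append(
                    f"Column '{col}' has values: {', '.join(sorted(vals))}"
                )
    return "\n".join(lines)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE engagements (engagement_type TEXT, count INTEGER)")
con.executemany(
    "INSERT INTO engagements VALUES (?, ?)",
    [("Likes", 5), ("Views", 9), ("Impressions", 30), ("Likes", 2)],
)
print(dynamic_cheat_sheet(con))
# Column 'engagement_type' has values: Impressions, Likes, Views
```

Fetching one more row than max_unique lets the scan cheaply detect high-cardinality columns (IDs, free text) and skip them, so the cheat sheet stays short enough to fit in the prompt.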
Tracks Applied (1)
All Registered Participants
Windsurf
Technologies used