Bluequery
FloatChat is an AI-powered conversational interfac
Created on 19th October 2025
•
Bluequery
FloatChat is an AI-powered conversational interfac
Description of your solution
The ARGO program deploys autonomous profiling floats across the world’s oceans, collecting essential parameters such as temperature, salinity, pressure, and dissolved oxygen, and storing them in NetCDF formats that require domain expertise and specialized tools to process. While this data has immense value for climate science, environmental monitoring, and marine research, its accessibility remains limited to experts with advanced technical skills, creating a barrier for policymakers, educators, interdisciplinary researchers, and the public. FloatChat addresses this gap by providing a natural language-based query and visualization platform, enabling users regardless of technical background to interact intuitively with complex oceanographic data. The system begins by ingesting ARGO NetCDF files and transforming them into structured formats such as SQL and Parquet to allow efficient querying. Metadata and summaries are extracted and stored in a vector database such as FAISS or Chroma, optimized for high-speed retrieval. At the core of FloatChat’s architecture is a Retrieval-Augmented Generation (RAG) pipeline powered by multimodal Large Language Models (LLMs) that interpret user queries, translate them into precise database commands, and retrieve relevant data. The user interacts with a conversational front end, where queries are sent to the backend server via a secure tunnel such as ngrok. Before execution, queries pass through a critical security layer the Prompt Guard Agent which classifies them into SAFE and UNSAFE categories, thereby preventing prompt injection attacks and malicious requests. Only SAFE queries are processed further by the Query Processor Agent, which enters a “thinking mode” to explore the global metadata schema, identify relevant attributes such as PI_NAME, DEPLOYMENT_PLATFORM, and PLATFORM_TYPE, and plan the query execution. The agent constructs optimized SQL queries to retrieve accurate results, leveraging the MCP server to access the database efficiently. Retrieved results are sent to the Formatter Agent, which converts raw data into human-readable insights, adding contextual explanations to ensure comprehension by non-specialists. The formatted output is presented to the user via interactive dashboards developed in frameworks such as Streamlit , which provide features like mapped trajectories of ARGO floats, depth-time plots, comparative analyses, and intuitive visual representations that translate complex data into understandable formats. FloatChat’s design thus transforms the traditionally technical process of oceanographic data analysis into a seamless conversational experience, democratizing access to ocean science. Its architecture integrates multiple layers a conversational UI for natural language input, a robust backend for query interpretation and processing, a structured database optimized for vector search, a security layer for prompt validation, and an interactive visualization layer ensuring scalability, security, and adaptability for future enhancements. The novelty of FloatChat lies in its integration of natural language understanding with domain-specific data retrieval, its innovative Prompt Guard Agent for query safety, and its use of RAG-powered multimodal LLMs to bridge the gap between human intent and structured scientific data. By allowing non-specialists to query ARGO data naturally and receive visually interpretable results, FloatChat enables a broader range of users from climate scientists and researchers to educators, students, policymakers, and the public to explore, analyze, and apply oceanographic knowledge without the steep learning curve traditionally associated with such datasets. This capability has profound implications: it accelerates scientific research by reducing preprocessing time, empowers policy-making through accessible evidence-based insights, and promotes public engagement with ocean science. Furthermore, FloatChat’s framework can be extended to integrate additional oceanographic datasets, live ARGO float feeds, and predictive analytics, making it a scalable model for other domains that involve large-scale, complex datasets. Challenges such as ensuring real-time performance, multilingual query support, and expanding dataset integration remain, but the modular architecture of FloatChat allows for iterative improvements. In essence, FloatChat represents a paradigm shift in oceanographic data exploration, democratizing access to one of the most valuable environmental datasets in existence and fostering interdisciplinary collaboration. By transforming raw ARGO NetCDF files into structured, queryable formats and enabling natural language interaction, FloatChat bridges t
Tracks Applied (1)