Currently pursuing a Bachelor's degree in Computer Science and Engineering with a specialization in Data Science. Combines technical and business acumen, with proficiency in Python, PySpark, Cassandra, SQL, Tableau, and the Microsoft Office suite.
As an intern in the Institutional Equities division of Kotak Securities, I am currently developing a Python-supported stock screener and implementing PySpark-based trading algorithms. I use PySpark Streaming for real-time data analysis and Cassandra to manage large data repositories. I also have a comprehensive understanding of derivatives trading strategies and how they are applied in practice.
Interned at a multi-million-dollar hedge fund, demonstrating the ability to prepare models, forecast trends, deliver presentations, and perform fundamental and quantitative analysis of businesses. Detail-oriented in data analysis and driven by a fascination with tech-enabled businesses and a desire to make an impact in investment, banking, and finance.
A quick learner with strong leadership, time-management, communication, and analytical skills, always seeking opportunities to grow while using my knowledge to help organizations make data-driven decisions.
Project: kotakGPT (RAG: Multi-File Local Querying System)
• Led RAG development, integrating LangChain's document loaders, embeddings, and text splitters. Built a Streamlit UI for creating, updating, and querying GPT models.
• Implemented multiprocessing to speed up document loading and chunking into FAISS vector stores. Also evaluated the Chroma vector store and identified its CPU limitations.
• Led model optimization, using quantized models to reduce CPU usage without compromising accuracy.
• Engineered CSV data ingestion with custom Python scripts that converted the tabular data into narrative-style PDFs to improve querying, though the results were suboptimal.
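The parallel loading-and-chunking step can be illustrated in plain Python. This is a minimal sketch, not the project's code: the helpers `chunk_text` and `chunk_documents`, the 500-character chunk size, and the 50-character overlap are hypothetical, and it uses `multiprocessing.Pool` with a hand-written splitter instead of LangChain's text splitters.

```python
from multiprocessing import Pool

# Hypothetical settings; the project's real chunk parameters are not stated.
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def chunk_documents(documents, processes=4):
    """Chunk many documents in parallel by distributing them across worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(chunk_text, documents)


if __name__ == "__main__":
    result = chunk_documents(["a" * 1200, "b" * 300])
    print([len(c) for c in result[0]])  # [500, 500, 300]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either neighbouring chunk; the resulting chunks would then be embedded and added to a vector store such as FAISS.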
Project: Stock Screener Backend Enhancement
• Led the expansion of Python support in the backend, reducing execution time from 6 minutes to 15 seconds.
• Architected structured code for more than 80 complex financial metrics, with seamless data retrieval from and updates to MSSQL databases.
• Applied vectorization and fine-tuned SQL queries to speed up resource-intensive operations.
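One way to tune such queries is to push a metric's computation into a single set-based query instead of looping over rows in Python. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for MSSQL; the `prices` table, its columns, and the average-close metric are hypothetical and are not the screener's actual schema or metrics.

```python
import sqlite3


def average_close_by_ticker(conn):
    """Compute the metric for every ticker in one GROUP BY query (set-based)."""
    rows = conn.execute(
        "SELECT ticker, AVG(close) FROM prices GROUP BY ticker ORDER BY ticker"
    ).fetchall()
    return dict(rows)


def average_close_row_by_row(conn):
    """Slower pattern for comparison: fetch every row and aggregate in Python."""
    totals, counts = {}, {}
    for ticker, close in conn.execute("SELECT ticker, close FROM prices"):
        totals[ticker] = totals.get(ticker, 0.0) + close
        counts[ticker] = counts.get(ticker, 0) + 1
    return {t: totals[t] / counts[t] for t in sorted(totals)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?)",
        [("ABC", 10.0), ("ABC", 20.0), ("XYZ", 5.0)],
    )
    print(average_close_by_ticker(conn))  # {'ABC': 15.0, 'XYZ': 5.0}
```

Both functions return the same result, but the set-based version lets the database engine do the aggregation and sends back one row per ticker instead of every price row.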
Project: Data Wrangling and Process Automation
• Used Python to wrangle large datasets and tailor them to specific requirements.
• Promptly resolved access issues with faulty Excel files by inspecting their underlying XML to identify the problem, then recovered the data through an Excel-to-XML-to-TXT conversion.
• Automated the email delivery of critical business reports over SMTP.
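XML inspection works on Excel files because an .xlsx workbook is a ZIP archive of XML parts (the Office Open XML format). The minimal sketch below uses only the standard library. The function name `inspect_xlsx` is hypothetical. The function only reports which parts fail to parse and pulls out the shared-string text, so it is not the exact Excel-to-XML-to-TXT procedure used in the project.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML parts such as xl/sharedStrings.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def inspect_xlsx(path):
    """Return (XML parts that fail to parse, texts from the shared-strings table)."""
    broken, strings = [], []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.endswith((".xml", ".rels")):
                continue  # skip binary parts such as images
            try:
                root = ET.fromstring(zf.read(name))
            except ET.ParseError:
                broken.append(name)  # malformed XML: a likely cause of the corruption
                continue
            if name == "xl/sharedStrings.xml":
                # Each <si> item is one string; join its <t> text runs (plain or rich text).
                for si in root.findall(f"{{{MAIN_NS}}}si"):
                    strings.append("".join(t.text or "" for t in si.iter(f"{{{MAIN_NS}}}t")))
    return broken, strings
```

Parts that parse cleanly can then be dumped to text, while the list of broken parts narrows down where the workbook is damaged.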
Project: Big Data Manipulation and PySpark Innovation
• Spearheaded a Big Data manipulation and PySpark experimentation project.
• Used Cassandra for efficient data management and successfully configured PySpark on Windows.
• Worked with a range of PySpark modules, conducted performance analysis, and explored Spark's lazy evaluation model.
• Executed complex data manipulation tasks for Excel files with over 7 million rows.