Aryan Nayak

Currently pursuing a Bachelor's in Computer Science Engineering with a specialization in Data Science. I combine technical and business acumen with proficiency in Python, PySpark, Cassandra, SQL, Tableau, and the Microsoft Office suite.

As an intern in Kotak Securities' Institutional Equities division, I am developing a Python-backed stock screener and implementing PySpark-based trading algorithms. I use PySpark Streaming for real-time data analysis and Cassandra to manage large data repositories efficiently. I also have a comprehensive understanding of derivatives trading strategies and their practical application in complex investment landscapes.

I interned at a multi-million-dollar hedge fund and have a proven ability to prepare models, forecast trends, give presentations, and perform fundamental and quantitative analysis of businesses. I am detail-oriented in data analysis and driven by a fascination with tech-enabled businesses and a desire to make an impact in investment, banking, and finance.

A quick learner with excellent leadership, time management, communication, and analytical skills, always seeking opportunities for growth and development while leveraging my knowledge and skills to help organizations make data-driven decisions.

Projects

NewsPlot Pro

A robust system for real-time event extraction, categorization, and the creation of a dynamic network graph using news data sourced from public APIs, including Airrchip.

spaCy, Python, Natural Language Processing (NLP), NetworkX, Plotly, newsapi, Streamlit, LLM, Gemini, Similarity Search

FINITY

Financial Unity

React, MySQL, Python, CockroachDB, yfinance, MSSQL, RAG, quantstats

Skills

Machine Learning
Data Science
Quantitative Finance
Big Data
Interpersonal Communication

Experience

  • Kotak Securities - Python Development Intern
    March 2023 - Present

    Project: kotakGPT (RAG: Multi-File Local Querying System)

    • Led development of the RAG pipeline, integrating LangChain's document loaders, embeddings, and text splitters. Built a Streamlit UI for creating, updating, and querying GPT models.
    • Implemented multiprocessing to speed up document loading, chunking, and storage in FAISS vector stores. Explored the Chroma vector store, identifying CPU limitations.
    • Optimized model selection, using quantized models to reduce CPU usage without compromising accuracy.
    • Engineered CSV data ingestion with custom Python scripts that converted the data into narrative-based PDFs for querying, although the results were suboptimal.
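
The parallel loading and chunking step above can be sketched roughly as follows. This is a minimal illustration, assuming plain-text files and a simple character-based splitter in place of LangChain's loaders and splitters. The function names (`split_text`, `load_and_chunk`, `chunk_files_in_parallel`) and the default sizes are hypothetical, not the project's actual code.

```python
from multiprocessing import Pool
from pathlib import Path


def split_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def load_and_chunk(path):
    """Read one file and return its chunks (runs inside a worker process)."""
    return split_text(Path(path).read_text(encoding="utf-8"))


def chunk_files_in_parallel(paths, workers=4):
    """Load and chunk many files across worker processes, then flatten.

    On Windows and macOS, call this from under an
    `if __name__ == "__main__":` guard, as multiprocessing requires.
    """
    with Pool(processes=workers) as pool:
        per_file = pool.map(load_and_chunk, paths)
    return [chunk for chunks in per_file for chunk in chunks]
```

The flattened list of chunks would then be embedded and added to the vector store in a separate step, which this sketch leaves out.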

    Project: Stock Screener Backend Enhancement

    • Led the expansion of Python support in the backend, reducing execution time from 6 minutes to 15 seconds.
    • Architected structured code for over 80 complex financial metrics, with seamless data retrieval from and updates to MSSQL databases.
    • Applied vectorization and fine-tuned SQL queries for resource-intensive operations.

    Project: Data Wrangling and Process Automation

    • Used Python to wrangle extensive datasets and tailor them to specific requirements.
    • Resolved access issues with faulty Excel files by inspecting their underlying XML to identify the problem, then extracting the data via Excel-to-XML-to-TXT conversion.
    • Implemented SMTP automation for streamlined email dispatch of critical business reports.
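
The Excel-to-XML-to-TXT route mentioned above can be sketched as below. It assumes the faulty files were `.xlsx` workbooks, which are ZIP archives of XML parts, and reads one worksheet part directly with the standard library. The helper names and the default part path `xl/worksheets/sheet1.xml` are illustrative assumptions, and the sketch handles only shared-string and plain-value cells.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_parts(xlsx_path):
    """Return the names of the parts inside the workbook archive."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return zf.namelist()


def sheet_to_lines(xlsx_path, sheet_part="xl/worksheets/sheet1.xml"):
    """Read one worksheet part and return its rows as tab-separated lines."""
    with zipfile.ZipFile(xlsx_path) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A shared string may be split across several <t> runs.
                shared.append("".join(t.text or "" for t in si.iter("{%s}t" % NS["m"])))
        sheet = ET.fromstring(zf.read(sheet_part))

    lines = []
    for row in sheet.iterfind(".//m:sheetData/m:row", NS):
        cells = []
        for cell in row.findall("m:c", NS):
            value = cell.find("m:v", NS)
            text = value.text if value is not None and value.text is not None else ""
            if cell.get("t") == "s" and text:
                text = shared[int(text)]  # "s" cells store an index into sharedStrings
            cells.append(text)
        lines.append("\t".join(cells))
    return lines


def xlsx_to_txt(xlsx_path, txt_path):
    """Write the extracted rows to a plain-text file, one row per line."""
    with open(txt_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(sheet_to_lines(xlsx_path)))
```

Calling `list_parts` first is a quick way to see which XML parts are present or missing before attempting extraction.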

    Project: Big Data Manipulation and PySpark Innovation

    • Spearheaded a significant Big Data manipulation and PySpark experimentation project.
    • Used Cassandra for efficient data management and successfully configured PySpark on Windows.
    • Used a range of PySpark modules, conducted performance analysis, and explored PySpark's "lazy" evaluation approach.
    • Executed complex data manipulation tasks for Excel files with over 7 million rows.
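
The "lazy" approach can be illustrated with plain Python generators, used here only as an analogy and not as PySpark code. Building the pipeline does no work, much as PySpark transformations are deferred, and work happens only when results are consumed, much as a PySpark action triggers execution. The `build_pipeline` helper and its log are hypothetical names for this sketch.

```python
from itertools import islice


def build_pipeline(rows, log):
    """Return a lazy pipeline that parses rows and keeps even numbers.

    `log` records each row at the moment it is actually parsed, which
    makes the deferred execution visible.
    """
    def parse(line):
        log.append(line)
        return int(line)

    parsed = (parse(r) for r in rows)            # nothing is parsed yet
    return (x for x in parsed if x % 2 == 0)     # still nothing is parsed


log = []
pipeline = build_pipeline(["1", "2", "3", "4"], log)
# At this point `log` is still empty: only the plan has been built.

first_even = list(islice(pipeline, 1))
# Consuming one result parsed only as many rows as were needed to find it.
```

With very large inputs, deferring work this way avoids materializing intermediate results that are never used.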