Ritika Agarwal

I'm an aspiring entrepreneur who wants to use technology to create widespread impact. The ability of machines to drive potentially life-saving decisions fascinates me. Machine Learning (ML) and Data Science (DS) being at the heart of such decision making drew me into the field and shaped my professional and research experiences.

At Shipsy, I built an Optical Character Recognition (OCR) system to extract zip codes from images of handwritten postal addresses. I collaborated with a diverse team of twenty to annotate ~5k images with bounding-box coordinates and zip code digits. Using this dataset, I fine-tuned a Faster R-CNN Inception model pre-trained on ImageNet for zip code detection, and trained an independent convolutional-recurrent neural network from scratch to recognize the digit sequence. I then worked with an Android developer to deploy the models in the mobile application. Working on the OCR system helped me understand the real-world challenges of an end-to-end ML model development cycle, from data annotation to model deployment.
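The detection-then-recognition flow can be sketched as follows. Here `detector` and `recognizer` are hypothetical callables standing in for the fine-tuned Faster R-CNN and the CRNN; this is a minimal illustration of the two-stage pipeline, not the production code:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def extract_zip_code(
    image: List[List[int]],
    detector: Callable[[List[List[int]]], Box],       # stage 1: localize zip region
    recognizer: Callable[[List[List[int]]], List[int]] # stage 2: read digit sequence
) -> str:
    """Two-stage OCR: localize the zip-code region, then read its digits."""
    x0, y0, x1, y1 = detector(image)
    # Crop the detected region (image modeled as nested lists for simplicity)
    crop = [row[x0:x1] for row in image[y0:y1]]
    digits = recognizer(crop)
    return "".join(str(d) for d in digits)
```

Keeping the two stages independent, as described above, lets each model be retrained or swapped without touching the other.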

As Covid-19 progressed and researchers tried to model its spread, I began working remotely with Dr. Saptarshi Ghosh at IIT Kharagpur to understand its dynamics from Twitter posts. We classified four types of Covid Symptom Reporting (SR) tweets by fine-tuning a multiclass Covid-Twitter-BERT model and further improved its performance (+3% average five-fold cross-validation macro F1) by appending ten handcrafted binary features to the encoder output. Observing a high time-lagged correlation between the daily number of SR tweets and the actual number of Covid cases, we trained regression models on these Social Media (SM) signals to predict the number of Covid cases in advance. We later submitted our work to the International AAAI Conference on Web and Social Media (ICWSM) 2022.
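The time-lagged correlation between the daily SR-tweet counts and case counts can be illustrated with a small dependency-free helper; `lagged_pearson` is a hypothetical sketch of the idea, not our actual analysis code:

```python
from statistics import mean


def lagged_pearson(signal, target, lag):
    """Pearson correlation between signal[t] and target[t + lag].

    A high value at lag k suggests the signal leads the target by k days,
    which is what makes it useful for predicting cases in advance.
    """
    x = signal[:len(signal) - lag] if lag else signal
    y = target[lag:]
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Scanning this over a range of lags identifies the lead time at which the SM signal is most informative.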

In my role at American Express (AmEx), I had the opportunity to diversify my ML capabilities and apply them at a much larger scale. I was responsible for maintaining the One-Click Modeling (OCM) pipeline that the brand and marketing team used for model development. By writing custom early-stopping and pruning modules for the pipeline's Bayesian hyperparameter (HP) search, I cut the time needed to find optimal HPs by ~70% while matching benchmark model performance. Using OCM, I also built gradient-boosting models on large-scale structured data and iterated on them to win the approval of an internal governance committee. Interfacing with the regulators to make the models compliance-friendly helped me think critically about issues of fairness and bias. These models are now used to target the US population with digital advertisements and drive millions of dollars in revenue annually.
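The pruning idea behind such modules can be illustrated with a simple median rule, sketched here in plain Python as an illustrative stand-in for the OCM implementation: a trial is stopped early when its intermediate score falls below the median of previously completed trials at the same step, so compute is not wasted on unpromising HP configurations.

```python
from statistics import median


class MedianPruner:
    """Prune an HP trial when its intermediate score falls below the
    median score that completed trials reported at the same step."""

    def __init__(self):
        self.history = {}  # step -> list of scores from completed trials

    def should_prune(self, step, score):
        past = self.history.get(step, [])
        # Require a few completed trials before pruning anything
        return len(past) >= 2 and score < median(past)

    def report(self, step, score):
        """Record a completed trial's score at the given step."""
        self.history.setdefault(step, []).append(score)
```

Early stopping of individual trials combines with this: a trial also halts itself when its own validation score stops improving.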

My experience managing OCM came in handy when I took ownership of another critical process, the data pipeline. At the time, its feature-engineering step, which extrapolated ~25MN US customers' data to ~250MN US prospects (people who do not own any AmEx card) using KNN with 100 neighbors, was a significant bottleneck for our campaign deliveries. I optimized this step by identifying that the prospects share only ~30MN unique neighborhoods, eliminating ~75% of the previously redundant computations. Doing so cut the runtime of the entire process from ~24 hours to ~3 hours and shortened campaign turnarounds by at least seven days. Implementing a computationally complex idea with big-data technologies such as the Hadoop Distributed File System and a Python MapReduce framework increased my technical versatility.
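The deduplication idea can be sketched as a cache keyed on a neighborhood signature, so the expensive 100-nearest-neighbor lookup runs once per unique key and its result is reused for every prospect sharing that key (`key_fn` and `knn_fn` are hypothetical placeholders for the real feature hashing and KNN search):

```python
from typing import Callable, Dict, Hashable, List


def extrapolate_features(
    prospects: List[object],
    key_fn: Callable[[object], Hashable],  # maps a prospect to its neighborhood key
    knn_fn: Callable[[object], object],    # expensive KNN lookup (runs once per key)
) -> List[object]:
    """Run knn_fn once per unique neighborhood and reuse the result
    for every prospect that shares that neighborhood."""
    cache: Dict[Hashable, object] = {}
    results = []
    for p in prospects:
        k = key_fn(p)
        if k not in cache:
            cache[k] = knn_fn(p)  # only the first prospect per key pays the cost
        results.append(cache[k])
    return results
```

With ~250MN prospects collapsing to ~30MN unique keys, this memoization alone removes roughly three quarters of the KNN work, consistent with the ~24h to ~3h runtime reduction.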

Projects

ContentMate

Simplifying content creation for 15MN YouTubers.
Flask, Git, CORS, LangChain, AI Agents, Google Developer APIs

Skills

Python
JavaScript