Quantitative Healthcare Investing

Harnessing Data Science and Quantitative Finance Techniques for Impactful Healthcare Investing

Built at Hacklytics 2024

Best Finance Hack

Created on 11th February 2024

The problem Quantitative Healthcare Investing solves

Eroom's Law

From the late 1900s into the 21st century, modern medicine has progressed rapidly, with constant innovation in the research of new drugs. However, it has become increasingly difficult to develop novel, effective drugs, which drives the cost of healthcare research and development up as time passes. In fact, the number of drugs produced per dollar spent has been decreasing exponentially, a trend sometimes called Eroom's Law: the inverse of Moore's Law (and its name spelled backwards)!

The Application of Quantitative Investing

Today, only around 5% of clinical trials lead to a treatment approved for public use, which makes funding a single trial like betting $200 million on a home run when you only get one at-bat! As a result, funding clinical trials is a risky prospect for any potential investor. But what if we could use the power of quantitative investing to reduce that risk?
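To make the home-run analogy concrete, here is a small illustrative calculation. It assumes every trial succeeds independently and with the same probability, which is a simplification of ours and not something the project measured; the 5% input is the figure quoted in the paragraph.

```python
def prob_at_least_one_success(p_success: float, n_trials: int) -> float:
    """Chance that at least one of n independent trials succeeds."""
    return 1.0 - (1.0 - p_success) ** n_trials


# Compare a single bet with baskets of 5 and 20 trials.
for n in (1, 5, 20):
    print(n, round(prob_at_least_one_success(0.05, n), 3))
```

Under these assumptions, a single trial pays off 5% of the time, while a basket of 20 trials has roughly a 64% chance of containing at least one success. That gap is the intuition behind treating trials as a portfolio.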

Challenges we ran into

One challenge we ran into was acquiring a clean, sufficiently large dataset. Many research papers in this domain rely on private databases, so it was difficult to track down a public dataset with consistently documented features.

We queried the ClinicalTrials.gov database for clinical trials involving cancer, an arbitrary choice we made to obtain a reasonably sized dataset. The ClinicalTrials.gov data offers many features to train on, but some fields, such as dates and times, were inconsistently formatted and needed to be cleaned.
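As a rough idea of what the date cleaning can look like, here is a minimal sketch using only the Python standard library. The list of candidate formats is an assumption made for illustration; the actual formats in the export we used are not documented here, so the tuple would need to be adjusted to the real data.

```python
from datetime import date, datetime
from typing import Optional

# Assumed formats, tried in order; adjust to whatever the export contains.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%B %d, %Y", "%B %Y")


def parse_trial_date(raw) -> Optional[date]:
    """Parse a raw date string, returning None when no format matches."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            # Formats without a day component default to the 1st of the month.
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def duration_days(start_raw, end_raw) -> Optional[int]:
    """Numeric feature: days between two raw dates, or None if either is unusable."""
    start, end = parse_trial_date(start_raw), parse_trial_date(end_raw)
    if start is None or end is None:
        return None
    return (end - start).days
```

With pandas, a function like this could be applied column-wise, for example `df["Start Date"].map(parse_trial_date)`, where the column name is hypothetical.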

Many fields were also qualitative, such as the free-text descriptions of the trials, so we generated new quantitative features by using transformer models to extract vector representations of sentences. We used the HuggingFace sentence_transformers API to apply pretrained models, but because of the large size of the dataset (approximately 100,000 rows x 30 columns), we submitted SLURM jobs on the Georgia Tech PACE-ICE cluster to generate our cleaned dataset for training.
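The embedding step might look roughly like the sketch below. The model name `all-MiniLM-L6-v2` is an assumption chosen for illustration (the specific model we used is not stated here), and the `encoder` parameter is a hypothetical hook that lets the function run without downloading a model, for example in a quick test.

```python
import numpy as np


def embed_texts(texts, model_name="all-MiniLM-L6-v2", batch_size=64, encoder=None):
    """Return a 2-D array with one embedding row per input text."""
    if encoder is None:
        # Imported lazily so the heavy dependency is only needed for real runs.
        from sentence_transformers import SentenceTransformer

        encoder = SentenceTransformer(model_name).encode
    # Missing descriptions become empty strings so every row still gets a vector.
    cleaned = [t if isinstance(t, str) else "" for t in texts]
    vectors = encoder(cleaned, batch_size=batch_size, show_progress_bar=False)
    return np.asarray(vectors, dtype=np.float32)
```

Each column of the returned array can then be attached to the tabular data as a numeric feature. On a dataset of this size the call is slow on a laptop, which is why we ran it as a batch job.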

Once we obtained our expected success rates and returns for the various clinical trials, we were able to proceed with portfolio optimization. We implemented a Mean-Variance Optimization model to determine the optimal allocation of weights between the clinical studies. We set up the optimization to maximize expected return subject to a bound on risk tolerance; sweeping that bound yields the Efficient Frontier, the curve of maximum expected return at each level of risk.
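A minimal sketch of this kind of optimization is shown below. It uses SciPy's SLSQP solver, which is our choice for the illustration and not necessarily the solver used in the project. The long-only and fully-invested constraints are also assumptions, and the three-trial inputs are made up.

```python
import numpy as np
from scipy.optimize import minimize


def max_return_weights(mu, cov, max_variance):
    """Maximize expected return subject to a variance bound (long-only, weights sum to 1)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mu.size
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},               # fully invested
        {"type": "ineq", "fun": lambda w: max_variance - w @ cov @ w},  # variance <= bound
    ]
    result = minimize(
        lambda w: -(mu @ w),       # minimizing the negative return maximizes return
        x0=np.full(n, 1.0 / n),    # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,   # no short positions
        constraints=constraints,
    )
    if not result.success:
        raise RuntimeError(f"optimizer failed: {result.message}")
    return result.x


def efficient_frontier(mu, cov, variance_bounds):
    """Return (risk bound, best expected return) pairs, one per bound."""
    mu = np.asarray(mu, dtype=float)
    return [(b, float(mu @ max_return_weights(mu, cov, b))) for b in variance_bounds]


# Made-up inputs for three hypothetical trials.
mu = [0.10, 0.06, 0.02]
cov = np.diag([0.09, 0.04, 0.01])
frontier = efficient_frontier(mu, cov, [0.02, 0.05])
```

Plotting the pairs in `frontier` with the risk bound on the x-axis gives the frontier curve. A bound below the minimum achievable variance has no feasible portfolio, in which case this sketch raises an error.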

One challenge we faced here was the model's tendency to heavily weight a few “great” clinical trials, which left our portfolio concentrated instead of diversified. To solve this, we added an L2 norm constraint to encourage smaller weights, pushing the model to diversify as a means of decreasing risk.
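To show why an L2 term spreads the weights out, here is a simplified sketch, not the project's exact formulation. It uses the penalized form of the constraint and drops the variance term for brevity: maximizing `mu @ w - (gamma / 2) * ||w||^2` over long-only weights that sum to 1 is the same as projecting `mu / gamma` onto the probability simplex, which can be computed exactly with a sort. The `gamma` parameter and the sample returns are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                      # values in descending order
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]    # number of strictly positive weights
    theta = (css[rho - 1] - 1.0) / rho        # shift that makes the weights sum to 1
    return np.maximum(v - theta, 0.0)


def l2_regularized_weights(mu, gamma):
    """Weights maximizing mu @ w - (gamma / 2) * ||w||^2 on the simplex."""
    return project_to_simplex(np.asarray(mu, dtype=float) / gamma)


def effective_holdings(w):
    """1 / sum(w^2): equals n for equal weights and 1 for a single position."""
    return 1.0 / float(np.sum(np.square(w)))


mu = [0.30, 0.20, 0.10, 0.05]             # made-up expected returns
weak = l2_regularized_weights(mu, 0.01)   # tiny penalty: everything on the best trial
strong = l2_regularized_weights(mu, 1.0)  # larger penalty: weight on all four trials
```

With these inputs, the weak penalty puts the entire portfolio on the first trial, while the stronger one yields weights of 0.3875, 0.2875, 0.1875, and 0.1375, or about 3.5 effective holdings.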

Finally, we presented the results in an interactive dashboard with a pie chart and an Efficient Frontier graph that reflect the optimal portfolio construction for a risk tolerance the user sets with a slider.
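A dashboard along these lines could be wired up roughly as follows with Dash and Plotly. This is a sketch, not our actual app: the `portfolios` structure, the component ids, and the `pie_slices` helper are all hypothetical, and it assumes the portfolios were precomputed for a handful of risk bounds.

```python
def pie_slices(names, weights, min_share=0.02):
    """Collapse allocations below min_share into 'Other' so the pie stays readable."""
    labels, values, other = [], [], 0.0
    for name, weight in zip(names, weights):
        if weight >= min_share:
            labels.append(name)
            values.append(weight)
        else:
            other += weight
    if other > 0:
        labels.append("Other")
        values.append(other)
    return labels, values


def build_app(portfolios):
    """portfolios: list of dicts with keys 'risk', 'ret', 'names', 'weights', sorted by risk."""
    # Imported here so the helper above can be used without Dash installed.
    import plotly.graph_objects as go
    from dash import Dash, Input, Output, dcc, html

    risks = [p["risk"] for p in portfolios]
    frontier = go.Figure(
        go.Scatter(x=risks, y=[p["ret"] for p in portfolios], mode="lines+markers")
    )
    frontier.update_layout(
        title="Efficient Frontier", xaxis_title="Risk bound", yaxis_title="Expected return"
    )

    app = Dash(__name__)
    app.layout = html.Div([
        # The slider selects one of the precomputed portfolios by index.
        dcc.Slider(id="risk", min=0, max=len(portfolios) - 1, step=1, value=0,
                   marks={i: f"{r:.2f}" for i, r in enumerate(risks)}),
        dcc.Graph(id="pie"),
        dcc.Graph(id="frontier", figure=frontier),
    ])

    @app.callback(Output("pie", "figure"), Input("risk", "value"))
    def update_pie(index):
        chosen = portfolios[index]
        labels, values = pie_slices(chosen["names"], chosen["weights"])
        return go.Figure(go.Pie(labels=labels, values=values))

    return app
```

Precomputing the portfolios keeps the slider responsive, since the callback only looks up a result and never reruns the optimizer.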

Tracks Applied (2)

Health

We wanted to create a project that would have a realistic impact on healthcare. Clinical trials are constantly in need o...

Finance

We are applying conventional finance (Modern Portfolio Theory, Mean-Variance Optimization, etc.) and quantitative financ...

Technologies used

PyTorch

scikit-learn

pandas

Python

Dash

Plotly

Transformers

HuggingFace

Slurm
