Heet D's Senior Project Blog
Project Title: Zero-Shot and Few-Shot Learning for Rare Disease Diagnosis
BASIS Advisor: Ms. Briggs
Internship Location: ASU/Virtual
Onsite Mentor: Professor Jianming Liang
Project Abstract
In a small clinic, a mother clutches her child’s hand as another round of inconclusive test results comes in. For years, she has searched for answers to a rare disease no one seems to recognize. Doctors don’t know how to treat it, and there is little to no data on past cases. I believe emerging techniques like zero-shot and few-shot learning are poised to change stories like hers, offering faster and more accurate diagnoses where traditional methods fail.
A rare disease is defined in the United States as a condition that affects fewer than 200,000 people. Although each disease is uncommon individually, together they affect about 300 million people worldwide. This combination of low individual prevalence and high collective impact poses multiple challenges for both doctors and patients. My research evaluates zero-shot and few-shot learning as a potential solution. These machine learning techniques allow AI models to identify diseases with little to no training data, which is essential for rare diseases, where labeled cases are scarce. I will evaluate the performance of models like GPT-4, CLIP, and BioBERT using publicly available image datasets, such as Chest X-ray images and Skin Cancer MNIST, both available on Kaggle, along with textual datasets. The models will be tested in both zero-shot and few-shot learning settings, and their performance will be assessed on metrics like accuracy, bias, and generalizability. The results will provide insight into the effectiveness of each model, technique, and data type, highlighting the potential of zero-shot and few-shot learning as real-world tools that help healthcare professionals diagnose rare diseases more accurately and, ultimately, save lives.
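To make the difference between the two settings concrete, here is a minimal sketch in plain Python. It is not the project's actual pipeline: the three-dimensional embeddings are made-up toy values standing in for what a model like CLIP would produce, and the helper names (`cosine`, `mean_vector`, `classify`) are hypothetical. In the zero-shot case the reference vectors come from text prompts alone; in the few-shot case they are prototypes averaged from a handful of labeled examples.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(query, class_vectors):
    """Return the label whose reference vector is most similar to the query."""
    return max(class_vectors, key=lambda label: cosine(query, class_vectors[label]))


# Zero-shot: reference vectors are (toy) text-prompt embeddings.
# No labeled images of either class are used.
text_embeddings = {
    "pneumonia": [0.9, 0.1, 0.0],
    "no finding": [0.1, 0.9, 0.0],
}

# Few-shot: two (toy) labeled image embeddings per class are
# averaged into one prototype vector per class.
support_set = {
    "pneumonia": [[0.8, 0.2, 0.1], [0.7, 0.1, 0.2]],
    "no finding": [[0.2, 0.8, 0.1], [0.1, 0.7, 0.3]],
}
prototypes = {label: mean_vector(vecs) for label, vecs in support_set.items()}

# A (toy) embedding of the image to be diagnosed.
query_image = [0.75, 0.2, 0.1]

zero_shot_label = classify(query_image, text_embeddings)
few_shot_label = classify(query_image, prototypes)
print("zero-shot:", zero_shot_label)
print("few-shot:", few_shot_label)
```

The only thing that changes between the two settings is where the reference vectors come from, which is why the same `classify` function serves both. Averaging support examples into prototypes is one common few-shot approach among several, chosen here only because it is short to write down.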
Conclusion
As part of my final blog post, I would first like to thank Professor Liang and Ms. Briggs for all the help and support they have provided along the way. I would also like to thank Ms. Conley and Mrs. Kakkar for this opportunity. I feel like I have a completely different understanding of machine... Read More
The End
In this week’s post I will cover the final part of my data collection and review my project overall. As mentioned in the last blog post, I had prepared and set up the dataset for a few-shot evaluation. Likewise, I was also prepared for zero-shot evaluation based on the diseases I had picked.... Read More
Data pre-processing Part 2
For this week's blog post I began pre-processing and working with the dataset. The MIMIC dataset is extremely large, and since I am running a few-shot evaluation, I don't necessarily need to download the entire dataset. To work with the dataset I began by filtering the dataset for rare diseases. Specifically I used the following:... Read More
Finalized Dataset
In this post, I will explain domain adaptation and also describe my progress for the week. So as promised, domain adaptation refers to taking a model trained on one domain (the source domain) and adapting that to perform well on a different domain (the target domain). You can think of this as teaching a chef... Read More
Rare Disease Datasets (continued)
For this blog post, I thought I would take a moment to explain how my project has changed, and what is next. With my project I hope to advance and contribute to research in using zero-shot and few-shot learning for rare disease diagnosis. In recent times this has been a growing area of research and... Read More
Rare Disease Datasets
This week I managed to finalize my results and have started looking for a data source I can use for testing on rare diseases. This search has been way more challenging than I expected. Due to the nature of rare diseases, which have a relatively small number of cases that are diagnosed, it is simply... Read More
Data Collection (Still Continued)
This past week I was working on improving my code to achieve results that are comparable with SOTA results. While doing so I ran into one significant problem. When trying to achieve a high level of performance, it is recommended to follow the official data split when training the model. This split refers to how... Read More
Data Collection (Continued)
Last week I believed I had the first part of my data collection done, but after checking in with Professor Liang, I realized that I had not established SOTA baselines correctly for my research. Let's look at what SOTA baselines actually are, and how to implement them. A state-of-the-art baseline, or SOTA baseline, is the... Read More
Data Collection
This past week, I began my data collection using my pre-processed dataset with the CLIP model. My project aims to evaluate performance for rare disease diagnosis, so I first need to test on common diseases to establish a solid baseline for comparison. For this, I evaluated 3 different techniques: zero-shot learning, few-shot learning, and supervised... Read More
Data pre-processing
This past week, I worked on the data pre-processing stage of my project. This is a critical step before training models. As I mentioned in my last post, I will be using the NIH Chest X-ray dataset from Kaggle, which contains over 112,000 X-ray images. This makes it essential to prepare the images so they... Read More
Common Disease Datasets
This past week, I began the data collection part of my project. I originally planned to use the MIMIC dataset for my project. This dataset comprises real-world patient health data for over 40,000 patients who stayed in the critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. To ensure... Read More
Introduction
Hello everyone! My name is Heet Dalsania, and my project evaluates the effectiveness of zero-shot and few-shot learning in rare disease diagnosis. Rare diseases are often significantly harder to diagnose than common diseases because of various issues. For example, a common issue is a lack of training data which causes traditional models and methods to... Read More
