Rare Disease Datasets (continued)
Heet D -
For this blog post, I thought I would take a moment to explain how my project has changed and what comes next.
With my project, I hope to contribute to research on zero-shot and few-shot learning for rare disease diagnosis. This has been a growing area of research, and it has a lot of potential for diagnosis, since rare diseases often have too few labeled images to train a model in the usual way. My project first requires establishing a state-of-the-art (SOTA) baseline for classifying images from the NIH Chest X-ray dataset using OpenAI's CLIP model. So far, I have done this for both zero-shot and few-shot classification: in both settings, my results outperformed the SOTA results reported in the literature. This gives me a solid baseline for common diseases against which I can compare my rare disease results.
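For readers who have not worked with CLIP: zero-shot classification compares an image's embedding with the embedding of a text prompt for each class (for example, "a chest X-ray showing pneumonia") and picks the closest one, while a simple few-shot variant instead compares against the average embedding of a handful of labeled examples per class. The sketch below shows only that decision rule on toy vectors. It assumes the embeddings have already been produced by CLIP's image and text encoders; the function names and the scale of 100 are hypothetical choices for illustration, and the prototype (class-average) approach is one common few-shot baseline, not necessarily the method used in this project.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (CLIP compares L2-normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax(scores: Dict[str, float], scale: float = 100.0) -> Dict[str, float]:
    """Turn similarities into class probabilities; `scale` is an assumed temperature."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(scale * (s - top)) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def zero_shot_predict(
    image_emb: Vector, prompt_embs: Dict[str, Vector]
) -> Tuple[str, Dict[str, float]]:
    """Pick the class whose text-prompt embedding is most similar to the image."""
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    return max(scores, key=scores.get), scores


def few_shot_predict(image_emb: Vector, support: Dict[str, List[Vector]]) -> str:
    """Hypothetical prototype baseline: compare against the mean of k labeled examples per class."""
    prototypes: Dict[str, Vector] = {}
    for label, examples in support.items():
        unit = [normalize(e) for e in examples]
        prototypes[label] = [sum(col) / len(unit) for col in zip(*unit)]
    return max(prototypes, key=lambda label: cosine(image_emb, prototypes[label]))


# Toy 3-d "embeddings" standing in for real CLIP outputs.
prompts = {"Pneumonia": [0.9, 0.0, 0.1], "No Finding": [0.0, 1.0, 0.0]}
label, scores = zero_shot_predict([1.0, 0.1, 0.0], prompts)
probs = softmax(scores)

support = {
    "A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "B": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
few_shot_label = few_shot_predict([0.8, 0.2, 0.0], support)
print(label, probs, few_shot_label)
```

The only difference between the two settings in this sketch is where the class reference vectors come from: text prompts in the zero-shot case, a few labeled images in the few-shot case.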
Earlier, in the blog post where I presented my results, I also tested a fine-tuned case, in which I trained CLIP on the entire dataset rather than using zero-shot or few-shot learning. Initially I treated this as an entirely separate test case, but as I read more of the literature on this topic, I realized that I could significantly improve my results by combining aspects of fine-tuning with zero-shot and few-shot learning, an approach called domain adaptation. In essence, domain adaptation prepares CLIP for zero-shot and few-shot classification without training the model specifically for those tasks. This may sound unclear for now, but it will make much more sense in my next post, when I share my results.
Overall, I am ready for the rare disease part of my project. As mentioned in my last blog post, I was searching for a dataset. Over the past week I considered many options and decided to filter a larger dataset to obtain rare disease images. I initially considered using a synthetic dataset, but since it would not have ground truth labels for evaluation, it is not a good fit for my needs. I am currently looking at the CheXpert dataset from Stanford and will likely proceed with it. Choosing the dataset is one of the most important parts of my project, which is why I took some extra time to make sure it is suitable. This week I spent my time modifying my code for domain adaptation and finalizing my dataset.
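One way to filter a larger labeled dataset down to rare findings is to count how often each label is marked positive and keep only the images that are positive for the least frequent labels. The sketch below assumes a CSV modeled on CheXpert's label file, with a `Path` column and one column per finding coded as 1.0 (positive), 0.0 (negative), -1.0 (uncertain) or blank; the column names, the 0.3 cutoff and the helper names are hypothetical and should be checked against the real file. Uncertain labels are deliberately not counted as positive, since the point of using a real dataset is having reliable ground truth.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Hypothetical excerpt in the style of CheXpert's label CSV (real files have more columns).
SAMPLE_CSV = """Path,Pneumonia,Fracture,Pleural Effusion
p1.jpg,1.0,,0.0
p2.jpg,0.0,,1.0
p3.jpg,,1.0,1.0
p4.jpg,,-1.0,1.0
"""

POSITIVE = "1.0"  # assumed encoding: 1.0 positive, 0.0 negative, -1.0 uncertain, blank unmentioned

Row = Dict[str, str]


def load_rows(text: str) -> List[Row]:
    """Parse CSV text into one dict per image."""
    return list(csv.DictReader(io.StringIO(text)))


def is_positive(row: Row, col: str) -> bool:
    """True only for a confident positive; uncertain and blank values are ignored."""
    return (row.get(col) or "").strip() == POSITIVE


def find_rare_labels(rows: List[Row], label_cols: List[str], max_fraction: float) -> List[str]:
    """Labels with at least one positive, but positive in at most `max_fraction` of rows."""
    counts: Counter = Counter()
    for row in rows:
        for col in label_cols:
            if is_positive(row, col):
                counts[col] += 1
    n = len(rows)
    return [c for c in label_cols if 0 < counts[c] <= max_fraction * n]


def filter_rare(rows: List[Row], rare_labels: List[str]) -> List[Row]:
    """Keep rows that are confidently positive for any rare label."""
    return [r for r in rows if any(is_positive(r, c) for c in rare_labels)]


rows = load_rows(SAMPLE_CSV)
label_cols = [c for c in rows[0] if c != "Path"]
rare = find_rare_labels(rows, label_cols, max_fraction=0.3)
subset = filter_rare(rows, rare)
print(rare, [r["Path"] for r in subset])
```

On a real file the cutoff would need tuning, and how uncertain labels are handled (ignored here) is itself a design choice worth reporting alongside the results.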
This coming week I will focus on preprocessing and training, and I should soon have results to share with you all, along with an explanation of domain adaptation. Thank you!
