Week 3: Improving My AI Model and Understanding Mistakes

Christopher H -

This week, I focused on gathering more data and analyzing past test results to figure out why my AI model made mistakes. When training an AI to detect lung cancer from medical images, it’s important to understand where it gets things wrong so I can improve its accuracy.

One of the key parts of my work this week was reviewing the actual images my model is learning from. These are histopathology slides, microscopic images of lung tissue, that help doctors diagnose cancer. Below is an example of one of the images I’m using from my dataset.

One big challenge in training my model is the size of these images. The image above is 2.19 GB, and these whole-slide images are often massive and very detailed. This makes training more difficult because the AI has to process a huge amount of data at once. These slides need to be broken down into many smaller sections before training, unlike smaller images, which can be quickly analyzed. Otherwise, they would take up too much memory and slow down the process. Finding an efficient way to handle these large files is another challenge I have had to face.

After reviewing the test results, I noticed that some of the errors come from images where healthy and cancerous areas look very similar. The AI has trouble distinguishing between them, which leads to misclassifications. Other mistakes may come from variations in the way the slides are prepared, such as differences in color and contrast. I’ve also found some data from my old dataset that was drawn over, which might have lowered the accuracy of my model, so I’ve chosen to remove them.

Another issue I found is that my dataset isn’t perfectly balanced. There are more types of cancer than others, which could make the AI favor the more common ones. To fix this, I’m finding more images to balance out the data.

Next week, I’ll be making adjustments based on what I’ve learned, improving the way I process the images, tweaking how the AI is trained, and running new tests to see if accuracy improves. Each mistake brings me closer to a better model, and I’m excited to keep refining it!

More Posts

Comments:

All viewpoints are welcome but profane, threatening, disrespectful, or harassing comments will not be tolerated and are subject to moderation up to, and including, full deletion.

    rayan_k
    Hi Chris! Do you think the hardware you're using to train your AI model has something to do with the model's inaccuracy? I know you've already pointed out many things that you will fix, but I just thought of it, especially with the importance of what GPU you use when training AI models.
    Christopher Hameenanttila
    Hi Rayan! Hardware definitely plays a role in how efficiently the AI model trains, especially with large datasets like mine. Right now, I’m using an NVIDIA RTX 4070, which has decent VRAM, but since whole-slide images are so massive, I have to carefully manage memory and batch sizes to avoid running into bottlenecks. While my GPU doesn’t directly cause misclassifications, faster hardware could speed up training and allow for more complex models or larger batch sizes, which might improve accuracy. That said, most of the mistakes I’m seeing seem to come from the dataset itself, so I’m focusing on improving data quality first.

Leave a Reply

Your email address will not be published. Required fields are marked *