Week 10 - A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature
Adam B -
Hello,
This past week, I ran my AI model through the testing phase: processing a bulk set of ten research papers to gauge how reliably it could extract and summarize key experimental details. While it was not as effective as I had hoped, I have ideas on how to continue improving the model. This week, I will catalog some of the errors I faced and what I did to fix them.
The Errors
First, the AI repeatedly misidentified article titles. This might sound minor, but correct titling is critical for referencing and attributing results accurately. Second, it occasionally misread the condition of the experimental group, especially in studies where multiple experimental groups received either different amounts of the same treatment or different treatments altogether. This led to confusion over which subset of data belonged to which treatment or control. Finally, there was a recurring issue with the direction of gene regulation; correct directionality is essential for identifying meaningful changes in biological markers.
The Solutions
In response, I updated the instructions given to the model used in my API to be more precise, emphasizing how to parse the structure of each paper and how to correctly interpret language around gene expression. I also made my instructions more explicit about group identities, which I hope will eliminate confusion over experimental design. Beyond these updates, I returned to verifying output quality by re-reading the papers the AI extracted from and cross-checking its results one by one.
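For anyone curious how this kind of cross-checking could be tallied, here is a minimal sketch in Python. It is not my actual pipeline: the field names (`title`, `group`, `direction`), the function names, and the sample records are all hypothetical and made up for illustration. The idea is to compare each AI-extracted record against the value I confirmed by hand and count mismatches per field, so the three error types above can be tracked separately.

```python
from collections import Counter

# Hypothetical field names, one per error type described in this post.
FIELDS = ("title", "group", "direction")


def normalize(value):
    """Lowercase and collapse whitespace so formatting differences are not counted as errors."""
    return " ".join(str(value).lower().split())


def score_extractions(extracted, reference):
    """Count per-field mismatches between AI output and manually verified values.

    Both arguments are lists of dicts keyed by FIELDS, aligned by position.
    """
    if len(extracted) != len(reference):
        raise ValueError("extracted and reference must have the same length")
    errors = Counter({field: 0 for field in FIELDS})
    for ai_row, true_row in zip(extracted, reference):
        for field in FIELDS:
            if normalize(ai_row.get(field, "")) != normalize(true_row.get(field, "")):
                errors[field] += 1
    return dict(errors)


def accuracy(errors, total):
    """Convert mismatch counts into per-field accuracy (fraction of records correct)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {field: 1 - count / total for field, count in errors.items()}


if __name__ == "__main__":
    # Made-up records, not results from my test set.
    ai_output = [
        {"title": "Example Paper One", "group": "high dose", "direction": "up"},
        {"title": "Example Paper Two", "group": "control", "direction": "up"},
    ]
    hand_checked = [
        {"title": "Example Paper One", "group": "low dose", "direction": "up"},
        {"title": "Example Paper Two", "group": "control", "direction": "down"},
    ]
    counts = score_extractions(ai_output, hand_checked)
    print(counts)
    print(accuracy(counts, len(hand_checked)))
```

Keeping a separate count per field makes it easier to see whether a change to the instructions helped with one error type (say, regulation direction) without hiding a regression in another.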
Overall, this experience has reinforced the importance of thorough testing.
Thank you for following along.
Adam