Week 4: A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature
Adam B
Hello everyone! After last week’s focus on refining instructions and leaning more on Grok 3 for its nuanced outputs, I’ve spent the past few days working through the remaining papers in the training phase of my AI model.
Our training set consists of a variety of psychiatric papers, covering everything from rat models to human cadaver brains. This variety makes it difficult to write a single instruction set that can handle all of these papers. Over the past few weeks, I have been tweaking the prompt to strike a balance between accuracy and adaptability, and it now appears relatively consistent on these final training sets. Naturally, there is more testing to come, in the form of a validation phase and a testing phase; I will describe those in greater depth once they arrive.
As for the tweaks themselves, I refined how my data is exported as CSV, removing commas in inopportune places that were shifting values into the wrong columns. I also added a clause asking the model to repeatedly confirm the accuracy of its extractions, and a clause asking it to add clarifying information that separates the semantic differences between similar genes.
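To illustrate the comma problem: a comma inside a free-text field makes a naive comma split produce an extra column, so every later value shifts over by one. Removing the commas (the approach described above) fixes this; a common complementary option is to let a CSV library quote any field that contains a comma. The sketch below shows both using Python's standard `csv` module. The column names and row values are made-up placeholders, not the model's actual output schema.

```python
import csv
import io

# Hypothetical column names for illustration only; the real extraction schema may differ.
FIELDS = ["gene", "species", "finding"]


def rows_to_csv(rows):
    """Serialize extraction rows to CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def strip_commas(value, replacement=";"):
    """Alternative fix: replace commas inside a value so plain comma-splitting stays aligned."""
    return value.replace(",", replacement)


# Placeholder rows; the second field of the first row deliberately contains a comma.
rows = [
    {"gene": "GENE_A", "species": "rat", "finding": "example note, with a comma"},
    {"gene": "GENE_B", "species": "human", "finding": "no change"},
]

text = rows_to_csv(rows)

# Naive splitting breaks the quoted field apart and yields 4 columns instead of 3.
naive = text.splitlines()[1].split(",")

# A real CSV parser respects the quotes and keeps the columns aligned.
parsed = list(csv.DictReader(io.StringIO(text)))
```

Quoting keeps the original wording intact, while stripping commas keeps the file readable by tools that split on commas naively; which one fits better depends on what consumes the CSV downstream.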
Beyond my personal testing, I am also stepping into the world of automation to really test how effective this model can be. Right now, I’m feeding it PDFs manually, which is a bottleneck, so this week I attempted to write code that uses the model’s API to analyze papers in bulk. This process is difficult, but fortunately, AI is also extremely good at coding.
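The bulk-processing script itself isn't shown here, so the following is only a minimal sketch of what such a batch loop might look like. It assumes a hypothetical `extract` callable that takes one PDF path and returns a list of row dictionaries; in a real script that callable would wrap PDF text extraction and the call to the model's API, neither of which is modeled here. The loop retries failed calls with exponential backoff, tags each row with its source file, writes everything to one CSV, and returns the papers that still failed.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical extractor type: one PDF path in, a list of row dicts out.
# A real version would extract the PDF text and send it to the model's API.
Extractor = Callable[[Path], List[Dict[str, str]]]


def extract_with_retry(extract: Extractor, pdf: Path,
                       attempts: int = 3, delay: float = 1.0) -> List[Dict[str, str]]:
    """Call the extractor, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return extract(pdf)
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(delay * 2 ** attempt)
    return []  # only reached if attempts <= 0


def process_batch(pdfs: Iterable[Path], extract: Extractor, out_csv: Path,
                  fields: List[str], delay: float = 1.0) -> List[Path]:
    """Run the extractor over many papers and write all rows to a single CSV.

    Returns the papers that still failed after retrying, so they can be rerun by hand.
    """
    failed: List[Path] = []
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops any unexpected keys instead of crashing mid-batch.
        writer = csv.DictWriter(f, fieldnames=["source"] + fields, extrasaction="ignore")
        writer.writeheader()
        for pdf in pdfs:
            try:
                rows = extract_with_retry(extract, pdf, delay=delay)
            except Exception:
                failed.append(pdf)
                continue
            for row in rows:
                writer.writerow({"source": pdf.name, **row})
    return failed
```

Passing the extractor in as an argument keeps the batching logic separate from any particular API, so it can be exercised with a fake extractor before spending API calls on real papers.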
Thanks for reading,
Adam