Week 4 - A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature

Adam B - February 28, 2025 6:46 pm

Hello everyone! After last week’s focus on refining instructions and leaning more on Grok 3 for its nuanced outputs, I’ve spent these past few days working through the remaining papers for the testing phase of my AI model.

Our testing set spans a wide range of psychiatric papers, covering everything from rat models to human cadaver brains. That variety makes it difficult to write a single instruction set that can process all of them. Over the past few weeks, I have been tweaking the prompt to balance accuracy and adaptability, and it appears to be relatively consistent on these final training sets. Naturally, there is more testing to come, including a validation phase and a testing phase; I will describe those in greater depth once they arrive.

As for the tweaks themselves, I refined the way my data is extracted as CSVs to remove stray commas that would shift my data into the wrong columns. I also added a clause asking the model to re-confirm the accuracy of its extractions, and a clause to better capture additional information that separates semantic differences between similar genes.
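To illustrate why a stray comma matters, here is a minimal Python sketch. It is not the code or prompt from this project: the column names and row values are made up for the example. It shows two common ways to keep a comma inside a value from being read as a column separator: quoting every field when writing the CSV, or replacing the commas before writing.

```python
import csv
import io

# Hypothetical extracted rows; column names and values are illustrative only.
FIELDS = ["gene", "model", "note"]
rows = [
    {"gene": "BDNF", "model": "rat, prefrontal cortex", "note": "example note"},
    {"gene": "COMT", "model": "human postmortem brain", "note": "first, second"},
]

def rows_to_csv(rows):
    """Write rows as CSV text, quoting every field so embedded commas stay inside one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def strip_commas(rows, replacement=";"):
    """Alternative: replace commas inside values before writing."""
    return [{key: value.replace(",", replacement) for key, value in row.items()} for row in rows]

# A naive join/split puts the first row into four cells instead of three.
naive_line = ",".join(rows[0][field] for field in FIELDS)
naive_cells = naive_line.split(",")

# Reading the quoted CSV back recovers exactly three columns per row.
parsed = list(csv.DictReader(io.StringIO(rows_to_csv(rows))))
cleaned = strip_commas(rows)
```

Quoting keeps the original text intact, while replacing commas is simpler if the spreadsheet tool downstream does not handle quoted fields well.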

Beyond my personal testing, I am also stepping into the world of automation to really test how effective this model can be. Right now, I’m manually feeding it PDFs, which appears to be a bottleneck, so this week I attempted to code an API that can analyze papers in bulk. This process is difficult, but fortunately, AI is also extremely good at coding.
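For readers curious what bulk processing might look like, here is a rough Python sketch under stated assumptions. It is not the code from this project: `extract_from_pdf` is a hypothetical placeholder for whatever call sends one paper to the model, and the folder layout is assumed. The idea is simply to loop over every PDF in a folder and record failures instead of stopping at the first bad file.

```python
from pathlib import Path

def extract_from_pdf(pdf_path):
    """Hypothetical placeholder: send one PDF to the model and return its extraction."""
    raise NotImplementedError("Replace with the real model call.")

def process_folder(folder, extract=extract_from_pdf):
    """Run `extract` on every .pdf file in `folder`.

    Returns two dicts keyed by file name: successful results and error messages.
    """
    results = {}
    failures = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            results[pdf_path.name] = extract(pdf_path)
        except Exception as error:
            # Keep going so one unreadable paper does not halt the whole batch.
            failures[pdf_path.name] = str(error)
    return results, failures
```

Passing the extraction function in as an argument makes it easy to swap in a fake version for a dry run before spending any real API calls.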

Thanks for reading,

Adam

Comments:

camille_bennett

Hi Adam, sounds very interesting. How does removing the commas affect your output?

March 4, 2025 at 12:42 pm

Alana Rothschild

Hi Adam. It sounds like you have made a ton of progress in your project! I can't wait to hear how the "real" testing goes. Way to go!

March 5, 2025 at 9:21 am

adam_b

Hi Ms. Bennett, that's a good question! Removing extraneous commas helps when converting the outputted CSV files into clean spreadsheets. It's a minor but impactful fix because it ensures all data lands in the appropriate columns, which is especially important when many of the columns contain similar kinds of data.

March 6, 2025 at 10:54 am
