Week 3 - A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature
Adam B -
Hello everyone! This past week brought me deeper into both model comparisons and streamlining workflows for data extraction.
This week, I used two different AI models: ChatGPT o3 and a second, newer, less well-known model, Grok 3. Grok is an interesting AI made by xAI because it was trained on typical public documents as well as synthetic datasets. When I first incorporated it into this project, I was skeptical, but it appears to be remarkably effective (at times even more so than GPT).
I judged each model mainly on how accurately and clearly it represented the data it extracted. Interestingly, while both models handled the task relatively well, the “o3” model offered more consistent formatting and more precise designations of gene relationships, whereas Grok 3 picked up on complex nuances in the paper and produced longer, more thorough (albeit more complicated) outputs.
I mentioned last week that small changes to the prompt can have significant impacts on the output. This week, one of my major edits was just that: a simple two-line edit to my instruction set emphasizing that each gene must be classified as one of only two options, upregulated or downregulated. Much of my editing also came in the form of removing redundancy in my instructions, so that more of the AI’s processing goes toward the actual paper analysis.
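A minimal sketch of how that two-option rule could be checked after the fact, assuming the model's output has already been parsed into (gene, label) pairs. This is not the instruction set itself, and everything here is hypothetical: the function names, the synonym lists, and the placeholder gene names are illustrations only, since the post does not show the actual prompt or output format.

```python
from typing import List, Optional, Tuple

# Hypothetical synonym tables: the wording a model actually uses is an assumption.
_UP = {"upregulated", "up-regulated", "up", "increased", "overexpressed"}
_DOWN = {"downregulated", "down-regulated", "down", "decreased", "underexpressed"}


def dichotomize(label: str) -> Optional[str]:
    """Map a free-text label to 'upregulated' or 'downregulated', else None."""
    key = label.strip().lower()
    if key in _UP:
        return "upregulated"
    if key in _DOWN:
        return "downregulated"
    return None  # anything outside the two allowed options


def check_rows(
    rows: List[Tuple[str, str]]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (gene, label) pairs into normalized rows and rows needing review."""
    clean, flagged = [], []
    for gene, label in rows:
        normalized = dichotomize(label)
        if normalized is None:
            flagged.append((gene, label))  # keep the original label for inspection
        else:
            clean.append((gene, normalized))
    return clean, flagged


if __name__ == "__main__":
    # Placeholder gene names, not data from any paper.
    sample = [("GENE_A", "Up-Regulated"), ("GENE_B", "mixed"), ("GENE_C", "decreased")]
    clean, flagged = check_rows(sample)
    print("clean:", clean)
    print("flagged:", flagged)
```

Rows that land in the flagged list are the ones where a model ignored the two-option rule, which gives a quick way to see whether a prompt edit actually changed its behavior.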
Another update: I (well, the developers of ChatGPT and Grok) have solved the PDF unreadability problem I mentioned in my last post. Both AIs can now simply read Word documents, bypassing the PDFs entirely. I still have no idea why my solution in the last post worked.
Looking ahead, I intend to continue refining the instructions and rely more on Grok for upcoming papers. My hope is that these incremental changes, along with better formatting, will improve extraction reliability without sacrificing clarity or consistency. Thanks for following along.
Adam