Week 3: A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature

Adam B -

Hello everyone! This past week took me deeper into both comparing models and streamlining my data extraction workflow.

 

This week, I used two different AI models: ChatGPT o3 and a second, newer, and less well-known AI, Grok 3. Grok is an interesting AI made by xAI because it was trained on typical public documents as well as synthetic datasets. When I first incorporated it into this project, I was skeptical, but it appears to be remarkably effective (even more so than GPT at times).

 

My main criterion for judging the models was simply how accurately and clearly each one represented the data it extracted. Interestingly, while both models handled the task relatively well, the “o3” model offered more consistent formatting and more precise designations of gene relationships, while Grok 3 was capable of understanding complex nuances in the paper and produced longer, more thorough (albeit more complicated) outputs.

 

I mentioned last week that small changes to the prompt can have significant impacts on the output. This week, one of my major edits was just that: a simple two-line edit to my instruction set emphasizing that each gene must be labeled as either upregulated or downregulated, with no other options. Much of my editing also came in the form of removing redundancy in my instructions to direct the AI’s processing toward the actual paper analysis.
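
One way to confirm that a model actually respects a two-option rule like this is to check its output automatically. The Python sketch below is only an illustration under assumptions: the record format, the field names `gene` and `direction`, and the placeholder gene names are all hypothetical and are not taken from the project's instruction set or results.

```python
# Hypothetical check: flag any extracted record whose label is not one of
# the two options the instructions allow.
ALLOWED_DIRECTIONS = {"upregulated", "downregulated"}


def find_invalid_directions(records):
    """Return the records whose 'direction' is not an allowed label."""
    invalid = []
    for record in records:
        # Normalize case and whitespace so "Upregulated " still counts.
        direction = str(record.get("direction", "")).strip().lower()
        if direction not in ALLOWED_DIRECTIONS:
            invalid.append(record)
    return invalid


# Made-up example output (placeholder gene names, not real data).
sample_output = [
    {"gene": "GENE_A", "direction": "Upregulated"},
    {"gene": "GENE_B", "direction": "downregulated"},
    {"gene": "GENE_C", "direction": "unchanged"},
]

flagged = find_invalid_directions(sample_output)
print([record["gene"] for record in flagged])  # prints ['GENE_C']
```

Any record flagged this way points to a spot where the model drifted from the two allowed labels, which may be a sign that the instruction needs further tightening.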

 

Another update: I (well, the developers of ChatGPT and Grok) have developed a solution to the PDF unreadability problem I mentioned in my last post. Both AIs can now simply read Word documents, bypassing the PDFs entirely. I still have no idea why my solution in the last post worked.

 

Looking ahead, I intend to continue refining the instructions and rely more on Grok for upcoming papers. My hope is that these incremental changes, along with better formatting, will improve extraction reliability without sacrificing clarity or consistency. Thanks for following along.

 

Adam

Comments:

    rohan_va
    Hi Adam, the level of detail these AI models provide is fascinating! In the future, when this trained AI becomes widely accessible, do you anticipate users having to make small changes, or 're-train' the AI before their specific use? Or, would the product be able to adapt automatically to the different goals that users have?
    adam_b
    Hi Rohan, that's a great question! Since we are training our AI model on a wide variety of papers, it should be able to adapt automatically according to the specifics of the user's instructions.
    marcos_v
    Hi Adam, this project is very interesting! I'm curious, for the finalized product, would you fully stick to one model, GPT or Grok, or do you intend to incorporate both of them? At what point would you implement this change?
    adam_b
    Hi Marcos, that's a fantastic question! The finalized product is the prompt, sort of like a "brain" you could put into any model. It will be optimized for Grok because it is the most effective at the moment, so I suppose you could say that change has already happened.
