Week 6-A Supervised Approach To Training An Artificial Intelligence To Extract Relevant Genomic Data From Literature

Adam B -

Hello,
This past week, I continued working on the API for automating the AI extraction process. This process involved many bouts of testing, refining, and retesting. So, this week’s blog will highlight some of the more notable challenges faced in improving the API.

“Not-Specified”
Since I am no longer using a conventional AI interface (the usual chatbot experience) and am instead driving the model through Python code, my previously tested prompt had to be modified slightly to fit neatly into that code. One of the more frustrating artifacts of this transfer was genes being labeled as “not specified” or “N/A” yet still having extracted upregulations or downregulations. Admittedly, I found the question itself rather bizarre: what made the AI decide there was a gene to output without knowing which gene it was outputting? One of the most important checks involved confirming that the AI is not simply hallucinating regulation directions for the genes and proteins. Fortunately, the solution was a rather simple filtering step that removes these placeholders and requests a revision, ensuring that no extraneous entries remain and that the extracted outputs are intended.
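To give a sense of the idea, here is a minimal sketch of that kind of filtering step, assuming the extractions come back as a list of dictionaries; the field names and the revision-request handling are hypothetical illustrations, not the actual code.

```python
# Minimal sketch of the placeholder filter (field names are hypothetical).
PLACEHOLDERS = {"not specified", "n/a", "na", ""}

def filter_placeholders(extractions):
    """Split extractions into kept entries and placeholder entries."""
    kept, flagged = [], []
    for entry in extractions:
        gene = str(entry.get("gene", "")).strip().lower()
        if gene in PLACEHOLDERS:
            flagged.append(entry)   # a regulation direction with no named gene
        else:
            kept.append(entry)
    return kept, flagged

kept, flagged = filter_placeholders([
    {"gene": "TP53", "regulation": "upregulated"},
    {"gene": "Not specified", "regulation": "downregulated"},
])
if flagged:
    # In the real pipeline, flagged entries would be sent back to the model
    # with a revision prompt asking it to name the gene or drop the entry.
    print(f"{len(flagged)} placeholder entries flagged for revision")
```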

Outputs
Perhaps the most meaningful part of this API is the output. This week required a lot of tweaking to the outputs, ensuring that the data 1) ends up in the right place, 2) says everything it should say, and 3) doesn’t say anything it should not. Ending up in the right place involves an assessment step in which the AI determines what kind of paper it is reading (is this one about human blood and a certain medication, or about rat brain dissections?); a rough sketch of that routing is shown below. Getting the output correct is a process of trial and error: seeing what data is missing, adjusting the prompt to fix it, and repeating.
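As a rough illustration of the “right place” step, here is a small sketch of how extracted records might be routed to different output files based on the paper-type assessment; the categories, file names, and fields are assumptions for the example, not the project’s actual classification logic.

```python
import csv
from pathlib import Path

# Hypothetical mapping from the AI's paper-type assessment to an output file.
OUTPUT_FILES = {
    "human_blood": Path("human_blood_results.csv"),
    "rat_brain": Path("rat_brain_results.csv"),
    "other": Path("unclassified_results.csv"),
}

def route_records(paper_type, records):
    """Append extracted gene records to the file matching the paper type."""
    path = OUTPUT_FILES.get(paper_type, OUTPUT_FILES["other"])
    write_header = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["gene", "regulation", "source"])
        if write_header:
            writer.writeheader()
        for record in records:
            writer.writerow(record)

route_records("human_blood", [
    {"gene": "IL6", "regulation": "upregulated", "source": "PMID:0000000"},
])
```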

In the coming days, I intend to refine the final output format so that it lines up perfectly with the previously collected data. The next step after that is a deeper validation test to confirm that the newly filtered results are as accurate as they seem.

Thank you,

Adam

Comments:

    tanay_n
    This is very intriguing Adam! In refining the output format, are you aligning it with a predefined schema or adapting it dynamically based on new findings?
    adam_b
    Hi Tanay, great question! There is a particular framework we use for structuring outputs in general that is specifically designed to make it easier for another program to process them and identify relationships. That framework serves as our anchor when refining output format for the AI.
