Week 1—A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature
Adam B -
Training the AI Model: How the Analysis Works
Hi everyone, it’s Adam! In my previous discussion, I explained the use of AI in psychiatry and its potential to extract meaningful biological markers from research. Now, I’ll explain how the analysis process works and how the model is trained to do just that.
Data Input and Instruction Refinement
The process begins by providing the AI model with full-text research papers, primarily from publicly available psychiatric studies. (From the last post: these papers contain data on how different genes and proteins change in response to a disease or treatment. The AI’s task is to identify statistically significant relationships between these genes or proteins and diseases or treatments.) The model is then given an initial set of instructions, a draft that describes the general thought process a human reader might follow to decide what qualifies as a significant genetic relationship with a disease. Given the paper and the instructions, the AI generates an output listing the relationships it judged to be significant.
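As a rough illustration of this step, here is a minimal Python sketch. Everything in it is an assumption made for the example: the prompt layout, the function names `build_prompt` and `run_extraction`, and the `model` argument, which is just a stand-in callable rather than any real API.

```python
from typing import Callable


def build_prompt(instructions: str, paper_text: str) -> str:
    """Combine the draft instructions and the full paper text into one prompt.

    The separator line is an arbitrary choice for this sketch.
    """
    return f"{instructions.strip()}\n\n--- PAPER ---\n{paper_text.strip()}"


def run_extraction(
    instructions: str,
    paper_text: str,
    model: Callable[[str], str],
) -> str:
    """Send the combined prompt to a model and return its raw output.

    `model` is any function that maps a prompt string to a response string.
    It is a hypothetical placeholder, not a specific library call.
    """
    return model(build_prompt(instructions, paper_text))
```

Passing the model in as a plain function keeps the sketch independent of any particular provider, and it makes it easy to swap in a fake model while testing the instructions.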
Supervised Learning and Optimization
Once the AI generates outputs, its findings are compared to reviews and extractions by human readers of the same paper. Discrepancies are analyzed by finding which parts of the study were missed by the AI (or even by the human analysts!), and those findings are then used to refine the instructions. (For example, suppose the AI misses a whole group of genes because it took the wrong number as the p-value significance threshold. We would alter the instructions to explain that the threshold should always be either the significance threshold stated in the paper or 0.05, whichever is more restrictive.)
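The comparison step and the threshold rule from the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, extractions are simplified to sets of gene identifiers, and the identifiers in the usage note are placeholders rather than real results.

```python
from typing import Dict, List, Optional, Set

DEFAULT_ALPHA = 0.05  # the fallback cutoff named in the post


def effective_threshold(paper_alpha: Optional[float]) -> float:
    """Return the stricter (smaller) of the paper's threshold and 0.05.

    If the paper states no threshold, fall back to 0.05.
    """
    if paper_alpha is None:
        return DEFAULT_ALPHA
    return min(paper_alpha, DEFAULT_ALPHA)


def compare_extractions(ai_items: Set[str], human_items: Set[str]) -> Dict[str, List[str]]:
    """Split two extractions into agreed items and one-sided items.

    One-sided items are the discrepancies to review: something only the
    human found may be an AI miss, and something only the AI found may be
    either an AI error or a human miss.
    """
    return {
        "agreed": sorted(ai_items & human_items),
        "only_in_human": sorted(human_items - ai_items),
        "only_in_ai": sorted(ai_items - human_items),
    }
```

For instance, `compare_extractions({"GENE_A", "GENE_B"}, {"GENE_A", "GENE_C"})` reports `GENE_A` as agreed, `GENE_C` as found only by the human, and `GENE_B` as found only by the AI. Each one-sided item is a prompt to go back to the paper and decide whether the instructions need changing.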
A general rule of thumb is that overly broad guidelines pull in a lot of irrelevant data, while overly narrow ones risk missing key findings, especially given how varied the papers in psychiatric literature can be. Balancing specificity and flexibility is essential for accurate data extraction.
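One common way to put numbers on this tradeoff is precision and recall. The sketch below assumes the human extraction is treated as the reference, which is a simplification since human readers can miss things too. The function name and the set-based representation are choices made for this example.

```python
from typing import Set, Tuple


def precision_recall(ai_items: Set[str], reference_items: Set[str]) -> Tuple[float, float]:
    """Score an AI extraction against a reference extraction.

    Precision: share of the AI's items that are in the reference
               (low when guidelines are too broad).
    Recall:    share of the reference items that the AI found
               (low when guidelines are too narrow).
    Empty inputs score 0.0 to avoid dividing by zero.
    """
    true_positives = len(ai_items & reference_items)
    precision = true_positives / len(ai_items) if ai_items else 0.0
    recall = true_positives / len(reference_items) if reference_items else 0.0
    return precision, recall
```

Tracking both numbers across instruction revisions shows whether a change made the model more selective, more thorough, or traded one for the other.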
Over time, repeated refinements improve the AI’s ability to recognize patterns, but of course, there are obstacles to overcome (for example, the computational limitations of current-generation GPTs, which may affect its accuracy). In the next post, I’ll elaborate on these issues and the solutions that have been proposed for them.
Thank you!
Adam