Week 1—A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature

Adam B

Training the AI Model: How the Analysis Works

Hi everyone, it’s Adam! In my previous post, I explained how AI can be used in psychiatry and its potential to extract meaningful biological markers from research. Now I’ll explain how we analyze the model’s output and use that analysis to train it to do just that.

Data Input and Instruction Refinement

The process begins by providing the AI model with full-text research papers, primarily from publicly available psychiatric studies. (From the last post: these papers contain data on how different genes and proteins change in response to a disease or treatment. The AI’s task is to identify statistically significant relationships between these genes or proteins and diseases or treatments.) The model is then given an initial set of instructions: a draft describing the general thought process a human reader might go through to decide what qualifies as a significant genetic relationship with a disease. Given the paper and the instructions, the AI generates an output listing the relationships it determined to be significant.
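To make this step concrete, here is a minimal sketch of how a paper and a set of instructions could be combined into one prompt. Everything named here (`build_prompt`, `extract_relationships`, the prompt wording, and the `model` argument, which stands in for whatever chatbot is being used) is a hypothetical illustration, not the project’s actual setup.

```python
from typing import Callable, List


def build_prompt(instructions: str, paper_text: str) -> str:
    """Combine the draft instructions and the full paper text into one prompt."""
    return (
        "INSTRUCTIONS:\n" + instructions.strip() + "\n\n"
        "PAPER:\n" + paper_text.strip() + "\n\n"
        "List each statistically significant gene or protein relationship, "
        "one per line."
    )


def extract_relationships(
    model: Callable[[str], str], instructions: str, paper_text: str
) -> List[str]:
    """Send the prompt to the model and split its reply into one item per line.

    `model` is an assumption: any function that takes a prompt string and
    returns the model's reply as a string.
    """
    reply = model(build_prompt(instructions, paper_text))
    return [line.strip() for line in reply.splitlines() if line.strip()]
```

Passing the model in as a plain function keeps the sketch independent of any particular chatbot, so the same steps apply whichever one is used.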

Supervised Learning and Optimization

Once the AI generates outputs, its findings are compared with reviews and extractions made by human readers of the same paper. We analyze the discrepancies by finding which parts of the study were missed by the AI (or even by the human analysts!) and then use them to refine the instructions. (For example, suppose the AI misses a whole group of genes because it took the wrong number as the p-value significance threshold. We would alter the instructions to explain that the threshold should always match the significance method outlined by the paper or a p-value of 0.05, whichever is more restrictive, meaning whichever cutoff is smaller.)
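Here is a rough sketch of the two ideas in this step: picking the stricter threshold and lining up the AI’s list against the human list. The function names are hypothetical, and the sketch assumes the paper’s significance method boils down to a single numeric cutoff, which will not be true of every paper.

```python
from typing import Dict, Optional, Set

DEFAULT_ALPHA = 0.05  # the fallback cutoff mentioned in the post


def effective_threshold(paper_alpha: Optional[float]) -> float:
    """Return the stricter (smaller) of the paper's cutoff and 0.05.

    If the paper states no cutoff (None), fall back to 0.05.
    """
    if paper_alpha is None:
        return DEFAULT_ALPHA
    return min(paper_alpha, DEFAULT_ALPHA)


def compare_extractions(ai_items: Set[str], human_items: Set[str]) -> Dict[str, Set[str]]:
    """Split the findings into what both found and what only one side found."""
    return {
        "agreed": ai_items & human_items,
        "ai_only": ai_items - human_items,      # extra AI findings, or things the humans missed
        "human_only": human_items - ai_items,   # findings the AI missed
    }
```

For example, `effective_threshold(0.01)` returns `0.01`, while `effective_threshold(0.1)` returns `0.05`. The `human_only` group is where a missed set of genes would show up, and the `ai_only` group needs a second look from a human, since it can contain either irrelevant results or findings the human readers overlooked.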

A general rule of thumb is that overly broad guidelines pull in a lot of irrelevant data, while overly narrow ones risk missing key findings, especially given how varied the papers in psychiatric literature can be. Balancing specificity and flexibility is essential for accurate data extraction.
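One common way to put numbers on this trade-off is with precision (how much of what the AI reported was relevant) and recall (how much of what was relevant the AI found). The sketch below is my own illustration, not necessarily the scoring used in this project, and it treats the human list as the reference even though humans can miss things too. The gene names in the example are made up for illustration and are not results.

```python
from typing import Set, Tuple


def precision_recall(ai_items: Set[str], human_items: Set[str]) -> Tuple[float, float]:
    """Score the AI's list against the human reference list.

    precision = shared items / items the AI reported
    recall    = shared items / items the humans reported
    Returns 0.0 for a score whose denominator would be zero.
    """
    shared = len(ai_items & human_items)
    precision = shared / len(ai_items) if ai_items else 0.0
    recall = shared / len(human_items) if human_items else 0.0
    return precision, recall


# Illustrative gene names only.
reference = {"BDNF", "COMT"}
too_broad = precision_recall({"BDNF", "COMT", "SLC6A4", "DRD2"}, reference)
too_narrow = precision_recall({"BDNF"}, reference)
```

Here `too_broad` comes out as `(0.5, 1.0)`: nothing was missed, but half of the output was irrelevant. `too_narrow` comes out as `(1.0, 0.5)`: everything reported was relevant, but half of the findings were missed.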

Over time, repeated refinements improve the AI’s ability to recognize patterns, but of course there are obstacles to overcome (for example, the computational limitations of current-generation GPT models, which can affect accuracy). In the next post, I’ll elaborate on these issues and the solutions that have been proposed for them.

Thank you!
Adam

Comments:

    shriya_s
    Hey Adam, this is really cool! Will you be looking at a specific field of psychiatry (e.g., developmental psychiatry)? And, what software do you plan on using?
    camille_bennett
    Hi Adam, great work! Are the studies you are using specifically about psychiatry, or are they studies where links to psychiatric impact are made? What parameters are you putting on the studies that you are using?
    adam_b
    Hi Shriya and Ms. Bennett! These are great questions! This project will cover a wide variety of psychiatric fields, and I will be using ChatGPT and Grok for now. Usually these studies are tied to psychiatric impact, so generally our criteria are that the research is sound and well-produced and that it draws connections between some form of biomarker and a particular psychiatric condition.
