Week 6-A Supervised Approach to Training an Artificial Intelligence to Extract Relevant Genomic Data from Literature
Adam B -
Hello
This week’s emphasis continued to focus on the implementation of my paper processor into API format. Since, originally, the AI was largely web-based, I am finding many unique discrepancies between its old version and its new API. Here, I will discuss those and how they have been addressed throughout the week.
Information Restrictiveness
One of the most important concerns with the AI is that it is as accurate as possible. While the web-based platform was relatively contained in its data collection (even, at times in the early stages, under-reporting what the paper had found), the API version collects far more extra information that is either irrelevant or contradictory to what is suggested by the article.
Throughout the week, I continued to modify the code to remedy these errors. One major fix involved refining how the prompt distinguishes between core text and supplementary areas (like citation references, tables, and figures). By clarifying that only the main body of the text should be considered authoritative (and, for example, NOT the titles of cited sources), the AI became more accurate and consistent. Still, however, there are instances when it captures information from less relevant sections. Perfecting this process will be the subject of later weeks.
Validation
While not exactly a web vs. API discrepancy, an important test for an API is the validation process, which I began this week. This involves the use of a cohort of 20 papers previously analyzed by me and other lab members. The key difference between this phase and testing is how these 20 are delivered: previously, they had been done one at a time, but now the model will receive them en masse.
This week was simply about having the model process these papers. Next week, I’ll dedicate time to assessing the outputs and coding additional constraints or clarifications that might tighten focus on essential data.
Adam
Comments:
All viewpoints are welcome but profane, threatening, disrespectful, or harassing comments will not be tolerated and are subject to moderation up to, and including, full deletion.