Generative Terminology Mapping: Scaling Medication Text String to RxNorm Conversion in Billion-scale EHR Data

Authors: Philip Ballentine, MSc, Senior Director of Data Engineering, Atropos Health and C. William Pike, MD, Medical Director, Atropos Health

Atropos Health proposes a method of normalizing text strings to standardized healthcare terminology using generative AI, which we term “Generative Terminology Mapping.” This approach combines readily available tools, established knowledge graphs, and an off-the-shelf Large Language Model (LLM) such as ChatGPT-4.

Missing terminology mappings are frequently a major hurdle when attempting to use EHR data for research and analytics. In general, we believe the pattern of pairing a more traditional mapping model with a generative AI reviewer (Generative Terminology Mapping) can be applied to multiple domains. Atropos Health sees potential for this method in every healthcare dataset with unmapped text values, making data more accessible, immediately usable, clinically applicable, and methodologically sound, all at up to 99% lower cost than manual mapping by an expert terminologist.

Experimental Results

Atropos Health's GENEVA OS™ (Generative Evidence Acceleration Operating System) includes multiple datasets derived from claims and EHR data, including one dataset, aliased Eos, containing records from more than 130 million patients in the United States. As of mid-2023, Eos held 4.9 billion medication order and administration records, but only 35% of these records used standard medication terminology codes. The rest identified medications with variable text strings, comprising ~95,000 distinct source medication terms.

To make this data usable for research and analytics, Atropos Health set out to map medication source terms to RxNorm, a standard terminology that is publicly available, covers United States medications, and includes a knowledge graph for relating concepts, enabling, for example, programmatic grouping of all clinical drug terms that contain a particular ingredient.
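RxNorm's concept graph is queryable through the National Library of Medicine's public RxNav REST API. As an illustration of the grouping use case above (our own sketch, not a component of the pipeline described in this post), the following Python queries the `related.json` endpoint to collect the clinical drug concepts (TTY=SCD) that contain a given ingredient:

```python
import requests

RXNAV = "https://rxnav.nlm.nih.gov/REST"

def clinical_drugs_for_ingredient(ingredient_rxcui: str) -> list[dict]:
    """Walk the RxNorm graph from an ingredient concept (TTY=IN) to the
    clinical drug concepts (TTY=SCD) that contain it."""
    resp = requests.get(
        f"{RXNAV}/rxcui/{ingredient_rxcui}/related.json",
        params={"tty": "SCD"},
        timeout=30,
    )
    resp.raise_for_status()
    groups = resp.json().get("relatedGroup", {}).get("conceptGroup", [])
    return [
        {"rxcui": c["rxcui"], "name": c["name"]}
        for g in groups
        for c in g.get("conceptProperties", []) or []
    ]

# e.g., 2551 is the RxCUI for the ingredient ciprofloxacin
for drug in clinical_drugs_for_ingredient("2551"):
    print(drug["rxcui"], drug["name"])
```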

Mapping these terms manually would take a skilled terminologist ~650 hours and cost at least $200,000. Additionally, Eos is updated frequently; each update adds new terms and can re-map source values in ways that invalidate previous mappings. These shifts can be dramatic between versions of the same dataset over time, eroding the investment made in previous mappings.

This volatility and the potential for high, ongoing costs led Atropos Health to explore automated solutions for mapping raw text strings such as “CIPROFLOXACIN LACTATE VIALS INJECTABLE” to RxNorm codes. In November 2023, the Atropos Health Data Engineering team used Eos data to test several approaches to mapping medication strings to RxNorm codes and compared the outputs to those of a skilled human terminologist.

Of the approaches we tested, we found that a combination of a public API provided by the Unified Medical Language System (UMLS, maintained by the National Library of Medicine, NLM, part of the National Institutes of Health, NIH) and a Large Language Model (LLM; ChatGPT-4) achieved 99.3% ingredient-level accuracy on the 998 most frequent terms. We define ingredient-level accuracy as whether the mapped RxNorm code correctly reflects the ingredients in the source term, neither omitting any nor introducing additional ones. In our experience, this level of correctness is often sufficient for cohort building or outcome determination from often-messy EHR data aggregated from multiple sources, such as that in Eos.
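The post does not name the specific endpoint used for candidate generation. One plausible stand-in is the NLM's public RxNorm approximate-match service; a minimal sketch of that candidate-lookup step:

```python
import requests

def propose_rxnorm_map(source_term: str) -> dict | None:
    """Ask the RxNorm approximate-match service for the best
    candidate RxCUI for a raw medication string."""
    resp = requests.get(
        "https://rxnav.nlm.nih.gov/REST/approximateTerm.json",
        params={"term": source_term, "maxEntries": 1},
        timeout=30,
    )
    resp.raise_for_status()
    candidates = resp.json().get("approximateGroup", {}).get("candidate", [])
    # Each candidate carries an 'rxcui' and a lexical match 'score'
    return candidates[0] if candidates else None

print(propose_rxnorm_map("CIPROFLOXACIN LACTATE VIALS INJECTABLE"))
```

The top-ranked candidate then becomes the proposed map that the LLM reviewer evaluates.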

This approach, Generative Terminology Mapping, significantly outperformed both an industry-leading proprietary NLP model (83.4% accuracy) and the basic UMLS API alone (92.5% accuracy). We used a prompting-only approach, which did not require re-training, fine-tuning, or otherwise exposing the LLM to any data other than the source term and the proposed map term, along with instructions on how to evaluate the proposed mapping. This approach generated maps deemed correct for 91% of the source terms in our subset, with 99.3% agreement (95% CI [98.6%, 99.72%]) with our expert human terminologist for the approved maps, a very strong level of agreement (Cohen’s κ 0.899, 95% CI [0.842, 0.947]).
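The exact prompt is not published in this post. Assuming the OpenAI chat-completions API, a hypothetical sketch of such a prompting-only ingredient review (the instructions, function name, verdict format, and model name are our own illustration) might look like:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REVIEW_INSTRUCTIONS = (
    "You review proposed medication terminology mappings. Given a raw "
    "source term and a candidate RxNorm concept name, answer APPROVE if "
    "the candidate contains exactly the same active ingredients as the "
    "source term (no ingredients omitted or added); otherwise answer "
    "REJECT. Reply with a single word."
)

def review_mapping(source_term: str, proposed_name: str) -> bool:
    """Prompting-only ingredient check: no fine-tuning, and the model sees
    only the source term, the candidate, and the instructions."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": REVIEW_INSTRUCTIONS},
            {"role": "user", "content": (
                f"Source term: {source_term}\n"
                f"Proposed RxNorm concept: {proposed_name}"
            )},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("APPROVE")

print(review_mapping(
    "CIPROFLOXACIN LACTATE VIALS INJECTABLE",
    "ciprofloxacin 10 MG/ML Injectable Solution",
))
```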

When we completed this process, only seven of the 998 mappings that the ChatGPT-4 reviewer judged ingredient-correct were disputed by our human reviewer. In other words, the combined UMLS and ChatGPT-4 approach would achieve an overall accurate map rate of 91.78%, and our human reviewer would disagree with the ingredients in only seven of the approved maps (0.7% of all maps, 0.76% of completed maps). The ChatGPT-4 ingredient check would remove 7.5% (75) of the maps; of these, only seven were maps our human expert concluded were ingredient-correct while the ChatGPT-4 review disagreed. Adding ChatGPT-4 as a second reviewer of the UMLS-mapped data reduced the ingredient-correctness error rate, measured against our human expert’s ground truth, by ~90% (from 7.5% without ChatGPT-4 to 0.7% with our Generative Terminology Mapping approach).
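Agreement statistics of this kind can be computed directly from the paired reviewer verdicts. An illustrative computation using scikit-learn, with hypothetical label vectors (not our data):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired verdicts (1 = ingredient-correct, 0 = not) from the
# LLM reviewer and the human expert over the same set of proposed maps.
llm_labels   = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
human_labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Raw agreement: fraction of maps where the two reviewers concur.
agreement = sum(a == b for a, b in zip(llm_labels, human_labels)) / len(llm_labels)

# Cohen's kappa corrects raw agreement for chance agreement.
kappa = cohen_kappa_score(llm_labels, human_labels)
print(f"raw agreement: {agreement:.1%}, Cohen's kappa: {kappa:.3f}")
```

Confidence intervals like those reported above are typically obtained by bootstrapping over the paired labels.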


