Atropos Health conducted a comprehensive head-to-head evaluation of eight large language models (LLMs). The models were benchmarked on three dimensions of accuracy: direction of effect, numerical accuracy, and completeness, that is, whether the summaries captured all statistically significant data from the underlying Real-World Evidence (RWE).
