“With AI wearing ever-more hats in all kinds of workplaces, researchers are scrambling to devise tests that grade how well the technology actually performs in its myriad roles. In healthcare alone, several new benchmarks aim to gauge AI’s prowess in medical settings.
But none of the current tests look at how well AI can summarize real-world medical studies, a new report from startup Atropos Health demonstrates. The authors proposed a new framework to evaluate this skill on nine major models from Google, OpenAI, and Anthropic.”