A recent article features Dr. Nigam Shah (co-founder of Atropos Health, chief data scientist at Stanford Health Care, and professor at Stanford University) discussing his role in developing MedHELM, a new tool from Stanford researchers for evaluating how well large language models (LLMs) perform in healthcare. Shah and his Stanford colleagues built MedHELM to assess LLMs across a range of healthcare-specific tasks, giving health systems and AI developers a clearer view of model performance. Rather than relying on traditional multiple-choice questions and tests of academic knowledge, MedHELM evaluates models on real-world clinical tasks, with the aim of producing a more accurate picture of how suitable they are for healthcare use.

Shah and his fellow MedHELM developers aim to expand the tool with additional datasets, tasks, and models. They also encourage contributions from the healthcare community to make the tool more comprehensive and accurate.
