P3295 - Validating Calibration of an Artificial Intelligence Assessment of Endoscopic Severity in Ulcerative Colitis

Monday, October 27, 2025

10:30 AM - 4:00 PM PDT

Location: Exhibit Hall

Presenting Author(s)

Darren Thomason, MBA

Iterative Health Inc
New York, NY

Chakib Battioui, PhD¹, Pavel Brodskiy, PhD², Klaus Gottlieb, MD, PhD, JD¹, Mohammad Haft-Javaherian, PhD², William J. Eastman¹, Julian Lehrer², Evan Yu, PhD², Derek Onken, PhD¹, Darren Thomason, MBA², Walter Reinisch, MD, PhD³, Daniel Colucci⁴, Shrujal Baxi, MD⁴
¹Eli Lilly and Company, Indianapolis, IN; ²Iterative Health Inc, Cambridge, MA; ³Medical University of Vienna, Department of Internal Medicine III, Division of Gastroenterology and Hepatology, Spitalgasse, Wien, Austria; ⁴Iterative Health Inc, New York, NY

Introduction: Regulatory guidance recommends the endoscopy subscore as the index for assessing the endoscopic component of the primary endpoint in ulcerative colitis (UC) trials. Inter-reader variability in these assessments may affect the reliability of trial results. Currently, no metric is in place to assess the certainty with which a reader assigns an endoscopy subscore. Machine learning (ML) provides an opportunity to assess the endoscopy subscore and to measure the certainty of that assessment in a standardized manner. Artificial Intelligence Assessment of Endoscopic Severity (AI-ES) accurately assesses the endoscopy subscore. The objective of this study is to evaluate the calibration of AI-ES, that is, how well its predicted probabilities reflect true likelihoods, in order to assess the reliability of its measurement of certainty in endoscopy subscore assessments in UC trials.

Methods: AI-ES is a deep learning algorithm that assesses the endoscopy subscore in UC endoscopic videos. AI-ES outputs a probability for each of the four ordinal endoscopy subscore classes, and the class with the highest probability is assigned as the final score. We assessed calibration on a holdout test set of 639 videos (~25%) from the Phase 3 induction trial of mirikizumab in UC (NCT03518086). Videos had a 2+1 centrally read endoscopy subscore and were randomly selected from weeks 0 and 12, with a distribution of endoscopic severity similar to that of the overall study population. Calibration plots were generated across endoscopy subscore classes, with predicted probabilities grouped into septiles (~100 videos per group) for the primary analysis and into deciles for confirmation. Brier scores, ranging from 0 (perfect calibration) to 1 (worst calibration), were calculated, with values < 0.25 considered informative.
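The calibration metrics described in Methods can be illustrated with a minimal sketch. It assumes a one-vs-rest Brier score per subscore class and equal-count grouping of videos sorted by predicted probability; the abstract does not state the exact implementation, and the function names and toy probabilities below are hypothetical, not study data. As context for the 0.25 threshold, a constant prediction of 0.5 yields a Brier score of 0.25 for any binary outcome.

```python
def brier_score(pred_prob, outcome):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    pairs = list(zip(pred_prob, outcome))
    return sum((p - float(y)) ** 2 for p, y in pairs) / len(pairs)


def per_class_brier(probs, labels, n_classes=4):
    """One-vs-rest Brier score for each subscore class (assumed formulation)."""
    return [
        brier_score([row[k] for row in probs], [lab == k for lab in labels])
        for k in range(n_classes)
    ]


def quantile_calibration(pred_prob, outcome, n_bins=7):
    """Sort by predicted probability, split into n_bins near-equal groups,
    and return (mean predicted probability, observed frequency) per group.
    n_bins=7 gives septiles; n_bins=10 gives deciles."""
    pairs = sorted(zip(pred_prob, outcome), key=lambda t: t[0])
    n = len(pairs)
    if n < n_bins:
        raise ValueError("need at least one observation per group")
    bins = []
    for b in range(n_bins):
        group = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_freq = sum(float(y) for _, y in group) / len(group)
        bins.append((mean_pred, obs_freq))
    return bins


# Hypothetical toy data: one row of class probabilities per video.
toy_probs = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.8, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
toy_labels = [0, 1, 2, 3]  # centrally read subscores (made up)

# Final score is the class with the highest probability, as in Methods.
final_scores = [max(range(4), key=row.__getitem__) for row in toy_probs]
class_briers = per_class_brier(toy_probs, toy_labels)
```

Plotting each group's mean predicted probability against its observed frequency gives a calibration plot; points near the diagonal indicate that predicted probabilities track observed rates.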

Results: AI-ES demonstrated strong calibration, with Brier scores < 0.25 for each endoscopy subscore (0: 0.037, 1: 0.082, 2: 0.162, 3: 0.112). The Brier score for evaluation of endoscopic improvement (0,1 vs 2,3) also showed excellent calibration (0.066). Findings were consistent when assessing probabilities by deciles.
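One way the dichotomized endoscopic improvement analysis (0,1 vs 2,3) could be computed is sketched below. It assumes the binary probability is the sum of the predicted probabilities for subscores 0 and 1; the abstract does not say how the binary probability was derived, so this is an illustration rather than the authors' method. Function names and values are hypothetical.

```python
def improvement_probability(class_probs):
    """Probability of subscore 0 or 1, taken as p(0) + p(1) (assumption)."""
    return class_probs[0] + class_probs[1]


def improvement_brier(probs, labels):
    """Brier score for the binary outcome 'subscore is 0 or 1'."""
    preds = [improvement_probability(row) for row in probs]
    outcomes = [1.0 if lab <= 1 else 0.0 for lab in labels]
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


# Hypothetical toy data, not study results.
toy_probs = [
    [0.7, 0.2, 0.1, 0.0],  # read as subscore 0
    [0.0, 0.1, 0.8, 0.1],  # read as subscore 2
]
toy_labels = [0, 2]
toy_brier = improvement_brier(toy_probs, toy_labels)
```

Collapsing classes this way keeps the binary probability consistent with the four-class output, so both analyses can be read from the same model prediction.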

Discussion: Whereas data on the certainty of human readers in endoscopy subscore assessments are elusive, AI-ES is calibrated across all endoscopy subscore classes, providing reliable data on score probabilities. This novel measurement of certainty by AI-ES added to the score assessment may enable novel AI-based multi-reader or consensus workflows in trials, potentially improving the reliability of UC endpoint assessments.

Figure 1. Calibration plots measuring the reliability of the model’s predicted probabilities for endoscopy subscore classes 0 or 1 (A) and 2 or 3 (B). Data are based on septiles of predicted probabilities.

Disclosures:

Chakib Battioui: Eli Lilly – Employee.

Pavel Brodskiy: Iterative Health Inc – Employee.

Klaus Gottlieb: Eli Lilly – Employee.

Mohammad Haft-Javaherian: Iterative Health Inc – Employee.

William Eastman: Eli Lilly and Company – Employee, Stock Options.

Julian Lehrer: Iterative Health Inc – Employee.

Evan Yu: Iterative Health Inc – Employee.

Derek Onken: Eli Lilly – Employee.

Darren Thomason: Iterative Health – Employee.

Walter Reinisch: AbbVie – Advisory Committee/Board Member, Consultant, Grant/Research Support. Actelion – Advisory Committee/Board Member, Consultant. Alpha Wasserman – Advisory Committee/Board Member, Consultant. AstraZeneca – Advisory Committee/Board Member, Consultant. Cellerix – Advisory Committee/Board Member, Consultant. Cosmo Pharmaceuticals – Advisory Committee/Board Member, Consultant. Ferring Pharmaceuticals – Advisory Committee/Board Member, Consultant. Genentech – Advisory Committee/Board Member, Consultant. Grunenthal – Advisory Committee/Board Member, Consultant. Johnson & Johnson – Advisory Committee/Board Member, Consultant. Merck – Advisory Committee/Board Member, Consultant. Millennium – Advisory Committee/Board Member, Consultant. Novo Nordisk – Advisory Committee/Board Member, Consultant. Nycomed – Advisory Committee/Board Member, Consultant. Pfizer – Advisory Committee/Board Member, Consultant. Pharmacosmos – Advisory Committee/Board Member, Consultant. Salix Pharmaceuticals – Advisory Committee/Board Member, Consultant. Schering-Plough – Advisory Committee/Board Member, Consultant. Takeda – Advisory Committee/Board Member, Consultant. UCB Pharma – Advisory Committee/Board Member, Consultant. Vifor Pharma – Advisory Committee/Board Member, Consultant.

Daniel Colucci: Iterative Health Inc – Employee.

Shrujal Baxi: Iterative Health Inc – Employee.