*EMBARGOED All research presented at the 2024 ACG Annual Scientific Meeting and Postgraduate Course is strictly embargoed until Sunday, October 27, 2024, at 12:00 pm ET.

P1911 – Optical Accuracy of Artificial Intelligence Large Language Models in Classifying Colorectal Polyps Based on Shape, Size, and Histology, Using Endoscopic Images
Monday, October 28, 2024 | 10:30 AM – 4:00 PM ET | Location: Exhibit Hall E
Author Insight from Tarek Souaid, MD, MPH
What’s new here and important for clinicians?
Our study is one of the first to evaluate advanced Large Language Models (LLMs) with image recognition capabilities, specifically OpenAI’s GPT-4o, Google’s Gemini 1.5 Pro, and Anthropic’s Claude 3 Sonnet, in classifying colorectal polyps from endoscopic images. While these AI chatbots have shown promise in enhancing medical experiences for both patients and clinicians, their performance in accurately characterizing polyps based on shape, size, and histology is currently suboptimal.
Brief Highlights:
- Accuracy Limitations: GPT-4o achieved only 40% accuracy in classifying all three polyp characteristics (shape, size, and histology) combined, while Gemini 1.5 Pro and Claude 3 Sonnet each scored 33%.
- No Learning Curve Observed: None of the LLMs improved over time, indicating a lack of adaptive learning in this context without fine-tuning or training sets.
- Implications for Clinical Practice: The current unreliability of these models suggests they are not yet ready for live clinical testing and use in polyp characterization.
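For readers who want to see what a combined accuracy figure means in practice, the following is a minimal sketch and not the study's actual evaluation code. It assumes that an image counts as correct only when shape, size, and histology all match the reference. The `PolypLabel` class, the function names, and the label values are hypothetical and chosen only for illustration; the toy data is made up and does not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolypLabel:
    """Hypothetical label for one image; field values are illustrative only."""
    shape: str
    size: str
    histology: str


def combined_accuracy(predictions, references):
    """Fraction of images where all three characteristics match the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one image is required")
    # Dataclass equality compares shape, size, and histology together.
    hits = sum(1 for p, r in zip(predictions, references) if p == r)
    return hits / len(references)


def feature_accuracy(predictions, references, feature):
    """Fraction of images where a single characteristic matches the reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for p, r in zip(predictions, references)
        if getattr(p, feature) == getattr(r, feature)
    )
    return hits / len(references)


# Made-up toy data (not from the study): four images, one fully correct.
references = [
    PolypLabel("sessile", "small", "adenoma"),
    PolypLabel("pedunculated", "large", "adenoma"),
    PolypLabel("flat", "small", "hyperplastic"),
    PolypLabel("sessile", "large", "adenoma"),
]
predictions = [
    PolypLabel("sessile", "small", "adenoma"),       # all three match
    PolypLabel("pedunculated", "small", "adenoma"),  # size wrong
    PolypLabel("flat", "small", "adenoma"),          # histology wrong
    PolypLabel("flat", "large", "hyperplastic"),     # shape and histology wrong
]

print(combined_accuracy(predictions, references))
print(feature_accuracy(predictions, references, "shape"))
```

Under this assumption, a model can score well on each characteristic individually yet score much lower on the combined measure, because a single wrong characteristic makes the whole image count as a miss.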
Findings in Perspective:
While AI and LLMs hold significant potential for medical diagnostics, our findings highlight a critical gap between their current technological capabilities and clinical needs. Clinicians should be cautious about relying on these models in their raw, unprimed state for diagnostic purposes until further advancements are made.
What do patients need to know?
It is important to understand that while artificial intelligence is becoming more prevalent in healthcare, current publicly available Large Language Models (LLMs) are not yet reliable enough to replace human expertise in characterizing colorectal polyps from endoscopic images.
Essential Takeaways:
- Ongoing Reliance on Medical Professionals: Your doctors remain the most accurate and trustworthy source for diagnosing and characterizing colorectal polyps.
- LLMs Remain an Experimental Tool in Endoscopy: While LLMs’ visual capabilities have potential, they are currently an experimental tool in endoscopy that requires significant improvement before it can inform clinical decisions in this area.
- Future of LLMs in Healthcare: Ongoing research is focused on enhancing these technologies, which may eventually lead to more efficient and accurate diagnostics. This could increase access to advanced expertise and benefit population-based care in the long run.
Our study underscores the importance of continued experimentation and testing of LLMs in healthcare until they meet the high standards required for clinical applications. For now, medical professionals remain indispensable in providing accurate diagnoses and personalized care.

Author Contact
Tarek Souaid, MD, MPH
Cleveland Clinic
Cleveland, OH
souaidt [at] ccf.org
Media Interview Requests
To arrange an interview with any ACG experts or abstract authors, please contact Becky Abel of ACG via email at mediaonly [at] gi.org or by phone at 301-263-9000.