Computer-Aided Quality for Endoscopic AI
The Challenge
Computer-Aided Detection and Diagnosis (CADe/x) systems are rapidly transforming gastrointestinal endoscopy by helping clinicians detect early-stage abnormalities such as neoplasia in Barrett's esophagus. However, these deep learning models are typically trained on curated, high-quality datasets. When deployed in real-world clinical environments, their performance often degrades under sub-optimal imaging conditions such as motion blur, poor lighting, inadequate mucosal cleaning, or a collapsed esophageal lumen. Standalone Computer-Aided Quality (CADq) preprocessing modules can filter out these unusable frames, but running a completely separate deep learning feature extractor alongside an active CADe model introduces a substantial computational bottleneck, making real-time, low-latency execution during live procedures impractical.
The Solution
To eliminate this operational friction, Theta Vision collaborated on the development of a lightweight, real-time CADq gateway. Instead of introducing an entirely new, parameter-heavy architecture, the system reuses the multi-level feature representations of a pre-trained, frozen CADe backbone (specifically, a CaFormer-S18 model). Acting as a lightweight preprocessing gate, the CADq module analyzes each endoscopic frame in real time and determines whether it meets strict diagnostic viability criteria before passing it downstream. Because a single backbone forward pass serves both the CADq and CADe tasks, this shared-backbone approach yields substantial computational savings and fast inference, integrating into clinical workflows to maintain CADe reliability without additional hardware.
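A minimal Python sketch of the shared-backbone gating flow follows. It assumes the frozen backbone, the CADq heads, and the CADe head are available as plain callables; `process_frame`, `QualityReport`, `is_viable`, and the specific rejection rule are hypothetical illustrations, not the system's actual interface. What it shows is that the backbone runs once per frame and its features are reused by both the quality gate and the detector.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

# Hypothetical alias: multi-level features from the frozen backbone,
# represented here as one pooled vector per backbone stage.
Features = List[List[float]]


@dataclass
class QualityReport:
    """Outputs of the CADq heads (field names are illustrative)."""
    oiq_ok: bool        # overall image quality acceptable
    cleanliness: str    # "poor", "adequate" or "good"
    lumen_open: bool    # esophagus distended rather than collapsed
    view: str           # "forward" or "retroflexed"
    ood_score: float    # Mahalanobis-style anomaly score


def is_viable(report: QualityReport, ood_threshold: float) -> bool:
    """Hypothetical gating rule combining the CADq outputs.

    The orientation label (report.view) is recorded but, as an assumption,
    not used to reject frames here.
    """
    return (
        report.oiq_ok
        and report.cleanliness != "poor"
        and report.lumen_open
        and report.ood_score <= ood_threshold
    )


def process_frame(
    frame: Any,
    backbone: Callable[[Any], Features],
    cadq_heads: Callable[[Features], QualityReport],
    cade_head: Callable[[Features], Any],
    ood_threshold: float,
) -> Optional[Any]:
    """Run the frozen backbone once and share its features between CADq and CADe."""
    features = backbone(frame)       # single backbone forward pass per frame
    report = cadq_heads(features)    # lightweight quality heads on shared features
    if not is_viable(report, ood_threshold):
        return None                  # frame rejected; the CADe head is skipped
    return cade_head(features)       # reuse the same features downstream
```

Every frame costs one backbone pass plus the small CADq heads, and only accepted frames additionally run the CADe head. A standalone CADq module would instead add a second feature extractor to every frame.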
Real-Time Quality Assurance Pipeline
The framework evaluates five aspects of image and domain integrity in parallel. Four are predicted by a multi-head multi-layer perceptron (MLP) decoder built directly on top of the shared feature layers; the fifth, out-of-distribution detection, is computed directly in the feature space:
Overall Image Quality (OIQ) Gate: Evaluates the fundamental visual clarity, sharpness, and illumination of the frame. It automatically flags or rejects severely blurred or poorly lit inputs to reduce label noise and false predictions downstream.
Mucosal Cleanliness Evaluator: Grades the presence of obscuring material such as bubbles, mucus, or fluid debris on a three-level ordinal scale (poor, adequate, good). This helps ensure that mucosal detail hidden by poor preparation is not overlooked.
Luminal Expansion Tracker: Measures how open the esophageal lumen is. By confirming that the esophagus is properly distended rather than collapsed, it prevents the CADe system from drawing conclusions from uninterpretable tissue folds.
Procedural Orientation Classifier: Automatically detects the scope's position, distinguishing a standard forward insertion view from a retroflexed view, in which the scope tip is bent back so that the endoscope looks back at its own shaft.
Feature-Based Out-of-Distribution (OOD) Guard: Computes an instantaneous anomaly score directly in the deep feature space, using the Mahalanobis distance between a frame's features and the distribution of in-domain features. It flags out-of-domain inputs, such as accidental transitions into the stomach or non-medical artifacts, so that the core CADe model is not confused by data outside its training distribution.
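The Mahalanobis scoring step of the OOD guard can be sketched in plain Python. This is an assumed, simplified variant: a single Gaussian is fitted to pooled in-domain feature vectors (the text does not say whether per-stage, class-conditional, or concatenated features are used), a small ridge term keeps the covariance invertible, and the threshold passed to `is_out_of_distribution` is a hypothetical value that would in practice be calibrated on held-out in-domain data. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fit_gaussian(features: Sequence[Sequence[float]], ridge: float = 1e-3) -> Tuple[Vector, Matrix]:
    """Estimate the mean and ridge-regularized covariance of in-domain features."""
    n, d = len(features), len(features[0])
    assert n >= 2, "need at least two samples for a sample covariance"
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        centered = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centered[a] * centered[b] / (n - 1)
    for a in range(d):
        cov[a][a] += ridge  # assumption: small ridge keeps the matrix invertible
    return mean, cov


def solve(matrix: Matrix, rhs: Vector) -> Vector:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, d))
        x[r] = (aug[r][d] - s) / aug[r][r]
    return x


def mahalanobis(x: Sequence[float], mean: Vector, cov: Matrix) -> float:
    """Distance sqrt((x - mu)^T Sigma^{-1} (x - mu)), computed without forming Sigma^{-1}."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)  # y = Sigma^{-1} (x - mu)
    return math.sqrt(max(0.0, sum(di * yi for di, yi in zip(diff, y))))


def is_out_of_distribution(x: Sequence[float], mean: Vector, cov: Matrix, threshold: float) -> bool:
    """Flag a frame whose feature vector lies too far from the in-domain Gaussian."""
    return mahalanobis(x, mean, cov) > threshold
```

For clarity this sketch solves the linear system on every call; a deployed version would factor the covariance once (for example with a Cholesky decomposition) so that each per-frame score costs only cheap triangular solves.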
Impact
This shared-backbone CADq architecture turns real-world endoscopic quality assessment from a high-latency liability into a streamlined, edge-deployable asset. Evaluation on 6,276 annotated esophageal images showed that the frozen-backbone configuration achieves high accuracy across all evaluation tasks, matching the performance of a fully fine-tuned model while substantially reducing memory overhead and adding no trainable backbone parameters, since only the lightweight decoder heads are trained. The feature-based Mahalanobis OOD detector proved robust to domain shift, successfully isolating images of non-esophageal anatomical regions. Ultimately, this framework gives endoscopists instantaneous, clinically meaningful feedback on image quality, improving clinician performance, insulating medical AI systems against real-world data degradation, and establishing a new standard for efficient, trustworthy clinical integration.