Task & Evaluation
Task Description 📝
The core task of the RARE25 challenge is binary classification: determining whether an endoscopic image from a Barrett's Esophagus (BE) patient shows early neoplasia. The goal is to build AI algorithms that can identify subtle but critical signs of early-stage cancer while maintaining a low false positive rate, a key requirement in real-world clinical use.
Evaluation 📈
The evaluation of AI algorithms in the RARE25 challenge is designed to mirror real-world clinical demands, placing particular emphasis on high sensitivity and precision. The central performance metric is the Positive Predictive Value at 90% Recall (PPV@90Recall), which reflects an algorithm’s ability to detect early neoplasia in BE patients while maintaining a low false positive rate — a critical requirement for practical deployment in medical settings.
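As a concrete reference, PPV@90Recall can be computed by sweeping the decision threshold from high to low and reading off the precision at the first point where recall reaches 90%. The sketch below is illustrative only; the function name and exact thresholding rule are assumptions, not the organizers' official implementation.

```python
import numpy as np

def ppv_at_recall(y_true, y_score, target_recall=0.9):
    """Precision at the highest threshold whose recall reaches the target.

    Illustrative sketch; the challenge's exact tie-breaking and
    thresholding conventions may differ.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Rank images from most to least suspicious.
    order = np.argsort(-y_score)
    labels = y_true[order]
    tp = np.cumsum(labels)        # true positives as the threshold is lowered
    fp = np.cumsum(1 - labels)    # false positives at the same cutoffs
    recall = tp / y_true.sum()
    precision = tp / (tp + fp)
    # First (strictest) cutoff at which recall meets the target.
    idx = np.argmax(recall >= target_recall)
    return precision[idx]
```

For a perfectly separating model the metric is 1.0; every false positive ranked above the 90%-recall cutoff lowers it.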
The evaluation procedure is consistent across both the Open Development Phase and the Closed Testing Phase. For each evaluation, all non-dysplastic images in the relevant set are included. Neoplasia images are then sampled with replacement, targeting a ratio of one neoplasia case for every 100 non-neoplasia cases. This simulates a realistic prevalence rate and the class imbalance typically encountered in clinical practice.
To ensure robustness and reduce the effect of randomness in the sampling process, this evaluation is repeated 1,000 times. In each iteration, the PPV@90Recall is computed, and the final performance score for a submission is defined as the median PPV@90Recall across all repetitions. This approach allows the challenge to assess how well each algorithm maintains precision under a fixed high-sensitivity constraint, across a large number of realistic, imbalanced scenarios.
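The resampling procedure described above can be sketched end to end: keep every non-dysplastic image, draw neoplasia images with replacement at roughly 1:100 prevalence, compute PPV@90Recall, repeat 1,000 times, and take the median. All names and defaults below are illustrative, not the organizers' code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def bootstrap_median_ppv(neg_scores, pos_scores, ratio=100,
                         n_iter=1000, target_recall=0.9):
    """Median PPV@90Recall over repeated low-prevalence resamples.

    Illustrative sketch of the described protocol; parameter names
    and the seeding are assumptions.
    """
    neg_scores = np.asarray(neg_scores)
    pos_scores = np.asarray(pos_scores)
    # ~1 neoplasia case per `ratio` non-neoplasia cases.
    n_pos = max(1, len(neg_scores) // ratio)
    ppvs = []
    for _ in range(n_iter):
        # Keep all non-dysplastic images; resample neoplasia with replacement.
        sampled_pos = rng.choice(pos_scores, size=n_pos, replace=True)
        scores = np.concatenate([neg_scores, sampled_pos])
        labels = np.concatenate([np.zeros(len(neg_scores)), np.ones(n_pos)])
        # Precision at the strictest cutoff whose recall reaches the target.
        order = np.argsort(-scores)
        tp = np.cumsum(labels[order])
        fp = np.cumsum(1 - labels[order])
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ppvs.append(precision[np.argmax(recall >= target_recall)])
    return float(np.median(ppvs))
```

Taking the median rather than the mean makes the final score robust to occasional unlucky resamples.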
During the Open Development Phase, participants will also have access to secondary metrics, including the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PRC), to help guide model development. However, these metrics are provided for reference only and will not influence the final ranking.
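Both secondary metrics have standard implementations; for instance, scikit-learn computes AUC-ROC directly and estimates AUC-PRC via average precision. The toy labels and scores below are invented for illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy per-image neoplasia labels and model scores (illustrative only;
# in the challenge these would come from your algorithm's predictions).
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0])
y_score = np.array([0.2, 0.6, 0.7, 0.1, 0.4, 0.9, 0.3, 0.5])

auc_roc = roc_auc_score(y_true, y_score)
# Average precision is the usual step-wise estimate of the area
# under the precision-recall curve.
auc_prc = average_precision_score(y_true, y_score)
```

Note that under heavy class imbalance AUC-PRC is generally more informative than AUC-ROC, which is one reason the primary ranking metric is precision-based.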
The results from the Closed Testing Phase will remain confidential until they are officially presented at MICCAI 2025.