Visual Question Answer evaluation dataset for MIMIC CXR

Atilla Kiraly
Timo Kohlberger
Fayaz Jamil
Charles Lau
Tom Pollard
(2025) (to appear)

Abstract

MIMIC CXR is a large, publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. In addition, labels for the presence of 12 different chest-related pathologies, for any support devices, and for overall normal/abnormal status were made available in the MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) dataset [2]; these labels were generated using the CheXpert and NegBio algorithms.

Based on these labels, we created a small visual question answering dataset comprising 224 questions for 48 cases from the official test set and 111 questions for 33 validation cases. Of the questions, 68% are closed-ended (answerable with yes or no) and focus on the presence of one out of 15 chest pathologies, of any support device, or generically of any abnormality. The remaining open-ended questions inquire about the location, size, severity, or type of a pathology or device, provided the MIMIC-CXR-JPG labels indicate that it is present in the specific case.

For each question and case we also provide a reference answer, which was authored by a board-certified radiologist based on the chest X-ray and original radiology report.