Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models
Published in Proceedings of the 5th Clinical Natural Language Processing Workshop, 2023
Assessing the numerical understanding of vision-and-language models over images and texts is crucial for real-world vision-and-language applications, such as systems for automated medical image analysis. We provide a visual reasoning dataset focusing on numerical understanding in the medical domain. Experiments using our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain. However, data augmentation with only a small amount of our dataset improves model performance while maintaining performance in the general domain.
Recommended citation: Hitomi Yanaka, Yuta Nakamura (co-first), Yuki Chida, and Tomoya Kurosawa. Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models. Proceedings of the 5th Clinical Natural Language Processing Workshop 2023:8–18. https://aclanthology.org/2023.clinicalnlp-1.2/