Artificial Intelligence Finds Difficulty Distinguishing Anatomical Orientations in Medical Imaging
In a recent study, four vision-language models were tested on determining relative positions in radiological images, particularly when the anatomy is flipped or rotated. All four models, GPT-4o, Llama3.2, Pixtral, and DeepSeek's JanusPro, struggled even in a deliberately simplified setting, pointing to an underlying weakness in relative positioning that is not limited to medical imagery.
The study used plain white images with randomly placed markers and asked simple questions to test the models' ability to determine relative positions without any medical context. The test images were generated with the SimpleITK framework, while flat anatomical views were extracted from volumetric data using the TotalSegmentator project.
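The extraction step described above can be sketched in outline. This is an illustrative reconstruction, not the study's actual pipeline: the synthetic volume, the max-projection, and the function names are assumptions standing in for data that SimpleITK and TotalSegmentator would supply in practice.

```python
import numpy as np

# Hypothetical stand-in for a CT volume; in the study, SimpleITK would load
# real volumetric data, e.g. vol = sitk.GetArrayFromImage(sitk.ReadImage(path)).
vol = np.zeros((64, 64, 64), dtype=np.uint8)  # (z, y, x) axis order
vol[:, :, 40:50] = 255                        # bright band toward one side

def extract_flat_image(volume, axis=1):
    """Collapse one axis into a 2D projection, mimicking a flat anatomical view."""
    return volume.max(axis=axis)

def flip_left_right(image):
    """Mirror the image horizontally, as in the study's flipped condition."""
    return image[:, ::-1]

flat = extract_flat_image(vol)   # 2D view; the band sits on one side
flipped = flip_left_right(flat)  # the band now sits on the opposite side
```

A model with genuine spatial grounding should give mirrored answers for `flat` and `flipped`; the study's finding is that the tested models often do not.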
The first research question (RQ1) was whether current top-tier vision-language models can accurately determine relative positions in radiological images. Results showed accuracies near 50 percent across all models, indicating performance at chance level, and an inability to reliably judge relative positions without visual markers.
To address the third research question (RQ3), the authors examined whether the models rely more on prior anatomical knowledge than on visual input when determining relative positions. GPT-4o and Pixtral showed small accuracy gains when letter or number markers were added, while JanusPro and Llama3.2 saw little to no benefit. That explicit markers helped only marginally suggests that, when presented with medical imagery, the models default to their prior knowledge of anatomy rather than the visual cues actually present in the image.
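The marker-based probe can be reconstructed in a few lines. This is a minimal sketch under stated assumptions: the canvas size, the two-marker setup, and the `make_probe` helper are illustrative, not the study's actual code.

```python
import random

# Illustrative reconstruction of the marker-based probe: place two labelled
# markers on a blank canvas and derive the ground-truth relative position
# that a vision-language model would then be asked about.
def make_probe(width=256, height=256, seed=None):
    rng = random.Random(seed)
    a = (rng.randrange(width), rng.randrange(height))  # marker "A" (x, y)
    b = (rng.randrange(width), rng.randrange(height))  # marker "B" (x, y)
    # Ground truth: is A left or right of B? (x-ties count as "right" here.)
    truth = "left" if a[0] < b[0] else "right"
    return a, b, truth

a, b, truth = make_probe(seed=1)
# The model sees only the rendered image and the question
# "Is marker A left or right of marker B?"; `truth` scores its answer.
```

Accuracies near 50 percent on probes like this mean the models do no better than guessing.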
Interestingly, the lack of robustness to anatomical flips and rotations is an acknowledged limitation in the AI image-analysis literature, typically attributed to training-data diversity and model architecture. Addressing it requires explicit model design or training techniques, such as data augmentation or spatial transformers, though these topics are more widely covered in the AI technical literature than in clinical meta-analyses.
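Data augmentation, the first mitigation mentioned above, can be sketched as follows. This is a generic illustration of flip/rotation augmentation, not code from the study or any named library:

```python
import numpy as np

# Minimal sketch of flip/rotation augmentation: during training, each image
# is randomly mirrored and rotated so the model cannot rely on a fixed
# orientation. Function and parameter names are illustrative.
def augment(image, rng):
    if rng.random() < 0.5:
        image = image[:, ::-1]   # horizontal flip with probability 0.5
    k = int(rng.integers(0, 4))  # rotate by 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

rng = np.random.default_rng(0)
augmented = augment(np.arange(16).reshape(4, 4), rng)
```

Spatial transformers take the complementary approach of letting the network learn to undo such transformations, but augmentation alone already forces orientation-invariant features.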
The study highlights one of the most under-reported and fundamental shortcomings of the current wave of state-of-the-art (SOTA) language models: unless the material is presented very carefully, they tend to answer from prior knowledge rather than actually reading the texts or examining the images they are given.
Despite these difficulties with flipped or rotated anatomical images, AI does achieve high diagnostic accuracy in detecting specific diseases from medical images. Those studies, however, measure disease detection, not positional or spatial accuracy on altered views. As AI and medical imaging continue to evolve, making models robust to anatomical flips and rotations will be crucial for their accuracy and reliability in clinical settings.