AI for Healthcare - Latest arXiv Papers

1

Robust and Explainable Bicuspid Aortic Valve Diagnosis Using Stacked Ensembles on Echocardiography

Christos Chrysanthos Nikolaidis, Vasileios Sachpekidis, Nikolas Moustakidis et al. (5 authors)

📅 2026-05-13

Transthoracic echocardiography (TTE) is the first-line imaging modality for diagnosing bicuspid aortic valve (BAV), yet diagnostic performance varies with operator expertise and image quality. We developed an explainable AI model that distinguishes BAV from tricuspid aortic valves (TAV) using routinely acquired parasternal long-axis (PLAX) cine loops. A multi-backbone video ensemble was trained...

2

MedCore: Boundary-Preserving Medical Core Pruning for MedSAM

Cenwei Zhang, Suncheng Xiang, Lei You

📅 2026-05-13

Medical segmentation foundation models such as SAM and MedSAM provide strong prompt-driven segmentation, but their image encoders are still too large for many clinical settings. Compression is also risky in medicine because a model can keep high Dice while losing boundary fidelity. We propose MedCore, a structured pruning framework for MedSAM. The main idea is to preserve two kinds of structures:...

3

Cross Modality Image Translation In Medical Imaging Using Generative Frameworks

Giulia Romoli, Alessia Capoccia, Filippo Ruffini et al. (23 authors)

📅 2026-05-13

Medical image-to-image (I2I) translation enables virtual scanning, i.e. the synthesis of a target imaging modality from a source one without additional acquisitions. Despite growing interest, most proposed methods operate on 2D slices, are evaluated on isolated tasks with different experimental set-ups and lack clinical validation. The primary contribution of this work is a reproducible,...

4

IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

Shubham Kumar Nigam, Suparnojit Sarkar, Piyush Patel

📅 2026-05-13

Most existing medical dialogue systems operate in a single-turn question-answering paradigm or rely on template-based datasets, limiting conversational realism and multilingual applicability. We introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. The...

5

Large Language Models Lack Temporal Awareness of Medical Knowledge

Zihan Guan, Qiao Jin, Guangzhi Xiong et al. (9 authors)

📅 2026-05-13

The existing methods for evaluating the medical knowledge of Large Language Models (LLMs) are largely based on atemporal examination-style benchmarks, while in reality, medical knowledge is inherently dynamic and continuously evolves as new evidence emerges and treatments are approved. Consequently, evaluating medical knowledge without a temporal context may provide an incomplete assessment of...

6

Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis

Yingzhe Ma, Xiao Yang, Yuguo Yin et al. (4 authors)

📅 2026-05-13

Retinal diagnosis is inherently bilateral: clinicians compare homologous structures across eyes (e.g., optic disc asymmetry), yet most deep models operate on monocular representations. We investigate whether explicit structural correspondence improves diagnosis, and propose Anatomy-Slot to operationalize this hypothesis. Anatomy-Slot introduces an unsupervised anatomical bottleneck by decomposing...

7

Adaptive Conformal Prediction for Reliable and Explainable Medical Image Classification

One Octadion, Novanto Yudistira, Lailil Muflikhah

📅 2026-05-13

Deep learning models for medical imaging often exhibit overconfidence, creating safety risks in ambiguous diagnostic scenarios. While Conformal Prediction (CP) provides distribution-free statistical guarantees, standard methods such as Regularized Adaptive Prediction Sets (RAPS) optimize for average efficiency and can mask severe failures on difficult inputs. We propose an Adaptive Lambda...

8

RISED: A Pre-Deployment Safety Evaluation Framework for Clinical AI Decision-Support Systems

Rohith Reddy Bellibatlu

📅 2026-05-13

Aggregate accuracy metrics dominate the evaluation of clinical AI decision-support systems but do not detect deployment-phase failures of input reliability, subgroup equity, threshold sensitivity, or operational feasibility. We propose the RISED Framework: a five-dimension pre-deployment evaluation covering Reliability, Inclusivity, Sensitivity, Equity, and Deployability, in which each dimension...

9

Training Large Language Models to Predict Clinical Events

Benjamin Turtel, Paul Wilczewski, Kris Skotheim

📅 2026-05-12

Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label...

10

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

Yihao Wang, Haoran Xu, Renjie Gu et al. (13 authors)

📅 2026-05-12

The large-scale deployment of personalized healthcare agents demands memory mechanisms that are exceptionally precise, safe, and capable of long-term clinical tracking. However, existing benchmarks primarily focus on daily open-domain conversations, failing to capture the high-stakes complexity of real-world medical applications. Motivated by the stringent production requirements of an...

11

From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction

Mingcheng Zhu, Zhiyao Luo, Yu Liu et al. (4 authors)

📅 2026-05-12

By processing electronic health records (EHRs) as natural language sequences, large language models (LLMs) have shown potential in clinical prediction tasks such as mortality prediction and phenotyping. However, longitudinal or highly frequent EHRs often yield excessively long token sequences that result in high computational costs and even reduced performance. Existing solutions either add...

12

AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment

Robin Linzmayer, Georgianna Lin, Di Coneybeare et al. (25 authors)

📅 2026-05-12

We introduce AcuityBench, a benchmark for evaluating whether language models identify the appropriate urgency of care from user medical presentations. Existing health benchmarks emphasize medical question answering, broad health interactions, or narrow workflow-specific triage tasks, but they do not offer a unified evaluation of acuity identification across these settings. AcuityBench addresses...

13

Quantifying Rodda and Graham Gait Classification from 3D Markerless Kinematics derived from a Single-view Video in a Heterogeneous Pediatric Clinical Cohort

Lauhitya Reddy, Seth Donahue, Jeremy Bauer et al. (11 authors)

📅 2026-05-11

Cerebral Palsy (CP) is a neurological disorder of movement and the most common cause of lifelong physical disability in childhood. Approximately 75% of children with CP are ambulatory, and accurate gait assessment is central to preserving walking function, which deteriorates by mid-adulthood in a quarter to half of adults with CP. The Rodda and Graham classification system quantifies...

14

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

Alex Stinard

📅 2026-05-11

Reasoning benchmarks measure clinical performance on clean inputs. We evaluate the step before reasoning: retrieval over real EHR notes, where negation, temporality, and family-versus-patient attribution can flip a correct answer to a wrong one. EpiKG carries an assertion label and a temporality tag with every fact in a patient knowledge graph, then routes retrieval by question intent....

15

CLEF: EEG Foundation Model for Learning Clinical Semantics

Peng Cao, Ali Mirzazadeh, Jong Woo Lee et al. (5 authors)

📅 2026-05-11

Clinical EEG interpretation requires reasoning over full EEG sessions and integrating signal patterns with clinical context. Existing EEG foundation models are largely designed for short-window decoding and do not incorporate clinical context. We introduce CLEF, a clinically grounded long-context EEG foundation model. CLEF represents EEG sessions as 3D multitaper spectrogram tokens, enabling...

16

DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation

Yiqi Tian, Sangjoon Park, Bo Zeng et al. (6 authors)

📅 2026-05-11

Medical image segmentation models can perform unevenly across subgroups. Most existing fairness methods focus on improving average subgroup performance, implicitly treating each subgroup as internally homogeneous. However, this can hide difficult cases within a subgroup, where high-loss samples are obscured by the subgroup mean. We call this problem intra-group hidden failure. To solve...

17

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

Baraa Al Jorf, Farah E. Shamout

📅 2026-05-11

Building effective clinical decision support systems requires the synthesis of complex heterogeneous multimodal data. Such modalities include temporal electronic health records data, medical images, radiology reports, and clinical notes. Large language model (LLM)-based agents have shown impressive performance in various healthcare tasks, especially those involving textual modalities. Considering...

18

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation

Peiru Yang, Haoran Zheng, Tong Ju et al. (9 authors)

📅 2026-05-11

Retrieval-augmented generation (RAG) is a widely adopted paradigm for enhancing LLMs in medical applications by incorporating expert multimodal knowledge during generation. However, the underlying retrieval databases may naturally contain, or be intentionally injected with, adversarial knowledge, which can perturb model outputs and undermine system reliability. To investigate this risk, prior...

19

Medical Incident Causal Factors and Preventive Measures Generation Using Tag-based Example Selection in Few-shot Learning

Yuna Haseyama, Tomoki Ito, Hiroki Sakaji et al. (4 authors)

📅 2026-05-11

In high-stakes domains such as healthcare, the reliability of Large Language Models (LLMs) is critical, particularly when generating clinical insights from incident reports. This study proposes a tag-based few-shot example selection method for prompting LLMs to generate background/causal factors and preventive measures from details of the medical incidents. For our experiments, we use the...

20

Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning

Ziquan Wei, Tingting Dan, Guorong Wu

📅 2026-05-10

Despite the central role of sensor-derived measurements such as imaging traits and plasma biomarkers in biomedical research and clinical practice, existing generative models for disease prediction largely depend on event-level representations from hospital and registry data. Given the multi-factorial nature of human disease, the absence of explicit modeling of social determinants of health...

21

WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records

Ruan Dong, Yuanyun Zhang, Shi Li

📅 2026-05-10

Representation learning in electronic health records (EHR) has largely followed paradigms inherited from natural language processing, relying on sequence modeling and reconstruction based objectives that treat clinical labels as ground truth. However, real world clinical supervision is inherently weak, arising from heterogeneous, noisy, and institution specific labeling processes such as billing...

22

Medical Model Synthesis Architectures: A Case Study

Katherine M. Collins, Marlene Berke, Ilia Sucholutsky et al. (9 authors)

📅 2026-05-10

Medicine is rife with high-stakes uncertainty. Doctors routinely make clinical judgments and decisions that juggle many fundamental unknowns, like predictions about what might be causing a patient's symptoms or decisions about what treatment to try next. Despite increasing interest in developing AI systems that aid or even replace doctors in clinical settings, current systems struggle with...

23

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

Yixiong Chen, Wenjie Xiao, Pedro R. A. S. Bassi et al. (10 authors)

📅 2026-05-10

Medical vision-language models (VLMs) and AI agents have made significant progress in learning to analyze and reason about clinical images. However, existing medical visual question answering (VQA) benchmarks collapse model capabilities into a single accuracy score, obscuring where and why models fail. We propose DeepTumorVQA, a hierarchical benchmark that follows the multi-stage evidence chain...

24

CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents

Timothy Ossowski, Xinchi Liu, Danyal Maqbool et al. (9 authors)

📅 2026-05-10

Clinical reasoning agents based on large language models (LLMs) aim to automate tasks such as intensive care unit (ICU) monitoring and patient state tracking from electronic health records (EHRs). Existing systems typically rely on manually curated clinical tools or skills for concepts such as sepsis detection and organ failure assessment. However, maintaining these tool libraries requires...

25

MedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studies

Huy Hoang Ha, Benoit Favre, Francois Portet

📅 2026-05-10

Large language models (LLMs) have saturated standard medical benchmarks that test factual recall, yet their ability to perform higher-order reasoning, such as synthesizing evidence from multiple sources, remains critically under-explored. To address this gap, we introduce MedMeta, the first benchmark designed to evaluate an LLM's ability to generate conclusions from medical meta-analyses...

26

CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

Aishik Nagar, Arun-Kumar Kaliya-Perumal, Yu-Hsuan Han et al. (8 authors)

📅 2026-05-10

Inpatient clinical reasoning is a sequential decision under partial observability: the clinician sees the admission so far and must choose the next action whose downstream consequences are not yet visible. Existing clinical-LLM evaluations and RL reward signals collapse this into closed-form retrieval, clinical journey leakage, or unanchored LLM-as-judge scoring. We introduce CLR-voyance, a...

27

When Adaptation Fails: A Gradient-Based Diagnosis of Collapsed Gating in Vision-Language Prompt Learning

Yunxuan Fang, Ziwei Zhang, Xinhe Wang

📅 2026-05-10

Adaptive prompting mechanisms have been proposed to enhance vision-language models by dynamically tailoring prompts to inputs. However, in frozen few-shot prompt learning with CLIP-style backbones, we systematically observe that adaptive gates and prompt-selection modules often collapse: they produce nearly constant outputs, contribute negligible gradient signals, and frequently fail to...

28

Key Coverage Matters: Semi-Structured Extraction of OCR Clinical Reports

Yu Wang, Yingyun Li, Ying Qin et al. (4 authors)

📅 2026-05-10

Clinical reports are often fragmented across healthcare institutions because privacy regulations and data silos limit direct information sharing. When patients seek care at a different hospital, they often carry paper or scanned reports from prior visits. This hinders EHR integration and longitudinal review, and downstream applications that depend on more complete patient records, such as patient...

29

LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering

Runze Ma, Shunbo Jia, Haonan Lyu et al. (5 authors)

📅 2026-05-10

The reasoning gap between large and compact vision-language models (VLMs) limits the deployment of medical AI on portable clinical devices. Compact VLMs of 2-4B parameters can run on resource-constrained hardware but lack the multi-step reasoning capacity needed for interpretable clinical decision support. Existing knowledge distillation methods transfer answers without the reasoning process...

30

A Cross-Layered Multi-Drone Coordination for Medical Supply Delivery during Disaster Response Management

Aneesh Calyam, Subrahmanya Chandra Bhamidipati, Zack Murry et al. (4 authors)

📅 2026-05-10

Autonomous drone fleets have immense potential in medical supply delivery during disaster incident response. However, coordinating multiple drones in such settings introduces compounding challenges: dynamic environmental hazards such as wind, obstacles, and intermittent network connectivity, constrained energy budgets, and the need to serve patient locations fairly under deadlines and...

31

Towards Conversational Medical AI with Eyes, Ears and a Voice

Meet Shah, Jason Gusdorf, Anil Palepu et al. (53 authors)

📅 2026-05-10

The practice of medicine relies not only upon skillful dialogue but also on the nuanced exchange and interpretation of rich auditory and visual cues between doctors and patients. Building on the low-latency voice and video processing capabilities of Gemini, we introduce AI co-clinician, a first-of-its-kind conversational AI system utilizing continuous streams of audio-visual data from live...

32

Shapley Regression for Rare Disease Diagnosis Support: a case study on APDS

Safa Alsaidi, Tomás Brogueira, Nizar Mahlaoui et al. (8 authors)

📅 2026-05-09

Activated PI3Kδ Syndrome (APDS) is a rare genetic immune disorder caused by variants in PIK3CD or PIK3R1, with highly heterogeneous symptoms that often delay diagnosis. Early recognition is hampered by overlapping clinical presentations and limited clinician awareness, motivating systematic, data-driven approaches to detect APDS-associated phenotypic patterns in routine electronic health records....

33

Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation

Weidong Zheng, Kongyang Chen, Yuanwei Guo et al. (4 authors)

📅 2026-05-09

Class-level machine unlearning aims to remove the influence of specified classes while preserving model utility on retained classes. Existing methods are commonly evaluated by retain-set accuracy, forget-set accuracy, and unlearning time, but these metrics provide limited insight into how forgetting is achieved internally. In this paper, we reveal a bias-dominated shortcut in class-level...

34

Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations

Leopoldo Bertossi

📅 2026-05-09

We establish, from the point of view of Explainable AI (XAI), connections between Consistency-Based Diagnosis (CBD), on one side, and Actual Causality and Causal Responsibility, on the other. CBD has received little attention from the XAI community. Connections between these two areas could have a fruitful impact on XAI and Explainable Data Management.

35

PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis

Lujia Zhong, Yihao Xia, Shuo Huang et al. (5 authors)

📅 2026-05-09

Deep learning models in medical imaging typically operate as parametric memory, diagnosing patients by recalling fixed knowledge learned during training. This contrasts sharply with clinical practice, where physicians employ analogical reasoning to diagnose new cases by referencing similar records from past exemplars. While In-Context Learning (ICL) frameworks such as Tabular Prior-Fitted...

36

Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare

Prasanna Desikan, Harshit Rajgarhia, Shivali Dalmia et al. (4 authors)

📅 2026-05-08

AI models are increasingly deployed in live clinical environments where they must perform reliably across complex, high-stakes workflows that standard training and validation datasets were never designed to capture. Evaluating these systems requires benchmarks: structured combinations of tasks, datasets, and metrics that enable reproducible, comparable measurement of what a model can do. The...

37

MPD$^2$-Router: Mask-aware Multi-expert Prior-regularized Dual-head Deferral Router in Glaucoma Screening and Diagnosis

Wenxin Zhan

📅 2026-05-08

Learning-to-defer (L2D) can make glaucoma screening safer by routing difficult/uncertain cases to humans, yet standard formulations overlook expert availability, heterogeneous reader behavior, workload imbalance, asymmetric diagnostic harm, case difficulty from morphology and deployment shift. We introduce MPD$^2$-Router, a mask-aware multi-expert deferral framework that recasts ophthalmic...

38

MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs

Hsin-Ling Hsu, Zizheng Wang, Donghua Zhang et al. (12 authors)

📅 2026-05-08

Most existing LLM diagnoses are evaluated on static, single-turn settings where complete patient information is provided upfront, an oversimplification of real clinical practice. We study active diagnosis: the real-life clinical process of starting from initial observation, ordering tests, interpreting results, and updating a differential diagnosis across multiple turns. Through systematic...

39

PerCaM-Health: Personalized Dynamic Causal Graphs for Healthcare Reasoning

Elahe Khatibi, Ziyu Wang, Saba A. Farahani et al. (7 authors)

📅 2026-05-08

Personalized healthcare decisions require reasoning about how physiological and behavioral variables influence an individual patient over time. Existing temporal causal discovery methods are poorly matched to this setting: cohort-level models provide stable but non-personalized structures, while per-patient discovery is unreliable because individual trajectories are short, noisy, irregular, and...

40

MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments

Yicheng Gao, Xiaolin Zhou, Yahan Li et al. (5 authors)

📅 2026-05-08

Real-world clinical diagnosis is a complex process in which the doctor is required to obtain information from both interaction with the patient and conducting medical exams. Additionally, the doctor needs to adapt to different patient personas, as well as noisy and incomplete information that can happen at any time during the process. However, existing benchmarks for medical LLMs and methods for...

41

Research on Security Enhancement Methods for Adversarial Robust Large Language Model Intelligent Agents for Medical Decision-Making Tasks

Saisai Hu

📅 2026-05-07

Motivated by the challenge to improve the adversarial robustness, security, and trust of medical decision making intelligent agents, this study develops a full-link security enhancement framework, which describes "input risk perception - medical evidence constraint - knowledge consistency verification - decision confidence reweighting - security output control - adversarial feedback...

42

Knowledge Transfer Scaling Laws for 3D Medical Imaging

Ho Hin Lee, Dongna Du, Chu Wang et al. (7 authors)

📅 2026-05-07

Vision foundation models are increasingly moving beyond 2D to volumetric domains such as 3D medical imaging, where unified pretraining across different imaging modalities (i.e. CT, MRI, and PET) could provide foundational models for diverse clinical tasks. However, training such models requires mixing heterogeneous imaging domains, and current mixture strategies remain largely heuristic. In this...

43

Eliciting associations between clinical variables from LLMs via comparison questions across populations

Fabian Kabus, Kian Kordtomeikel, Thomas Brox et al. (6 authors)

📅 2026-05-07

The training data of large language models (LLMs) comprises a wide range of biomedical literature, reflecting data from many different patient populations. We investigate how it might be possible to recover information on correlation and causal links between patient characteristics, as a key building block for medical decision making. To avoid the pitfalls of direct elicitation, we propose an...

44

A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization

Tianyu Liu, Wangjie Zheng, Rui Yang et al. (15 authors)

📅 2026-05-07

Accurate and timely diagnosis is essential for effective treatment, particularly in the context of rare diseases. However, current diagnostic workflows often lead to prolonged assessment times and low accuracy. To address these limitations, we introduce Hygieia, a multi-modal AI agent system designed to support precision disease diagnosis by integrating diverse data sources, including phenotypic...

45

Bridging visual saliency and large language models for explainable deep learning in medical imaging

Paul Valery Nguezet, Elie Tagne Fute, Yusuf Brima et al. (5 authors)

📅 2026-05-07

The opaque nature of deep learning models remains a significant barrier to their clinical adoption in medical imaging. This paper presents a multimodal explainability framework that bridges the gap between convolutional neural network (CNN) predictions and clinically actionable insights for brain tumor classification, leveraging large language models (LLMs) to deliver human-interpretable...

46

Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction

Shivali Dalmia, Ananya Mantravadi, Prasanna Desikan

📅 2026-05-07

The work in this paper evaluates zero-shot and few-shot large language models (LLMs) for safety-critical clinical action extraction using the CLIP discharge-note dataset, with particular emphasis on transitions of care and post-discharge patient safety. To manage the complexity of clinical documentation, we introduce a two-stage extraction framework that decomposes discharge notes, that are...

47

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader et al. (4 authors)

📅 2026-05-07

Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading, macular edema (ME) detection, and report generation. The architecture...

48

Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models

Jose Benitez-Aurioles, Ricardo Silva, Brian McMillan et al. (4 authors)

📅 2026-05-07

In routine care, individuals identified a priori as high-risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity may also determine testing frequency. Such heterogeneous detection rates across a population induce label error. This causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes...

49

Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training

Nuno Batista, Ana Morgado, Oscar Ferraz et al. (6 authors)

📅 2026-05-07

We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum...

50

MTL-MAD: Multi-Task Learners are Effective Medical Anomaly Detectors

Bogdan Alexandru Bercean, Florinel Alin Croitoru, Vlad Hondru et al. (6 authors)

📅 2026-05-07

Anomaly detection in medical images is a challenging task, since anomalies are not typically available during training. Recent methods leverage a single pretext task coupled with a large-scale pre-trained model to reach state-of-the-art performance. Instead, we propose to learn multiple self-supervised and pseudo-labeling tasks from scratch, using a joint model based on Mixture-of-Experts (MoE)....
