Agent Evolving - Latest arXiv Papers

1

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz et al. (13 authors)

📅 2026-05-13

Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges: generating realistic simulated conversations, and measuring quality across the full scope of voice-specific failure modes. We present EVA-Bench, an end-to-end...


2

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

Alberto G. Rodríguez Salgado

📅 2026-05-13

Frontier LLMs are increasingly deployed as agents that pick the next action after a long log of prior tool calls produced by the same or a different model. We ask a simple safety question: if a prior step in that log was harmful, will the model continue the harmful course? We build HistoryAnchor-100, 100 short scenarios across ten high-stakes domains, each pairing three forced harmful prior...


3

Harnessing Agentic Evolution

Jiayi Zhang, Yongfeng Gu, Jianhao Ruan et al. (13 authors)

📅 2026-05-13

Agentic evolution has emerged as a powerful paradigm for improving programs, workflows, and scientific solutions by iteratively generating candidates, evaluating them, and using feedback to guide future search. However, existing methods are typically instantiated either as fixed hand-designed procedures that are modular but rigid, or as general-purpose agents that flexibly integrate feedback but...


4

EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments

Annie Liu, Zane Cao, Lang Chen et al. (5 authors)

📅 2026-05-13

The integration of large language models (LLMs) in economic simulations has significantly enhanced agent-based modeling, yet existing frameworks struggle to capture the interplay between short-term optimization and long-term strategic planning. Conventional approaches rely on static data-driven predictions, failing to incorporate adaptive behaviors influenced by economic sentiment, market...


5

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

Trung Nguyen Quang, Yiming Gao, Fanyi Pu et al. (6 authors)

📅 2026-05-13

When an omnimodal large language model accepts a question whose textual premise contradicts what it actually sees or hears, does the failure lie in perception or in action? Recent omnimodal models are positioned as perception-grounded agents that jointly process video, audio, and text, yet a basic form of grounding remains untested: catching a textual claim that conflicts with the model's...


6

ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

Yitian Yang, Yiqun Duan, Linghan Huang et al. (7 authors)

📅 2026-05-13

Large language model (LLM)-based multi-agent simulation offers a powerful testbed for studying social opinion dynamics. Yet current approaches often adopt two contrasting methods: either relying on fixed update rules with limited cognitive grounding or delegating belief change largely to unconstrained LLM interaction. We introduce ScioMind, a cognitively grounded simulation framework that bridges...


7

SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

Hongji Pu, Xinyuan Song, Liang Zhao

📅 2026-05-13

Large language model agents increasingly rely on skill libraries for multi-step tasks, yet these libraries can accumulate persistent defects as skills are added, reused, patched, and linked to changing dependencies. We call this failure mode skill technical debt: library-level defects that may not break a single skill locally but can harm future retrieval, composition, and execution. Existing...


8

Identifying AI Web Scrapers Using Canary Tokens

Steven Seiden, Triss Ren, Caroline Zhang et al. (6 authors)

📅 2026-05-13

From pre-training to query-time augmentation, web-scraped data helps to improve the quality and contextual relevancy of content generated by large language models (LLMs). However, large-scale web scraping to feed LLMs can affect site stability and raise legal, privacy, or ethics concerns. If website owners wish to limit LLM-related web scraping on their site, due to these or other concerns, they...


9

Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling

Seokha Moon, Minseung Lee, Joon Seo et al. (5 authors)

📅 2026-05-13

End-to-end autonomous driving, which bypasses traditional modular pipelines by directly predicting future trajectories from sensor inputs, has recently achieved substantial progress. However, existing methods often overlook the causal inter-dependencies in ego-vehicle planning, ignoring the reciprocal relations between the ego vehicle and surrounding agents. This causal oversight leads to...


10

How to Interpret Agent Behavior

Jie Gao, Kaiser Sun, Jen-tse Huang et al. (11 authors)

📅 2026-05-13

Autonomous agents such as Claude Code and Codex now operate for hours or even days. Understanding their runtime behavior has become critical for downstream tasks such as diagnosing inefficiencies, fixing bugs, and ensuring better oversight. A primary way to gain this understanding is analyzing the reasoning trajectories and execution traces these agents generate. Yet such data remains in...


11

OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research

Peng Kang, Bixuan Li, Xiaoya Huang et al. (8 authors)

📅 2026-05-13

The Materials Genome Initiative catalyzed the proliferation of centralized platforms--SaaS, PaaS, and IaaS--that aggregate computational and experimental resources for accelerated materials discovery. In parallel, breakthroughs in large language models (LLMs) and autonomous agents have created powerful new reasoning capabilities for scientific research. Yet a critical "last mile"...


12

Unweighted ranking for value-based decision making with uncertainty

Aarón López García, Natalia Criado, Jose Such

📅 2026-05-13

As intelligent systems are increasingly implemented in our society to make autonomous decisions, their commitment to human values raises serious concerns. Their alignment with human values remains a critical challenge because it can jeopardise the integrity and security of citizens. For this reason, an innovative human-centred and values-driven approach to decision making is required. In this...


13

Position: Assistive Agents Need Accessibility Alignment

Jie Hu, Changyuan Yan, Yu Zheng et al. (5 authors)

📅 2026-05-13

Assistive agents for Blind and Visually Impaired (BVI) users require accessibility alignment as a first-class design objective. Despite rapid progress in agentic AI, most systems are designed and evaluated under assumptions of sighted interaction, low-cost verification, and tolerable trial-and-error, leading to systematic failures in assistive scenarios that cannot be resolved by model scaling or...


14

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

Asim Osman, Sasha Abramowitz, Mark Bergh et al. (16 authors)

📅 2026-05-13

Contrastive reinforcement learning (CRL) learns goal-conditioned Q-values through a contrastive objective over state-action and goal representations, removing the need for hand-crafted reward functions. Despite impressive success in achieving viable self-supervised learning in RL, all existing CRL algorithms rely on off-policy optimisation and are mostly constrained to continuous action spaces,...


15

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

Chengzhi Shen, Weixiang Shen, Tobias Susetzky et al. (11 authors)

📅 2026-05-13

Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU benchmarks typically treat historical clinician actions as ground truth. However, these actions are made under incomplete information and limited temporal...


16

Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging

Jiabei Liu, Wenyu Mao, Junfei Tan et al. (7 authors)

📅 2026-05-13

Deep search agents have proven effective in enhancing LLMs by retrieving external knowledge during multi-step reasoning. However, existing methods often generate a single query for retrieval at each reasoning step, limiting information coverage and introducing high noise. This may result in low signal-to-noise ratios (SNR) during search, degrading reasoning accuracy and leading to unnecessary...


17

MMSkills: Towards Multimodal Skills for General Visual Agents

Kangning Zhang, Shuai Shao, Qingyao Li et al. (11 authors)

📅 2026-05-13

Reusable skills have become a core substrate for improving agent capabilities, yet most existing skill packages encode reusable behavior primarily as textual prompts, executable code, or learned routines. For visual agents, however, procedural knowledge is inherently multimodal: reuse depends not only on what operation to perform, but also on recognizing the relevant state, interpreting visual...


18

Cognifold: Always-On Proactive Memory via Cognitive Folding

Suli Wang, Yiqun Duan, Yu Deng et al. (6 authors)

📅 2026-05-13

Existing agent memory remains predominantly reactive and retrieval-based, lacking the capacity to autonomously organize experience into persistent cognitive structure. Toward genuinely autonomous agents, we introduce Cognifold, a brain-inspired "always-on" agent memory designed for the next generation of proactive assistants. Cognifold continuously folds fragmented event streams into...


19

TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints

Zabir Al Nazi, Shubhashis Roy Dipta

📅 2026-05-13

Deploying language models as autonomous agents requires more than per-task accuracy: when an agent faces a queue of problems under a finite token budget, it must decide which to attempt, in what order, and how much compute to commit to each, all before any execution feedback is available. This is the prospective form of metacognitive control studied for decades in human cognition, yet whether...


20

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

Liangtian Liu, Zeyuan Wang, Ziyu Li et al. (11 authors)

📅 2026-05-13

The rise of multi-modal large language models (MLLMs) is shifting remote sensing (RS) intelligence from "see" to "action", as OpenClaw-style frameworks enable agents to autonomously operate massive RS image-processing tools for complex tasks. Existing RS agents adopt a passive selection paradigm for tool invocation, relying on either full tool registration (Flat) or...


21

GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models

Mingzhe Huang, Weijun Wang, Xin Ding et al. (10 authors)

📅 2026-05-13

In Vision-Language Models (VLMs), processing a massive number of visual tokens incurs prohibitive computational overhead. While recent training-aware pruning methods attempt to selectively discard redundant tokens, they largely rely on continuous-gradient relaxations. However, visual token pruning is inherently a discrete, non-convex combinatorial problem; consequently, these continuous...


22

AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

Hailin Zhong, Shengxin Zhu

📅 2026-05-13

Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability. We propose a different locus: software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate -- the harness -- mediates how a...


23

Multi-Agent Systems in Emergency Departments: Validation Study on a ED Digital Twin

Markus Wenzel, Tobias Strapatsas, Jessika Kress et al. (6 authors)

📅 2026-05-13

Emergency departments (ED) face challenges in patient care and resource management. We propose to explore optimization strategies in a realistic and flexible model and develop a hybrid Discrete Event Simulation (DES) and Agent-Based Model (ABM) simulating highly configurable ED environments. We specifically focus on the validation of the modeling approach. We derive configurations for ED sizes,...


24

Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning

Qinchuan Cheng, Zhantao Gong, Pengzhan Sun et al. (6 authors)

📅 2026-05-13

Embodied agents in household environments must plan under partial observation: they need to remember objects, track state changes, and recover when actions fail. Existing benchmarks only partially test this ability. Egocentric video datasets capture realistic human activities but remain passive, while interactive simulators support execution but rely on synthetic scenes and hand-crafted dynamics,...


25

What Limits Vision-and-Language Navigation?

Yunheng Wang, Yuetong Fang, Taowen Wang et al. (12 authors)

📅 2026-05-13

Vision-and-Language Navigation (VLN) is a cornerstone of embodied intelligence. However, current agents often suffer from significant performance degradation when transitioning from simulation to real-world deployment, primarily due to perceptual instability (e.g., lighting variations and motion blur) and under-specified instructions. While existing methods attempt to bridge this gap by scaling...


26

IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation

Joy Bose

📅 2026-05-13

Current AI-assisted innovation systems typically apply a single ideation methodology (such as TRIZ or Design Thinking) using sequential prompt-based workflows that do not preserve intermediate reasoning structure. As a result, insights generated across methodologies remain fragmented, limiting traceability, synthesis, and systematic evaluation of novelty. We present IdeaForge, a knowledge...


27

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

Yuanzhe Wang, Tian Zhi, Zihang Wei et al. (11 authors)

📅 2026-05-13

Multi-Agent Path Finding (MAPF) is a coordination problem that requires computing globally consistent, collision-free trajectories from individual start positions to assigned goal positions under combinatorial planning complexity. In dense environments, suboptimal initial plans induce compound conflicts that hinder feasible repair. For repair-based solvers like LNS2, initial plan quality...


28

CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

Tom Zehle

📅 2026-05-13

LLM-based multi-agent systems have demonstrated strong performance across complex real-world tasks, such as software engineering, predictive modeling, and retrieval-augmented generation. Yet automating their configuration remains a structural challenge, as scores are available only at the system level, whereas the parameters governing agent behavior are local. We argue that optimizing these...


29

D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

Yucheng Guo, Yongjian Guo, Zhong Guan et al. (12 authors)

📅 2026-05-13

The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth...


30

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding

Xiao Liu, Nayu Liu, Junnan Zhu et al. (9 authors)

📅 2026-05-13

Video understanding requires active evidence seeking, motivating tool-augmented video agents for temporal reasoning, cross-modal understanding, and complex question answering. Existing video agents have improved video reasoning with retrieval, memory, frame inspection, and verifier tools, but they still face two limitations: (1) a coarse tool space that lacks fine-grained operations for...


31

An Agentic AI Framework with Large Language Models and Chain-of-Thought for UAV-Assisted Logistics Scheduling with Mobile Edge Computing

Hanwen Zhang, Dusit Niyato, Wei Zhang et al. (5 authors)

📅 2026-05-13

In cloud manufacturing, unmanned aerial vehicles (UAVs) can support both product collection and mobile edge computing (MEC). This joint operation forms a hybrid scheduling problem, where physical logistics decisions are coupled with computational task scheduling. In this paper, UAVs collect finished products from manufacturing stations and transport them back to a central depot. Meanwhile,...


32

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

Hao Zhou, Tiru Wu, Yan Jiang et al. (6 authors)

📅 2026-05-13

Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems continue to expand in scale and functionality, investigating their potential vulnerabilities has become increasingly important. However, existing studies on adversarial attacks in multi-agent systems primarily focus...


33

Decoupled Planning for Multiple Omega-Regular Objectives

Guy Avni, Thomas A. Henzinger, Kaushik Mallik et al. (5 authors)

📅 2026-05-13

We study the problem of generating paths on a graph that satisfy a collection of ω-regular objectives. We propose a decoupled framework in which each objective is assigned to an independent agent that selects a local policy, while a scheduler -- oblivious to the graph and objective -- dynamically composes these policies into a single path. We ask when such a composition satisfies all objectives,...


34

When Does Hierarchy Help? Benchmarking Agent Coordination in Event-Driven Industrial Scheduling

Ziqi Wang, Yuhao Yang, Zhiwei Ling et al. (5 authors)

📅 2026-05-13

Recent advances in agent and multi-agent systems have shown strong performance on tool use, reasoning, and collaborative tasks. However, existing benchmarks mostly evaluate task completion in weakly coupled environments, and provide limited support for studying coordination in shared, dynamically evolving systems with hierarchy and coupled constraints. This leaves an important question...


35

Finding the Weakest Link: Adversarial Attack against Multi-Agent Communications

Maxwell Standen, Junae Kim, Claudia Szabo

📅 2026-05-13

Multi-agent systems rely on communication for information sharing and action coordination, which exposes a vulnerability to attacks. We investigate single-victim communication perturbation attacks against Multi-Agent Reinforcement Learning-trained systems and propose methods that use gradient information from the Jacobian to identify which messages, agents, and timesteps are most susceptible to...


36

Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models

Zixing Lei, Changxing Liu, Yichen Xiong et al. (8 authors)

📅 2026-05-13

Vision-language-action (VLA) models are effective robot action executors, but they remain limited on long-horizon tasks due to the dual burden of extended closed-loop planning and diverse physical operations. We therefore propose VLAs-as-Tools, a strategy that distributes this burden across a high-level vision language model (VLM) agent for temporal reasoning and a family of specialized VLA tools...


37

A Multi-Agent Orchestration Framework for Venture Capital Due Diligence

Grigorios Alexandrou, Katerina Pramatari

📅 2026-05-13

We present a fully automated multi-agent framework for corporate due diligence and market analysis in venture capital. The system runs on an event-driven orchestration architecture, combining Large Language Models (LLMs) with real-time web retrieval to synthesize unstructured data into structured investment intelligence. A central technical contribution is a programmatic extraction pipeline that...


38

Counterfactual Reasoning for Causal Responsibility Attribution in Probabilistic Multi-Agent Systems

Chunyan Mu, Muhammad Najib

📅 2026-05-13

Responsibility allocation -- determining the extent to which agents are accountable for outcomes -- is a fundamental challenge in the design and analysis of multi-agent systems. In this work, we model such systems as concurrent stochastic multi-player games and introduce a notion of retrospective (backward) counterfactual responsibility, which quantifies an agent's accountability for...


39

An Agentic LLM-Based Framework for Population-Scale Mental Health Screening

Giuliano Lorenzoni, Paulo Alencar, Donald Cowan

📅 2026-05-13

Mental health disorders affect millions worldwide, and healthcare systems are increasingly overwhelmed by the volume of clinical data generated from electronic records, telemedicine platforms, and population-level screening programs. At the same time, the emergence of novel AI-based approaches in healthcare calls for intelligent frameworks capable of processing domain-specific unstructured...


40

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

Ying Li, Hongbo Wen, Yanju Chen et al. (6 authors)

📅 2026-05-13

LLM-powered agents can silently delete documents, leak credentials, or transfer funds on a routine user request, not because the agent was attacked, but because the skill it invoked broke its own declared safety rules. We call these specification violations: benign inputs cause a skill to breach the natural-language guardrails in its own specification, typically because the guardrail's...


41

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

Yuxin Liu, Ziang Ye, Yueqing Sun et al. (9 authors)

📅 2026-05-13

Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rather than established beforehand. This temporal inversion leads to Delayed Environmental Perception: agents must infer environmental constraints through trial-and-error, resulting in an Epistemic Bottleneck that traps them in inefficient failure...


42

Conveyor Parcel Routing with Order-Contiguous Arrivals

Takuro Kato, Keisuke Okumura

📅 2026-05-13

In warehouse logistics, parcels released from the outfeed of an automated storage system must be routed through conveyor networks to workstations. Beyond collision avoidance, practical operations impose an additional requirement of order-contiguous arrivals: at each delivery point, parcels belonging to the same order must arrive as a consecutive block in the arrival sequence to reduce downstream...


43

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

Adarsh Kumarappan, Ananya Mujoo

📅 2026-05-13

LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement at rates we term yield, a vulnerability widely attributed to RLHF-induced sycophancy. We test this attribution across four model families and find it largely wrong: pretrained base models exhibit the same substitution pattern as their Instruct variants, averaging higher yield than Instruct....


44

Useful Memories Become Faulty When Continuously Updated by LLMs

Dylan Zhang, Yanshan Lin, Zhengkun Wu et al. (7 authors)

📅 2026-05-13

Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated abstractions distilled across many episodes into reusable, schema-like lessons. Recent agentic-memory systems pursue the consolidated form: an LLM rewrites past trajectories into a textual memory bank that it continuously updates with new...


45

Position: Agentic AI System Is a Foreseeable Pathway to AGI

Junwei Liao, Shuai Li, Muning Wen et al. (5 authors)

📅 2026-05-13

Is monolithic scaling the only path to AGI? This paper challenges the dogma that purely scaling a single model is sufficient to achieve Artificial General Intelligence. Instead, we identify Agentic AI as a necessary paradigm for mastering the complex, heterogeneous distribution of real-world tasks. Through rigorous theoretical derivations, we contrast the optimization constraints of monolithic...


46

Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation

Chao Hao, Jun Xu, Ji Du et al. (9 authors)

📅 2026-05-13

Language-guided segmentation transcends the scope limitations of traditional semantic segmentation, enabling models to segment arbitrary target regions based on natural language instructions. Existing approaches typically adopt a two-stage framework: employing Multimodal Large Language Models (MLLMs) to interpret instructions and generate visual prompts, followed by foundational segmentation...


47

When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

Young Hyun Cho, Will Wei Sun

📅 2026-05-13

LLM-enabled AI workflows increasingly produce outputs through iterative generate-evaluate-revise loops. Each iteration can improve the candidate, but it also creates a release decision: when to stop and output the current result? This raises a statistical challenge because deployment-time evaluator scores are adaptively generated and repeatedly monitored, yet the likelihood models or...


48

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Priyam Sahoo, Gaurav Mittal, Xiaomin Li et al. (7 authors)

📅 2026-05-13

Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch passes the tests. This outcome-only view treats a principled solution and a chaotic trial-and-error process as equivalent. We show that this equivalence is empirically false. We evaluate 2,614 OpenHands trajectories from eight model backends on 60 SWE-bench Verified tasks. Of these, 47 have...


49

Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue

Vardhan Dongre, Dilek Hakkani-Tür

📅 2026-05-13

Effective collaboration between embodied agents requires more than acting in a shared environment; it demands communication grounded in each agent's evolving understanding of the world. When agents can only partially observe their surroundings, coordination without communication is provably hard, but communication can, in principle, bridge this gap by allowing agents to share observations...


50

SHM-Agents: A Generalist-Specialist Integrated Agent System for Structural Health Monitoring

Yuequan Bao, Xing Li, Huabin Sun et al. (6 authors)

📅 2026-05-13

Artificial intelligence is increasingly used to simplify complex tasks. In engineering applications of structural health monitoring (SHM), existing specialized algorithms, while effective, often face high implementation barriers, limited interoperability and complex training procedures. To overcome these challenges, this paper proposes SHM-Agents, a generalist-specialist agent system that...
