
LLM Reasoning

Research on the reasoning capabilities of large language models

📊 50 Papers 📅 Updated: 2026-04-01
1
Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?
Max Kaufmann, David Lindner, Roland S. Zimmermann et al. (4 authors)
📅 2026-03-31
Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model (the monitorability of the CoT) can be affected by training, for instance by the model learning to hide important features of its reasoning. We propose and empirically...
2
The Triadic Cognitive Architecture: Bounding Autonomous Action via Spatio-Temporal and Epistemic Friction
Davide Di Gioia
📅 2026-03-31
Current autonomous AI agents, driven primarily by Large Language Models (LLMs), operate in a state of cognitive weightlessness: they process information without an intrinsic sense of network topology, temporal pacing, or epistemic limits. Consequently, heuristic agentic loops (e.g., ReAct) can exhibit failure modes in interactive environments, including excessive tool use under congestion,...
3
Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models
Md Saad, Sajjad Hussain, Mohd Suhaib
📅 2026-03-31
This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large Language Models (LLMs) to improve robotic manipulation tasks. By utilizing RL for accurate low-level control and LLMs for high level task planning and understanding of natural language, the proposed framework effectively connects low-level execution with high-level reasoning in robotic systems. This...
4
Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives
Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis et al. (5 authors)
📅 2026-03-31
Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensitive to prompt format and the degree of surface similarity between...
5
Trimodal Deep Learning for Glioma Survival Prediction: A Feasibility Study Integrating Histopathology, Gene Expression, and MRI
Iain Swift, JingHua Ye
📅 2026-03-31
Multimodal deep learning has improved prognostic accuracy for brain tumours by integrating histopathology and genomic data, yet the contribution of volumetric MRI within unified survival frameworks remains unexplored. This pilot study extends a bimodal framework by incorporating Fluid Attenuated Inversion Recovery (FLAIR) MRI from BraTS2021 as a third modality. Using the TCGA-GBMLGG cohort (664...
6
C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving
Zhihong Cui, Haoran Tang, Tianyi Li et al. (7 authors)
📅 2026-03-31
Trajectory planning for autonomous driving increasingly leverages large language models (LLMs) for commonsense reasoning, yet LLM outputs are inherently unreliable, posing risks in safety-critical applications. We propose C-TRAIL, a framework built on a Commonsense World that couples LLM-derived commonsense with a trust mechanism to guide trajectory planning. C-TRAIL operates through a...
7
SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models
Adar Avsian, Larry Heck
📅 2026-03-31
Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction...
8
Reasoning-Driven Synthetic Data Generation and Evaluation
Tim R. Davidson, Benoit Seguin, Enrico Bacis et al. (5 authors)
📅 2026-03-31
Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods...
9
From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety
Ganen Sethupathy, Lalit Dumka, Jan Schagen
📅 2026-03-31
Public spaces such as transport hubs, city centres, and event venues require timely and reliable detection of potentially violent behaviour to support public safety. While automated video analysis has made significant progress, practical deployment remains constrained by latency, privacy, and resource limitations, particularly under edge-computing conditions. This paper presents the design and...
10
Training-Free Dynamic Upcycling of Expert Language Models
Eros Fanì, Oğuzhan Ersoy
📅 2026-03-31
Large Language Models (LLMs) have achieved remarkable performance on a wide range of specialized tasks, exhibiting strong problem-solving capabilities. However, training these models is prohibitively expensive, and they often lack domain-specific expertise because they rely on general knowledge datasets. Expertise finetuning can address this issue; however, it often leads to overspecialization,...
11
CausalPulse: An Industrial-Grade Neurosymbolic Multi-Agent Copilot for Causal Diagnostics in Smart Manufacturing
Chathurangi Shyalika, Utkarshani Jaimini, Cory Henson et al. (4 authors)
📅 2026-03-31
Modern manufacturing environments demand real-time, trustworthy, and interpretable root-cause insights to sustain productivity and quality. Traditional analytics pipelines often treat anomaly detection, causal inference, and root-cause analysis as isolated stages, limiting scalability and explainability. In this work, we present CausalPulse, an industry-grade multi-agent copilot that automates...
12
Spontaneous Functional Differentiation in Large Language Models: A Brain-Like Intelligence Economy
Junjie Zhang, Zhen Shen, Gang Xiong et al. (4 authors)
📅 2026-03-31
The evolution of intelligence in artificial systems provides a unique opportunity to identify universal computational principles. Here we show that large language models spontaneously develop synergistic cores, where information integration exceeds that of the individual parts, remarkably similar to the human brain. Using Integrated Information Decomposition across multiple architectures, we find that middle...
13
Reinforced Reasoning for End-to-End Retrosynthetic Planning
Chenyang Zuo, Siqi Fan, Yizhen Luo et al. (4 authors)
📅 2026-03-31
Retrosynthetic planning is a fundamental task in organic chemistry, yet remains challenging due to its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To...
14
Symphony for Medical Coding: A Next-Generation Agentic System for Scalable and Explainable Medical Coding
Joakim Edin, Andreas Motzfeldt, Simon Flachs et al. (4 authors)
📅 2026-03-31
Medical coding translates free-text clinical documentation into standardized codes drawn from classification systems that contain tens of thousands of entries and are updated annually. It is central to billing, clinical research, and quality reporting, yet remains largely manual, slow, and error-prone. Existing automated approaches learn to predict a fixed set of codes from labeled data, thereby...
15
KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models
Zhenning Chen, Hanbei Zhan, Yanwei Huang et al. (7 authors)
📅 2026-03-31
Large Language Models (LLMs) demonstrate exceptional capabilities in factual question answering, yet they sometimes provide incorrect responses. To address this issue, knowledge editing techniques have emerged as effective methods for correcting factual information in LLMs. However, typical knowledge editing workflows struggle with identifying the optimal set of model layers for editing and rely...
16
View-oriented Conversation Compiler for Agent Trace Analysis
Lvmin Zhang, Maneesh Agrawala
📅 2026-03-31
Agent traces carry increasing analytical value in the era of context learning and harness-driven agentic cognition, yet most prior work treats conversation format as a trivial engineering detail. Modern agent conversations contain deeply structured content, including nested tool calls and results, chain-of-thought reasoning blocks, sub-agent invocations, context-window compaction boundaries, and...
17
Concept frustration: Aligning human concepts and machine representations
Enrico Parisini, Christopher J. Soelistyo, Ahab Isaac et al. (5 authors)
📅 2026-03-31
Aligning human-interpretable concepts with the internal representations learned by modern machine learning systems remains a central challenge for interpretable AI. We introduce a geometric framework for comparing supervised human concepts with unsupervised intermediate representations extracted from foundation model embeddings. Motivated by the role of conceptual leaps in scientific discovery,...
18
Learning Diagnostic Reasoning for Decision Support in Toxicology
Nico Oberländer, David Bani-Harouni, Tobias Zellner et al. (6 authors)
📅 2026-03-31
Acute poly-substance intoxication requires rapid, life-saving decisions under substantial uncertainty, as clinicians must rely on incomplete ingestion details and nonspecific symptoms. Effective diagnostic reasoning in this chaotic environment requires fusing unstructured, non-medical narratives (e.g. paramedic scene descriptions and unreliable patient self-reports or known histories), with...
19
Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries
Luoxin Chen, Yichi Zhou, Huishuai Zhang
📅 2026-03-31
Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks, especially when post-trained with outcome-rewarded reinforcement learning (Guo et al., 2025). However, it has been observed that outcome rewards often overlook flawed intermediate steps, leading to unreliable reasoning steps even when final answers are correct. To address this...
20
Hallucination-aware intermediate representation edit in large vision-language models
Wei Suo, Hanzu Zhang, Lijun Zhang et al. (6 authors)
📅 2026-03-31
Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still face significant hallucination issues, where outputs contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both methods perform well, retraining...
21
Is my model perplexed for the right reason? Contrasting LLMs' Benchmark Behavior with Token-Level Perplexity
Zoë Prins, Samuele Punzo, Frank Wildenburg et al. (5 authors)
📅 2026-03-31
Standard evaluations of Large language models (LLMs) focus on task performance, offering limited insight into whether correct behavior reflects appropriate underlying mechanisms and risking confirmation bias. We introduce a simple, principled interpretability framework based on token-level perplexity to test whether models rely on linguistically relevant cues. By comparing perplexity...
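The token-level perplexity signal this abstract builds on has a simple closed form: perplexity is the exponential of the mean negative log-likelihood over tokens. A minimal sketch, assuming we already have a model's next-token probabilities (the numbers below are made up for illustration, not taken from the paper):

```python
import math

def token_level_perplexity(token_probs):
    """Per-token surprisal (nats) and overall perplexity from a
    model's next-token probabilities. Illustrative only."""
    surprisals = [-math.log(p) for p in token_probs]   # surprisal per token
    ppl = math.exp(sum(surprisals) / len(surprisals))  # exp(mean NLL)
    return surprisals, ppl

# A model that is confident on most tokens but surprised by one cue:
probs = [0.9, 0.8, 0.05, 0.85]
surprisals, ppl = token_level_perplexity(probs)
# Index of the most surprising token, i.e. where the model is "perplexed":
high = max(range(len(probs)), key=lambda i: surprisals[i])
```

Comparing `surprisals` across minimally different inputs is the kind of contrast that reveals whether a model reacts to the linguistically relevant cue rather than surface features.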
22
Beyond Idealized Patients: Evaluating LLMs under Challenging Patient Behaviors in Medical Consultations
Yahan Li, Xinyi Jie, Wanjia Ruan et al. (8 authors)
📅 2026-03-31
Large language models (LLMs) are increasingly used for medical consultation and health information support. In this high-stakes setting, safety depends not only on medical knowledge, but also on how models respond when patient inputs are unclear, inconsistent, or misleading. However, most existing medical LLM evaluations assume idealized and well-posed patient questions, which limits their...
23
PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent
Hongyi Nie, Xunyuan Liu, Yudong Bai et al. (7 authors)
📅 2026-03-31
Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration. However, real-world smartphone use is highly personalized: users adopt diverse workflows and preferences, challenging agents to deliver customized assistance rather than generic solutions. Existing GUI agent benchmarks cannot adequately capture this...
24
PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models
Amirreza Rouhi, Parikshit Sakurikar, Satya Sai Reddy et al. (9 authors)
📅 2026-03-31
A critical gap exists between the general-purpose visual understanding of state-of-the-art physical AI models and the specialized perceptual demands of structured real-world deployment environments. We present PRISM, a 270K-sample multi-view video supervised fine-tuning (SFT) corpus for embodied vision-language models (VLMs) in real-world retail environments. PRISM is motivated by a simple...
25
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang et al. (6 authors)
📅 2026-03-31
Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support reliable, verifiable QA. We propose a two-pillar framework, LiteCoST, to achieve...
26
AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction
Harsh Mankodiya, Chase Gallik, Theodoros Galanos et al. (4 authors)
📅 2026-03-31
The AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) domain. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination. This report describes the benchmark motivation, dataset taxonomy, evaluation protocol, and baseline results across...
27
Multi-Layered Memory Architectures for LLM Agents: An Experimental Evaluation of Long-Term Context Retention
Sunil Tiwari, Payal Fofadiya
📅 2026-03-31
Long-horizon dialogue systems suffer from semantic drift and unstable memory retention across extended sessions. This paper presents a Multi-Layer Memory Framework that decomposes dialogue history into working, episodic, and semantic layers with adaptive retrieval gating and retention regularization. The architecture controls cross-session drift while maintaining bounded context growth and...
28
LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
Haihong Hao, Lei Chen, Mingfei Han et al. (8 authors)
📅 2026-03-31
Existing vision-and-language navigation (VLN) models primarily reason over past and current visual observations, while largely ignoring the future visual dynamics induced by actions. As a result, they often lack an effective understanding of the causal relationship between actions and how the visual world changes, limiting robust decision-making. Humans, in contrast, can imagine the near future...
29
SemLoc: Structured Grounding of Free-Form LLM Reasoning for Fault Localization
Zhaorui Yang, Haichao Zhu, Qian Zhang et al. (5 authors)
📅 2026-03-31
Fault localization identifies program locations responsible for observed failures. Existing techniques rank suspicious code using syntactic spectra: signals derived from execution structure such as statement coverage, control-flow divergence, or dependency reachability. These signals collapse for semantic bugs, where failing and passing executions follow identical code paths and differ only in...
30
PAR$^2$-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering
Xingyu Li, Rongguang Wang, Yuying Wang et al. (8 authors)
📅 2026-03-30
Large language models (LLMs) remain brittle on multi-hop question answering (MHQA), where answering requires combining evidence across documents through retrieval and reasoning. Iterative retrieval systems can fail by locking onto an early low-recall trajectory and amplifying downstream errors, while planning-only approaches may produce static query sets that cannot adapt when intermediate...
31
Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning
Bilgehan Sel, Xuanli He, Alwin Peng et al. (5 authors)
📅 2026-03-30
Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning. We introduce Trojan-Speak, an adversarial fine-tuning method that bypasses Anthropic's Constitutional Classifiers. Our approach uses curriculum learning combined with GRPO-based hybrid reinforcement learning to teach models a communication...
32
The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
Yubo Li, Lu Zhang, Tianchong Jiang et al. (5 authors)
📅 2026-03-30
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal,...
33
Human-Like Lifelong Memory: A Neuroscience-Grounded Architecture for Infinite Interaction
Diego C. Lerma-Torres
📅 2026-03-30
Large language models lack persistent, structured memory for long-term interaction and context-sensitive retrieval. Expanding context windows does not solve this: recent evidence shows that context length alone degrades reasoning by up to 85%, even with perfect retrieval. We propose a bio-inspired memory framework grounded in complementary learning systems theory, cognitive behavioral...
34
Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance
Siva Kumar Sastry Hari, Vignesh Balaji, Sana Damani et al. (5 authors)
📅 2026-03-30
Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials save both runtime and cost. We make two key observations. First, the abstraction level at which agents operate is important. If it is too low, the LLM wastes reasoning on low-impact details. If it is too high, it may...
35
Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference
Zifan He, Rui Ma, Yizhou Sun et al. (4 authors)
📅 2026-03-30
Modern large language models (LLMs) increasingly depend on efficient long-context processing and generation mechanisms, including sparse attention, retrieval-augmented generation (RAG), and compressed contextual memory, to support complex reasoning. We show that these optimizations can be unified into a four-step memory processing pipeline: Prepare Memory, Compute Relevancy, Retrieval, and Apply...
36
Multi-Agent LLMs for Adaptive Acquisition in Bayesian Optimization
Andrea Carbonati, Mohammadsina Almasi, Hadis Anahideh
📅 2026-03-30
The exploration-exploitation trade-off is central to sequential decision-making and black-box optimization, yet how Large Language Models (LLMs) reason about and manage this trade-off remains poorly understood. Unlike Bayesian Optimization, where exploration and exploitation are explicitly encoded through acquisition functions, LLM-based optimization relies on implicit, prompt-based reasoning...
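The explicit encoding the abstract contrasts with LLM reasoning can be sketched with Upper Confidence Bound (UCB), a standard acquisition function; the candidate points, surrogate predictions, and `kappa` value below are hypothetical:

```python
def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound acquisition: predicted mean (exploitation)
    plus kappa times predictive std (exploration)."""
    return mu + kappa * sigma

# Hypothetical candidates with (mean, std) from a surrogate model:
candidates = [("x1", 0.9, 0.01), ("x2", 0.5, 0.4), ("x3", 0.7, 0.1)]
best = max(candidates, key=lambda c: ucb(c[1], c[2]))
```

Here the high-uncertainty point wins despite its lower predicted mean, which is exactly the exploration behavior that prompt-based LLM optimizers must express implicitly rather than through a tunable `kappa`.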
37
Enhancing Policy Learning with World-Action Model
Yuci Han, Alper Yilmaz
📅 2026-03-30
This paper presents the World-Action Model (WAM), an action-regularized world model that jointly reasons over future visual observations and the actions that drive state transitions. Unlike conventional world models trained solely via image prediction, WAM incorporates an inverse dynamics objective into DreamerV2 that predicts actions from latent state transitions, encouraging the learned...
38
CrossTrace: A Cross-Domain Dataset of Grounded Scientific Reasoning Traces for Hypothesis Generation
Andrew Bouras, OMS-II Research Fellow
📅 2026-03-30
Scientific hypothesis generation is a critical bottleneck in accelerating research, yet existing datasets for training and evaluating hypothesis-generating models are limited to single domains and lack explicit reasoning traces connecting prior knowledge to novel contributions. I introduce CrossTrace, a dataset of 1,389 grounded scientific reasoning traces spanning biomedical research (518),...
39
ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts
Rongtian Ye
📅 2026-03-30
Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively on single-chart interpretation rather than comparative reasoning across multiple charts. To address this gap, we introduce ChartDiff, the first large-scale benchmark for cross-chart comparative summarization. ChartDiff consists of 8,541 chart pairs spanning diverse data sources,...
40
SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning
Philip Schroeder, Thomas Weng, Karl Schmeckpeper et al. (6 authors)
📅 2026-03-30
Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enabling policies to exploit perceptual errors rather than solve the task. To...
41
Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning
Ziqi Miao, Haonan Jia, Lijun Li et al. (7 authors)
📅 2026-03-30
Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the final answer. This shared reward blurs credit assignment, frequently improving...
42
ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
Huanxuan Liao, Zhongtao Jiang, Yupu Hao et al. (9 authors)
📅 2026-03-30
Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compressed but in the volume of pixels the encoder receives, and address it with...
43
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
Han Wang, Yifan Sun, Brian Ko et al. (11 authors)
📅 2026-03-30
Large language models (LLMs) can generate chains of thought (CoTs) that are not always causally responsible for their final outputs. When such a mismatch occurs, the CoT no longer faithfully reflects the decision-critical factors driving the model's behavior, leading to the reduced CoT monitorability problem. However, a comprehensive and fully open-source benchmark for studying CoT...
44
Towards a Medical AI Scientist
Hongtao Wu, Boyun Zheng, Dingjie Song et al. (8 authors)
📅 2026-03-30
Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be grounded in medical evidence with specialized data modalities. In this work,...
45
Navigating the Mirage: A Dual-Path Agentic Framework for Robust Misleading Chart Question Answering
Yanjie Zhang, Yafei Li, Rui Sheng et al. (8 authors)
📅 2026-03-30
Despite the success of Vision-Language Models (VLMs), misleading charts remain a significant challenge due to their deceptive visual structures and distorted data representations. We present ChartCynics, an agentic dual-path framework designed to unmask visual deception via a "skeptical" reasoning paradigm. Unlike holistic models, ChartCynics decouples perception from verification: a...
46
CirrusBench: Evaluating LLM-based Agents Beyond Correctness in Real-World Cloud Service Environments
Yi Yu, Guangquan Hu, Chenghuang Shen et al. (18 authors)
📅 2026-03-30
The increasing agentic capabilities of Large Language Models (LLMs) have enabled their deployment in real-world applications, such as cloud services, where customer-assistant interactions exhibit high technical complexity and long-horizon dependencies, making robustness and resolution efficiency critical for customer satisfaction. However, existing benchmarks for LLM-based agents largely rely on...
47
Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems
Iman Sharifi, Alex Zongo, Peng Wei
📅 2026-03-30
The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments, where both cooperative separation assurance and operational efficiency must...
48
T-Norm Operators for EU AI Act Compliance Classification: An Empirical Comparison of Lukasiewicz, Product, and Gödel Semantics in a Neuro-Symbolic Reasoning System
Adam Laabs
📅 2026-03-30
We present a first comparative pilot study of three t-norm operators, Lukasiewicz (T_L), Product (T_P), and Gödel (T_G), as logical conjunction mechanisms in a neuro-symbolic reasoning system for EU AI Act compliance classification. Using the LGGT+ (Logic-Guided Graph Transformers Plus) engine and a benchmark of 1035 annotated AI system descriptions spanning four risk categories (prohibited,...
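The three t-norms compared above have standard closed-form definitions from fuzzy logic; a minimal sketch of the textbook operators (not the authors' LGGT+ implementation):

```python
def t_lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm: T_L(a, b) = max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def t_product(a: float, b: float) -> float:
    """Product t-norm: T_P(a, b) = a * b."""
    return a * b

def t_godel(a: float, b: float) -> float:
    """Godel (minimum) t-norm: T_G(a, b) = min(a, b)."""
    return min(a, b)

# Conjoining two soft rule activations, e.g. 0.8 and 0.7:
# Lukasiewicz ~0.5, Product ~0.56, Godel 0.7 -- the operators differ
# in how strongly they penalize joint uncertainty.
```

The spread between the three values on the same inputs is what makes the choice of conjunction semantics an empirical question for compliance classification.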
49
Training data generation for context-dependent rubric-based short answer grading
Pavel Šindelář, Dávid Slivka, Christopher Bouma et al. (5 authors)
📅 2026-03-30
Every four years, the PISA test is administered by the OECD to assess the knowledge of teenage students worldwide and allow for comparisons of educational systems. However, the need to avoid language differences and annotator bias makes the grading of student answers challenging. For these reasons, automatic grading of student answers is worth considering. To train some of these...
50
GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum
Shuwen Xu, Yao Xu, Jiaxiang Liu et al. (7 authors)
📅 2026-03-30
Agentic knowledge graph question answering (KGQA) requires an agent to iteratively interact with knowledge graphs (KGs), posing challenges in both training data scarcity and reasoning generalization. Specifically, existing approaches often restrict agent exploration: prompting-based methods lack autonomous navigation training, while current training pipelines usually confine reasoning to...