
Retrieval-augmented Generation

Research on Retrieval-Augmented Generation

📊 50 Papers 📅 Updated: 2026-04-01
1
Covertly improving intelligibility with data-driven adaptations of speech timing
Paige Tuttösí, Angelica Lim, H. Henny Yeung et al. (5 authors)
📅 2026-03-31
Human talkers often address listeners with language-comprehension challenges, such as hard-of-hearing or non-native adults, by globally slowing down their speech. However, it remains unclear whether this strategy actually makes speech more intelligible. Here, we take advantage of recent advancements in machine-generated speech allowing more precise control of speech rate in order to...
2
Tracking Equivalent Mechanistic Interpretations Across Neural Networks
Alan Sun, Mariya Toneva
📅 2026-03-31
Mechanistic interpretability (MI) is an emerging framework for interpreting neural networks. Given a task and model, MI aims to discover a succinct algorithmic process, an interpretation, that explains the model's decision process on that task. However, MI is difficult to scale and generalize. This stems in part from two key challenges: there is no precise notion of a valid interpretation;...
3
Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives
Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis et al. (5 authors)
📅 2026-03-31
Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensitive to prompt format and the degree of surface similarity between...
4
Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior
Junwei Yu, Mufeng Yang, Yepeng Ding et al. (4 authors)
📅 2026-03-31
The proliferation of AI-powered search engines has shifted information discovery from traditional link-based retrieval to direct answer generation with selective source citation, creating new challenges for content visibility. While existing Generative Engine Optimization (GEO) approaches focus primarily on semantic content modification, the role of structural features in influencing citation...
5
Towards Empowering Consumers through Sentence-level Readability Scoring in German ESG Reports
Benjamin Josef Schüßler, Jakob Prange
📅 2026-03-31
With the ever-growing urgency of sustainability in the economy and society, and the massive stream of information that comes with it, consumers need reliable access to that information. To address this need, companies began publishing so-called Environmental, Social, and Governance (ESG) reports, both voluntarily and as mandated by law. To serve the public, these reports must be addressed not only to...
6
SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models
Adar Avsian, Larry Heck
📅 2026-03-31
Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction...
7
Cold-Starts in Generative Recommendation: A Reproducibility Study
Zhen Zhang, Jujia Zhao, Xinyu Ma et al. (6 authors)
📅 2026-03-31
Cold-start recommendation remains a central challenge in dynamic, open-world platforms, requiring models to recommend for newly registered users (user cold-start) and to recommend newly introduced items to existing users (item cold-start) under sparse or missing interaction signals. Recent generative recommenders built on pre-trained language models (PLMs) are often expected to mitigate...
8
Reasoning-Driven Synthetic Data Generation and Evaluation
Tim R. Davidson, Benoit Seguin, Enrico Bacis et al. (5 authors)
📅 2026-03-31
Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods...
9
Training-Free Dynamic Upcycling of Expert Language Models
Eros Fanì, Oğuzhan Ersoy
📅 2026-03-31
Large Language Models (LLMs) have achieved remarkable performance on a wide range of specialized tasks, exhibiting strong problem-solving capabilities. However, training these models is prohibitively expensive, and they often lack domain-specific expertise because they rely on general knowledge datasets. Expertise finetuning can address this issue; however, it often leads to overspecialization,...
10
Drift-Aware Continual Tokenization for Generative Recommendation
Yuebo Feng, Jiahao Liu, Mingzhe Han et al. (8 authors)
📅 2026-03-31
Generative recommendation commonly adopts a two-stage pipeline in which a learnable tokenizer maps items to discrete token sequences (i.e. identifiers) and an autoregressive generative recommender model (GRM) performs prediction based on these identifiers. Recent tokenizers further incorporate collaborative signals so that items with similar user-behavior patterns receive similar codes,...
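Tokenizers of this kind are often built on residual quantization: each item embedding is mapped to a short sequence of codebook indices, level by level. A minimal sketch of that mapping, where the toy codebooks and embedding are illustrative assumptions rather than the paper's actual tokenizer design:

```python
import numpy as np

def residual_quantize(vec, codebooks):
    """Map an item embedding to a discrete token sequence: at each
    level, pick the nearest codeword, then quantize the residual."""
    tokens, residual = [], vec.astype(float)
    for book in codebooks:
        idx = int(np.linalg.norm(book - residual, axis=1).argmin())
        tokens.append(idx)
        residual = residual - book[idx]
    return tokens

codebooks = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),              # coarse level
    np.array([[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]]),  # fine level
]
print(residual_quantize(np.array([1.1, 0.02]), codebooks))  # [0, 0]
```

Items with nearby embeddings share prefixes of their token sequences, which is what lets items with similar user-behavior patterns receive similar codes.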
11
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
Lixin Xiu, Xufang Luo, Hideki Nakayama
📅 2026-03-31
Large vision-language models (LVLMs) achieve impressive performance, yet their internal decision-making processes remain opaque, making it difficult to determine if the success stems from true multimodal fusion or from reliance on unimodal priors. To address this attribution gap, we introduce a novel framework using partial information decomposition (PID) to quantitatively measure the...
12
Agenda-based Narrative Extraction: Steering Pathfinding Algorithms with Large Language Models
Brian Felipe Keith-Norambuena, Carolina Inés Rojas-Córdova, Claudio Juvenal Meneses-Villegas et al. (7 authors)
📅 2026-03-31
Existing narrative extraction methods face a trade-off between coherence, interactivity, and multi-storyline support. Narrative Maps supports rich interaction and generates multiple storylines as a byproduct of its coverage constraints, though this comes at the cost of individual path coherence. Narrative Trails achieves high coherence through maximum capacity path optimization but provides no...
13
Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras
Sherif Abdelwahab
📅 2026-03-31
Always-on edge cameras generate continuous video streams where redundant frames degrade cross-modal retrieval by crowding correct results out of top-k search. This paper presents a streaming retrieval architecture: an on-device epsilon-net filter retains only semantically novel frames, building a denoised embedding index; a cross-modal adapter and cloud re-ranker compensate for the compact...
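The on-device filter can be read as a greedy epsilon-net over frame embeddings: a frame is kept only if it lies farther than epsilon from everything already retained. A minimal sketch, with the threshold and Euclidean metric as assumptions rather than the paper's exact design:

```python
import numpy as np

def epsilon_net_filter(embeddings, eps):
    """Greedy epsilon-net: keep an embedding only if it is farther than
    eps (Euclidean) from every embedding already retained."""
    kept = []
    for e in embeddings:
        if all(np.linalg.norm(e - k) > eps for k in kept):
            kept.append(e)
    return np.array(kept)

# Near-duplicate frames collapse to a single representative.
frames = np.array([[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [1.02, 0.98]])
print(len(epsilon_net_filter(frames, eps=0.5)))  # 2
```

Raising eps shrinks the index further at the cost of recall; the redundant frames that would otherwise crowd the top-k never enter the index at all.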
14
Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems
Pegah Ramezani, Thomas Kinfe, Andreas Maier et al. (5 authors)
📅 2026-03-31
Understanding how the brain processes linguistic constructions is a central challenge in cognitive neuroscience and linguistics. Recent computational studies show that artificial neural language models spontaneously develop differentiated representations of Argument Structure Constructions (ASCs), generating predictions about when and how construction-level information emerges during processing....
15
FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration
Qiyao Wang, Hongbo Wang, Longze Chen et al. (9 authors)
📅 2026-03-31
Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that treats literature exploration and idea generation as a co-evolving process....
16
Can LLM Agents Identify Spoken Dialects like a Linguist?
Tobias Bystrich, Lukas Hamm, Maria Hassan et al. (6 authors)
📅 2026-03-31
Due to the scarcity of labeled dialectal speech, audio dialect classification is a challenging task for most languages, including Swiss German. In this work, we explore the ability of large language model (LLM) agents to understand dialects and whether they can match the performance of models such as HuBERT in dialect classification. In addition, we provide an LLM baseline and a...
17
Impact of enriched meaning representations for language generation in dialogue tasks: A comprehensive exploration of the relevance of tasks, corpora and metrics
Alain Vázquez, Maria Inés Torres
📅 2026-03-31
Conversational systems should generate diverse language forms to interact fluently and accurately with users. In this context, Natural Language Generation (NLG) engines convert Meaning Representations (MRs) into sentences, directly influencing user perception. These MRs usually encode the communicative function (e.g., inform, request, confirm) via dialogue acts (DAs) and enumerate the semantic content with...
18
LLM Probe: Evaluating LLMs for Low-Resource Languages
Hailay Kidu Teklehaymanot, Gebrearegawi Gebremariam, Wolfgang Nejdl
📅 2026-03-31
Despite rapid advances in large language models (LLMs), their linguistic abilities in low-resource and morphologically rich languages are still not well understood due to limited annotated resources and the absence of standardized evaluation frameworks. This paper presents LLM Probe, a lexicon-based assessment framework designed to systematically evaluate the linguistic skills of LLMs in...
19
Calibrated Confidence Expression for Radiology Report Generation
David Bani-Harouni, Chantal Pellegrini, Julian Lüers et al. (9 authors)
📅 2026-03-31
Safe deployment of Large Vision-Language Models (LVLMs) in radiology report generation requires not only accurate predictions but also clinically interpretable indicators of when outputs should be thoroughly reviewed, enabling selective radiologist verification and reducing the risk of hallucinated findings influencing clinical decisions. One intuitive approach to this is verbalized confidence,...
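Whether verbalized confidence is well calibrated is commonly quantified with expected calibration error (ECE); a minimal sketch of the standard binned estimator (the equal-width binning scheme is a common default, not necessarily the paper's):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: population-weighted average gap between accuracy and mean
    confidence within equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# An overconfident model: verbalizes 90% but is right half the time.
print(expected_calibration_error([0.9, 0.9], [1, 0]))  # gap of 0.4
```

A low ECE is what would let a radiologist trust the verbalized confidence as a triage signal for which reports need thorough review.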
20
Authorship Impersonation via LLM Prompting does not Evade Authorship Verification Methods
Baoyi Zeng, Andrea Nini
📅 2026-03-31
Authorship verification (AV), the task of determining whether a questioned text was written by a specific individual, is a critical part of forensic linguistics. While manual authorial impersonation by perpetrators has long been a recognized threat in historical forensic cases, recent advances in large language models (LLMs) raise new challenges, as adversaries may exploit these tools to...
21
PRISM: PRIor from corpus Statistics for topic Modeling
Tal Ishon, Yoav Goldberg, Uri Shaham
📅 2026-03-31
Topic modeling seeks to uncover latent semantic structure in text, with LDA providing a foundational probabilistic framework. While recent methods often incorporate external knowledge (e.g., pre-trained embeddings), such reliance limits applicability in emerging or underexplored domains. We introduce PRISM, a corpus-intrinsic method that derives a Dirichlet parameter from word...
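The abstract is truncated, but the general idea of a corpus-derived Dirichlet prior can be illustrated with a frequency-proportional asymmetric prior; this particular estimator is an assumption for illustration, not PRISM's actual construction:

```python
from collections import Counter

def frequency_dirichlet_prior(documents, concentration=1.0):
    """Asymmetric Dirichlet pseudo-counts proportional to corpus word
    frequencies, scaled to sum to concentration * vocabulary size."""
    counts = Counter(word for doc in documents for word in doc)
    total = sum(counts.values())
    return {w: concentration * len(counts) * c / total
            for w, c in counts.items()}

docs = [["apple", "banana", "apple"], ["banana", "cherry"]]
prior = frequency_dirichlet_prior(docs)
print(prior["apple"] > prior["cherry"])  # frequent words get larger pseudo-counts
```

The appeal of any corpus-intrinsic prior is that it needs nothing beyond the corpus itself, which is exactly the regime the abstract targets.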
22
Open Machine Translation for Esperanto
Ona de Gibert, Lluís de Gibert
📅 2026-03-31
Esperanto is a widespread constructed language, known for its regular grammar and productive word formation. Despite having substantial resources available thanks to its online community, it remains relatively underexplored in the context of modern machine translation (MT) approaches. In this work, we present the first comprehensive evaluation of open-source MT systems for Esperanto, comparing...
23
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang et al. (6 authors)
📅 2026-03-31
Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support reliable, verifiable QA. We propose a two-pillar framework, LiteCoST, to achieve...
24
SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation
Mohammad Amer Khalil, Raghad Nahas, Ahmad Nassar et al. (4 authors)
📅 2026-03-31
Sign language is the primary mode of communication for the Deaf and Hard-of-Hearing (DHH) community. While there are numerous benchmarks for high-resource sign languages, low-resource languages like Arabic remain underrepresented. Currently, there is no publicly available dataset for Syrian Arabic Sign Language (SyArSL). To address this gap, we introduce SyriSign, a dataset comprising 1500...
25
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
Lukuang Dong, Ziwei Li, Saierdaer Yusuyin et al. (5 authors)
📅 2026-03-31
Phoneme-based ASR factorizes recognition into speech-to-phoneme (S2P) and phoneme-to-grapheme (P2G), enabling cross-lingual acoustic sharing while keeping language-specific orthography in a separate module. While large language models (LLMs) are promising for P2G, multilingual P2G remains challenging due to language-aware generation and severe cross-language data imbalance. We study multilingual...
26
Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems
Zhiqian Zhang, Xu Zhao, Xiaoqing Xu et al. (8 authors)
📅 2026-03-31
In recent years, multimodal large models have continued to improve on general benchmarks. However, in real-world content moderation and adversarial settings, mainstream models still suffer from degraded generalization and catastrophic forgetting because of limited fine-grained visual perception and insufficient modeling of long-tail noise. In this paper, we present Xuanwu VL-2B as a case study of...
27
Kwame 2.0: Human-in-the-Loop Generative AI Teaching Assistant for Large Scale Online Coding Education in Africa
George Boateng, Samuel Boateng, Victor Kumbol
📅 2026-03-31
Providing timely and accurate learning support in large-scale online coding courses is challenging, particularly in resource-constrained contexts. We present Kwame 2.0, a bilingual (English-French) generative AI teaching assistant built using retrieval-augmented generation and deployed in a human-in-the-loop forum within SuaCode, an introductory mobile-based coding course for learners across...
28
Designing FSMs Specifications from Requirements with GPT 4.0
Omer Nguena Timo, Paul-Alexis Rodriguez, Florent Avellaneda
📅 2026-03-31
Finite state machines (FSMs) are executable formal specifications of reactive systems. These machines are designed based on systems' requirements, which are often recorded in textual documents written in natural language. FSMs play a crucial role in different phases of model-driven system engineering (MDE); for example, they serve to automate testing activities. FSM quality is...
29
APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay
Pratyay Banerjee, Masud Moshtaghi, Ankit Chadha
📅 2026-03-31
LLM-based autonomous agents lack persistent procedural memory: they re-derive solutions from scratch even when structurally identical tasks have been solved before. We present APEX-EM, a non-parametric online learning framework that accumulates, retrieves, and reuses structured procedural plans without modifying model weights. APEX-EM introduces: (1) a structured experience...
30
Dual Perspectives in Emotion Attribution: A Generator-Interpreter Framework for Cross-Cultural Analysis of Emotion in LLMs
Aizirek Turdubaeva, Uichin Lee
📅 2026-03-30
Large language models (LLMs) are increasingly used in cross-cultural systems to understand and adapt to human emotions, which are shaped by cultural norms of expression and interpretation. However, prior work on emotion attribution has focused mainly on interpretation, overlooking the cultural background of emotion generators. This assumption of universality neglects variation in how emotions are...
31
An Empirical Recipe for Universal Phone Recognition
Shikhar Bharadwaj, Chin-Jou Li, Kwanghee Choi et al. (7 authors)
📅 2026-03-30
Phone recognition (PR) is a key enabler of multilingual and low-resource speech processing tasks, yet robust performance remains elusive. Highly performant English-focused models do not generalize across languages, while multilingual models underutilize pretrained representations. It also remains unclear how data scale, architecture, and training objective contribute to multilingual PR. We...
32
The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
Yubo Li, Lu Zhang, Tianchong Jiang et al. (5 authors)
📅 2026-03-30
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal,...
33
Known Intents, New Combinations: Clause-Factorized Decoding for Compositional Multi-Intent Detection
Abhilash Nandy
📅 2026-03-30
Multi-intent detection papers usually ask whether a model can recover multiple intents from one utterance. We ask a harder and, for deployment, more useful question: can it recover new combinations of familiar intents? Existing benchmarks only weakly test this, because train and test often share the same broad co-occurrence patterns. We introduce CoMIX-Shift, a controlled benchmark built to...
34
CrossTrace: A Cross-Domain Dataset of Grounded Scientific Reasoning Traces for Hypothesis Generation
Andrew Bouras, OMS-II Research Fellow
📅 2026-03-30
Scientific hypothesis generation is a critical bottleneck in accelerating research, yet existing datasets for training and evaluating hypothesis-generating models are limited to single domains and lack explicit reasoning traces connecting prior knowledge to novel contributions. I introduce CrossTrace, a dataset of 1,389 grounded scientific reasoning traces spanning biomedical research (518),...
35
Calibrated Fusion for Heterogeneous Graph-Vector Retrieval in Multi-Hop QA
Andre Bacellar
📅 2026-03-30
Graph-augmented retrieval combines dense similarity with graph-based relevance signals such as Personalized PageRank (PPR), but these scores have different distributions and are not directly comparable. We study this as a score calibration problem for heterogeneous retrieval fusion in multi-hop question answering. Our method, PhaseGraph, maps vector and graph scores to a common unit-free scale...
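A common way to put heterogeneous scores on a common unit-free scale is rank-based (quantile) normalization before fusing; a generic sketch of that idea, since PhaseGraph's exact mapping is not given in this excerpt:

```python
import numpy as np

def quantile_normalize(scores):
    """Replace raw scores with their empirical quantile ranks in [0, 1]."""
    ranks = np.argsort(np.argsort(scores))
    return ranks / max(len(scores) - 1, 1)

def fuse(dense_scores, graph_scores, alpha=0.5):
    """Combine after calibration so neither raw scale dominates."""
    return (alpha * quantile_normalize(np.asarray(dense_scores))
            + (1 - alpha) * quantile_normalize(np.asarray(graph_scores)))

dense = [0.91, 0.88, 0.75]   # cosine similarities, all near 1
ppr = [3e-4, 9e-4, 1e-4]     # PageRank mass, orders of magnitude smaller
print(fuse(dense, ppr))      # both signals now contribute on equal footing
```

Without some calibration step, the tiny raw magnitudes of PPR mass would vanish next to cosine similarities in any linear combination.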
36
Adaptive Block-Scaled Data Types
Jack Cook, Hyemin S. Lee, Kathryn Le et al. (7 authors)
📅 2026-03-30
NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distribution, resulting in large amounts of quantization error on near-maximal values in...
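Block-scaled formats like NVFP4 share one scale factor per small block of values. A generic sketch of block-scaled FP4-style quantization, where the e2m1 magnitude grid is the standard FP4 value set but the block and scale handling are simplifications, not the NVFP4 spec:

```python
import numpy as np

# Representable magnitudes of an e2m1 (FP4) element; sign is separate.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Share one scale per block (max magnitude maps to 6.0), then snap
    each value to the nearest representable FP4 magnitude."""
    scale = np.abs(block).max() / 6.0
    if scale == 0:
        return block.copy()
    mags = np.abs(block / scale)
    snapped = FP4_GRID[np.abs(FP4_GRID[None, :] - mags[:, None]).argmin(axis=1)]
    return np.sign(block) * snapped * scale

out = quantize_block(np.array([0.01, -0.2, 5.0, 5.9]))
print(out)  # 5.0 snaps up to 5.9: the 4.0-to-6.0 grid gap hurts near-maximal values
```

The wide gap between the top two FP4 magnitudes (4.0 and 6.0) is one way to see the near-maximal error behavior the abstract refers to.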
37
SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning
Philip Schroeder, Thomas Weng, Karl Schmeckpeper et al. (6 authors)
📅 2026-03-30
Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enabling policies to exploit perceptual errors rather than solve the task. To...
38
OneComp: One-Line Revolution for Generative AI Model Compression
Yuma Ichikawa, Keiji Kimura, Akihiro Yoshida et al. (14 authors)
📅 2026-03-30
Deploying foundation models is increasingly constrained by memory footprint, latency, and hardware costs. Post-training compression can mitigate these bottlenecks by reducing the precision of model parameters without significantly degrading performance; however, its practical implementation remains challenging as practitioners navigate a fragmented landscape of quantization algorithms, precision...
39
The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle
Lara Russell-Lasalandra, Hudson Golino, Luis Eduardo Garrido et al. (4 authors)
📅 2026-03-30
Psychological scale development has traditionally required extensive expert involvement, iterative revision, and large-scale pilot testing before psychometric evaluation can begin. The `AIGENIE` R package implements the AI-GENIE framework (Automatic Item Generation with Network-Integrated Evaluation), which integrates large language model (LLM) text generation with network psychometric methods to...
40
Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model
Athos Georgiou
📅 2026-03-30
Visual document understanding typically requires separate retrieval and generation models, doubling memory and system complexity. We present Hydra, a dual-head approach that provides both ColBERT-style late-interaction retrieval and autoregressive generation from a single vision-language model (VLM). A single LoRA adapter, trained only for retrieval, is toggled at inference: enabling it produces...
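ColBERT-style late interaction scores a document by taking, for each query token embedding, the maximum similarity to any document token embedding, and summing these maxima; a minimal sketch:

```python
import numpy as np

def late_interaction_score(query_emb, doc_emb):
    """MaxSim: best-matching document token per query token, summed."""
    sims = query_emb @ doc_emb.T   # (n_query_tokens, n_doc_tokens)
    return sims.max(axis=1).sum()

q = np.array([[1.0, 0.0], [0.0, 1.0]])              # query token embeddings
d = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])  # document token embeddings
print(late_interaction_score(q, d))  # sums the per-token maxima, 0.9 and 0.8
```

Because this scoring needs only token embeddings, the same VLM backbone can serve retrieval (adapter on) and generation (adapter off), which is the memory saving Hydra targets.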
41
Training data generation for context-dependent rubric-based short answer grading
Pavel Šindelář, Dávid Slivka, Christopher Bouma et al. (5 authors)
📅 2026-03-30
Every four years, the PISA test is administered by the OECD to test the knowledge of teenage students worldwide and allow for comparisons of educational systems. However, the need to avoid language differences and annotator bias makes the grading of student answers challenging. For these reasons, automatic methods for grading student answers are worth exploring. To train some of these...
42
GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum
Shuwen Xu, Yao Xu, Jiaxiang Liu et al. (7 authors)
📅 2026-03-30
Agentic knowledge graph question answering (KGQA) requires an agent to iteratively interact with knowledge graphs (KGs), posing challenges in both training data scarcity and reasoning generalization. Specifically, existing approaches often restrict agent exploration: prompting-based methods lack autonomous navigation training, while current training pipelines usually confine reasoning to...
43
EarlySciRev: A Dataset of Early-Stage Scientific Revisions Extracted from LaTeX Writing Traces
Léane Jourdan, Julien Aubert-Béduchaud, Yannis Chupin et al. (5 authors)
📅 2026-03-30
Scientific writing is an iterative process that generates rich revision traces, yet publicly available resources typically expose only final or near-final versions of papers. This limits empirical study of revision behaviour and evaluation of large language models (LLMs) for scientific writing. We introduce EarlySciRev, a dataset of early-stage scientific text revisions automatically extracted...
44
Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification
Masnun Nuha Chowdhury, Nusrat Jahan Beg, Umme Hunny Khan et al. (6 authors)
📅 2026-03-30
Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured,...
45
Entropic Claim Resolution: Uncertainty-Driven Evidence Selection for RAG
Davide Di Gioia
📅 2026-03-30
Current Retrieval-Augmented Generation (RAG) systems predominantly rely on relevance-based dense retrieval, sequentially fetching documents to maximize semantic similarity with the query. However, in knowledge-intensive and real-world scenarios characterized by conflicting evidence or fundamental query ambiguity, relevance alone is insufficient for resolving epistemic uncertainty. We introduce...
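Uncertainty-driven selection can be illustrated with Shannon entropy over an answer distribution: retrieval continues while entropy is high, and the next piece of evidence is the candidate that would reduce it most. A toy sketch whose selection criterion is an illustrative assumption, not the paper's exact method:

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a discrete answer distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def pick_evidence(posteriors):
    """Choose the document whose simulated posterior is least uncertain."""
    return min(posteriors, key=lambda doc: entropy(posteriors[doc]))

prior = {"yes": 0.5, "no": 0.5}        # maximally ambiguous claim: 1 bit
posteriors = {
    "doc_a": {"yes": 0.9, "no": 0.1},  # would largely resolve the claim
    "doc_b": {"yes": 0.6, "no": 0.4},  # would barely help
}
print(entropy(prior))         # 1.0
print(pick_evidence(posteriors))  # doc_a
```

Under conflicting evidence, a relevance-only retriever might fetch doc_b first because it matches the query better, even though doc_a resolves more uncertainty.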
46
Structural-Ambiguity-Aware Translation from Natural Language to Signal Temporal Logic
Kosei Fushimi, Kazunobu Serizawa, Junya Ikemoto et al. (4 authors)
📅 2026-03-30
Signal Temporal Logic (STL) is widely used to specify timed and safety-critical tasks for cyber-physical systems, but writing STL formulas directly is difficult for non-expert users. Natural language (NL) provides a convenient interface, yet its inherent structural ambiguity makes one-to-one translation into STL unreliable. In this paper, we propose an ambiguity-preserving method for...
47
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
Fangda Ye, Yuxin Hu, Pengxiang Zhu et al. (22 authors)
📅 2026-03-30
Recent progress in deep research systems has been impressive, but evaluation still lags behind real user needs. Existing benchmarks predominantly assess final reports using fixed rubrics, failing to evaluate the underlying research process. Most also offer limited multimodal coverage, rely on synthetic tasks that do not reflect real-world query complexity, and cannot be refreshed as knowledge...
48
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
He Du, Qiming Ge, Jiakai Hu et al. (21 authors)
📅 2026-03-30
We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured...
49
Versteasch du mi? Computational and Socio-Linguistic Perspectives on GenAI, LLMs, and Non-Standard Language
Verena Platzgummer, John McCrae, Sina Ahmadi
📅 2026-03-30
The design of Large Language Models and generative artificial intelligence has been shown to be "unfair" to less-spoken languages and to deepen the digital language divide. Critical sociolinguistic work has also argued that these technologies are not only made possible by prior socio-historical processes of linguistic standardisation, often grounded in European nationalist and colonial...
50
From Reviews to Requirements: Can LLMs Generate Human-Like User Stories?
Shadman Sakib, Oishy Fatema Akhand, Tasnia Tasneem et al. (4 authors)
📅 2026-03-30
App store reviews provide a constant flow of real user feedback that can help improve software requirements. However, these reviews are often messy, informal, and difficult to analyze manually at scale. Although automated techniques exist, many do not perform well when replicated and often fail to produce clean, backlog-ready user stories for agile projects. In this study, we evaluate how well...