
Reinforcement Learning

Reinforcement Learning and Human Feedback Research

📊 50 Papers 📅 Updated: 2026-05-14
1
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz et al. (13 authors)
📅 2026-05-13
Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges: generating realistic simulated conversations, and measuring quality across the full scope of voice-specific failure modes. We present EVA-Bench, an end-to-end...
2
What is Learnable in Valiant's Theory of the Learnable?
Steve Hanneke, Anay Mehrotra, Grigoris Velegkas et al. (4 authors)
📅 2026-05-13
Valiant's 1984 paper is widely credited with introducing the PAC learning model, but it, in fact, introduced a different model: unlike PAC learning, the learner receives only positives, may issue membership queries, and must output a hypothesis with no false positives. Prior work characterized variants, including the case without queries. We revisit Valiant's original model and ask:...
3
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
Zijie Wu, Lixin Xu, Puhua Jiang et al. (6 authors)
📅 2026-05-13
Video-guided 3D animation holds immense potential for content creation, offering intuitive and precise control over dynamic assets. However, practical deployment faces a critical yet frequently overlooked hurdle: the pose misalignment dilemma. In real-world scenarios, the initial pose of a user-provided static mesh rarely aligns with the starting frame of a reference video. Naively forcing a mesh...
4
Topology-Preserving Neural Operator Learning via Hodge Decomposition
Dongzhe Zheng, Tao Zhong, Christine Allen-Blanchette
📅 2026-05-13
In this paper, we study solution operators of physical field equations on geometric meshes from a function-space perspective. We reveal that Hodge orthogonality fundamentally resolves spectral interference by isolating unlearnable topological degrees of freedom from learnable geometric dynamics, enabling an additive approximation confined to structure-preserving subspaces. Building on Hodge...
5
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling
Hoang-Quan Nguyen, Sankalp Pandey, Khoa Luu
📅 2026-05-13
Modeling long-range dependencies in sequential data remains a central challenge in machine learning. Transformers address this challenge through attention mechanisms, but their quadratic complexity with respect to sequence length limits scalability to long contexts. State-space models (SSMs) provide an efficient alternative with linear-time computation by evolving a latent state through recurrent...
6
Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach
S. Akshay, Chaitanya Garg, Ashutosh Gupta et al. (5 authors)
📅 2026-05-13
Decision tree ensembles (DTE) are a popular model for a wide range of AI classification tasks, used in multiple safety-critical domains, and hence verifying properties on these models has been an active topic of study over the last decade. One such verification question is the problem of sensitivity, which asks, given a DTE, whether a small change in a subset of features can lead to...
7
Negation Neglect: When models fail to learn negations in training
Harry Mayne, Lev McKinney, Jan Dubiński et al. (6 authors)
📅 2026-05-13
We introduce Negation Neglect, where finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. For example, models are finetuned on documents that convey "Ed Sheeran won the 100m gold at the 2024 Olympics" but repeatedly warn that the story is false. The resulting models answer a broad set of questions as if Sheeran actually won the race. This occurs...
8
Reducing cross-sample prediction churn in scientific machine learning
Gordan Prastalo, Kevin Maik Jablonka
📅 2026-05-13
Scientific machine learning reports predictive performance. It does not report whether the same prediction would survive a different draw of training data. Across $9$ chemistry benchmarks, two classifiers trained on independent bootstraps of the same training set agree on aggregate accuracy to within $1.3\text{--}4.2$ percentage points but disagree on the class label of $8.0\text{--}21.8\%$ of...
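The gap between aggregate agreement and per-sample agreement can be illustrated with a minimal sketch. The function names and the toy labels below are illustrative assumptions, not taken from the paper; the sketch only shows that two models can match on accuracy while disagreeing on many individual predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def churn(pred_a, pred_b):
    """Fraction of samples on which two models predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Toy data (hypothetical): each model makes two errors, but on different samples.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
pred_a = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
pred_b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

print(accuracy(y_true, pred_a), accuracy(y_true, pred_b), churn(pred_a, pred_b))
```

In this toy case both models have the same accuracy, yet they disagree on four of the ten samples, because their errors fall on different inputs.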
9
Harnessing Agentic Evolution
Jiayi Zhang, Yongfeng Gu, Jianhao Ruan et al. (13 authors)
📅 2026-05-13
Agentic evolution has emerged as a powerful paradigm for improving programs, workflows, and scientific solutions by iteratively generating candidates, evaluating them, and using feedback to guide future search. However, existing methods are typically instantiated either as fixed hand-designed procedures that are modular but rigid, or as general-purpose agents that flexibly integrate feedback but...
10
Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion
Nikolaos Tsalkitzis, Panagiotis P. Filntisis, Petros Maragos et al. (4 authors)
📅 2026-05-13
Digital phenotyping enables continuous passive monitoring of behavior and physiology, offering a promising paradigm for early detection of psychotic relapse. In this work, we develop and systematically study two smartwatch-based frameworks for daily relapse detection. The first forecasts cardiac dynamics and flags deviations between predicted and observed features as indicators of abnormality....
11
Provable Quantization with Randomized Hadamard Transform
Ying Feng, Piotr Indyk, Michael Kapralov et al. (5 authors)
📅 2026-05-13
Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $Θ(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its...
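As a rough illustration of why the transform $HD$ costs $O(d \log d)$, the sketch below applies a random diagonal sign matrix $D$ followed by a fast Walsh-Hadamard transform. It assumes $d$ is a power of two and uses the common $1/\sqrt{d}$ normalization; the function names are hypothetical and this is not the paper's quantization scheme.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:  # log2(n) passes, each touching every entry once
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b
        h *= 2
    return x


def randomized_hadamard(x, rng):
    """Apply (1/sqrt(d)) * H * D, where D is a random diagonal matrix of +/-1 signs."""
    d = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    flipped = [s * v for s, v in zip(signs, x)]
    scale = 1.0 / math.sqrt(d)
    return [scale * v for v in fwht(flipped)]


# A maximally "spiky" input is spread evenly across all coordinates.
y = randomized_hadamard([1.0, 0, 0, 0, 0, 0, 0, 0], random.Random(0))
print(y)
```

With this normalization the transform is orthogonal, so the Euclidean norm of the input is preserved while its mass is spread across coordinates before scalar quantization.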
12
Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo
Ejaaz Merali, Mohamed Hibat-Allah, Mohammad Kohandel et al. (5 authors)
📅 2026-05-13
Neural-network quantum states have emerged as a powerful variational framework for quantum many-body systems, with recent progress often driven by massively parallel architectures such as transformers. Recurrent neural network quantum states, however, are frequently regarded as intrinsically sequential and therefore less scalable. Here we revisit this view by showing that modern recurrent...
13
Min-Max Optimization Requires Exponentially Many Queries
Martino Bernasconi, Matteo Castiglioni, Andrea Celli et al. (4 authors)
📅 2026-05-13
We study the query complexity of min-max optimization of a nonconvex-nonconcave function $f$ over $[0,1]^d \times [0,1]^d$. We show that, given oracle access to $f$ and to its gradient $\nabla f$, any algorithm that finds an $\varepsilon$-approximate stationary point must make a number of queries that is exponential in $1/\varepsilon$ or $d$.
14
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
Deepak Pandita, Flip Korn, Chris Welty et al. (4 authors)
📅 2026-05-13
As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently facing a reproducibility crisis driven by unreliable evaluations and unrepeatable experimental results. While human raters are often used to assess models for utility and safety, they introduce...
15
Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations
Zhonghao Li, Chaoyu Liu, Qian Zhang
📅 2026-05-13
Partial differential equations (PDEs) are fundamental for modeling complex natural and physical phenomena. In many real-world applications, however, observational data are extremely sparse, which severely limits the applicability of both classical numerical solvers and existing neural approaches. While neural methods have shown promising results under moderately sparse observations, their...
16
ENSEMBITS: an alphabet of protein conformational ensembles
Kaiwen Shi, Carlos Oliver
📅 2026-05-13
Protein structure tokenizers (PSTs) are workhorses in protein language modeling, function prediction, and evolutionary analysis. However, existing PSTs only capture local geometry of static structures, and miss the correlated motions and alternative conformational states revealed by protein ensembles. Here we introduce Ensembits, the first tokenizer of protein conformational ensembles. Ensembits...
17
Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs
Eszter Varga-Umbrich, Zachary Weller-Davies, Paul Duckworth et al. (6 authors)
📅 2026-05-13
Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we jointly address these challenges. We first introduce a linearly scaling acquisition framework...
18
Interpretable Machine Learning for Antepartum Prediction of Pregnancy-Associated Thrombotic Microangiopathy Using Routine Longitudinal Laboratory Data
Chuanchuan Sun, Zhen Yu, Qin Fan et al. (5 authors)
📅 2026-05-13
Background: Pregnancy-associated thrombotic microangiopathy (P-TMA) is rare but life-threatening. Early risk prediction before overt clinical presentation remains challenging, as the associated laboratory abnormalities are subtle, multidimensional, and frequently masked by common physiological changes such as gestational thrombocytopenia and pregnancy-related proteinuria, thus overlapping heavily...
19
Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers
Victor Norgren
📅 2026-05-13
Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing context, this cost is prohibitive. We introduce a data-driven computational model centred on stateful sessions: a persistent KV cache advanced incrementally as new data arrives, so prefill is moved off...
20
MinT: Managed Infrastructure for Training and Serving Millions of LLMs
Mind Lab: Song Cao et al. (63 authors)
📅 2026-05-13
We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions...
21
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching
Abdalrahman Wael
📅 2026-05-13
We study dense and mixture-of-experts (MoE) transformers in a tiny-scale pretraining regime under a shared LLaMA-style decoder training recipe. The sparse model replaces dense feed-forward blocks with Mixtral-style routed experts. Dense baselines are modestly width-resized to tightly match either active or total parameter budgets, while tokenizer, data, optimizer, schedule, depth, context length,...
22
High-Rate Quantized Matrix Multiplication II
Or Ordentlich, Yury Polyanskiy
📅 2026-05-13
This is the second part of the work investigating quantized matrix multiplication (MatMul). In part I we considered the case of calibration-free quantization, whereas here we discuss the setting where covariance matrix $Σ_X$ of the columns of the second factor is available. This setting arises in the ubiquitous task of weight-only post-training quantization of LLMs. Weight-only quantization is...
23
VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense
Jascha Wanger
📅 2026-05-13
Modern retrieval-augmented generation (RAG) systems convert sensitive content into high-dimensional embeddings and store them in vector databases that treat the resulting numerical artifacts as opaque. Major vector-store products do not provide native controls for embedding integrity, ingestion-time distributional anomaly detection, or cryptographic provenance attestation. We show this opens a...
24
Toward AI-Driven Digital Twins for Metropolitan Floods: A Conditional Latent Dynamics Network Surrogate of the Shallow Water Equations
Phillip Si, Yuan Qiu, Omar Sallam et al. (7 authors)
📅 2026-05-13
AI-driven flood digital twins demand fast hydrodynamic surrogates for ensemble forecasting and observation assimilation. Yet even GPU-accelerated two-dimensional shallow water equation (SWE) solvers still require $\sim 55$ minutes per $96$-hour run on a $\sim 4.2$-million-active-cell metropolitan basin (the Des Plaines River basin at $30\,\mathrm{m}$ resolution), making such workloads prohibitive...
25
Fast and effective algorithms for fair clustering at scale
Claudio Mantuano, Manuel Kammermann, Philipp Baumann
📅 2026-05-13
Clustering is an unsupervised machine learning task that consists of identifying groups of similar objects. It has numerous applications and is increasingly used in fairness-sensitive domains where objects represent individuals, such as customers, employees, or students. We address a fair clustering problem in which objects belong to protected groups. The problem consists of partitioning the...
26
Min Generalized Sliced Gromov Wasserstein: A Scalable Path to Gromov Wasserstein
Ashkan Shahbazi, Xinran Liu, Ping He et al. (4 authors)
📅 2026-05-13
We propose min Generalized Sliced Gromov-Wasserstein (min-GSGW), a sliced formulation for the Gromov-Wasserstein (GW) problem using expressive generalized slicers. The key idea is to learn coupled nonlinear slicers that assign compatible push-forward values to both input measures, so that monotone coupling in the projected domain lifts to a transport plan evaluated against the GW objective in...
27
GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction
Yifan Duan, Siyuan Zheng, Lihuan Li et al. (5 authors)
📅 2026-05-13
Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-year records from 12,000+ firms with Scope 1+2 and Scope 3 disclosures and financial/sectoral signals;...
28
Learning POMDP World Models from Observations with Language-Model Priors
Valentin Six, Frederik Panse, Mathis Fajeau et al. (10 authors)
📅 2026-05-13
Whether navigating a building, operating a robot, or playing a game, an agent that acts effectively in an environment must first learn an internal model of how that environment works. Partially-observable Markov decision processes (POMDPs) provide a flexible modeling class for such internal world models, but learning them from observation-action trajectories alone is challenging and typically...
29
Distinguishing performance gains from learning when using generative AI
Lixiang Yan, Samuel Greiff, Jason M. Lodge et al. (4 authors)
📅 2026-05-13
Generative artificial intelligence (AI) is increasingly being integrated into education, where it can boost learners' performance. However, these uses do not promote the deep cognitive and metacognitive processing that is required for high-quality learning.
30
Robust and Explainable Bicuspid Aortic Valve Diagnosis Using Stacked Ensembles on Echocardiography
Christos Chrysanthos Nikolaidis, Vasileios Sachpekidis, Nikolas Moustakidis et al. (5 authors)
📅 2026-05-13
Transthoracic echocardiography (TTE) is the first-line imaging modality for diagnosing bicuspid aortic valve (BAV), yet diagnostic performance varies with operator expertise and image quality. We developed an explainable AI model that distinguishes BAV from tricuspid aortic valves (TAV) using routinely acquired parasternal long-axis (PLAX) cine loops. A multi-backbone video ensemble was trained...
31
Humanwashing -- It Should Leave You Feeling Dirty
Ben Wilson, Matimba Swana, Peter Winter et al. (4 authors)
📅 2026-05-13
The phrase 'human in the loop' is increasingly used to imply a sense of safety in relation to AI decision systems. It shouldn't. There are contexts where it can be applied appropriately, but these are not in the deployed decision systems we see dominating today. Human oversight of AI decision processes is one of the most popular proposals for addressing concerns, especially about...
32
Tight Sample Complexity Bounds for Entropic Best Policy Identification
Amer Essakine, Claire Vernade
📅 2026-05-13
We study best-policy identification for finite-horizon risk-sensitive reinforcement learning under the entropic risk measure. Recent work established a constant gap in the exponential horizon dependence between lower and upper bounds on the number of samples required to identify an approximately optimal policy. Precisely, known lower bounds scale in $Ω(e^{|β| H})$ where $H$ is the horizon of the...
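For readers unfamiliar with the entropic risk measure, one common convention defines it for a random return $X$ and risk parameter $β \neq 0$ as $\frac{1}{β} \log \mathbb{E}[e^{βX}]$. The sketch below estimates it from samples; the sign convention, the function name, and the sample values are assumptions for illustration and may differ from the paper's setup.

```python
import math


def entropic_risk(returns, beta):
    """Empirical entropic risk (1/beta) * log(mean(exp(beta * r))), for beta != 0."""
    if beta == 0:
        raise ValueError("beta must be nonzero; the limit beta -> 0 is the mean")
    scaled = [beta * r for r in returns]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean / beta


samples = [0.0, 1.0]  # hypothetical returns with mean 0.5
print(entropic_risk(samples, -2.0), entropic_risk(samples, 2.0))
```

Under this convention, negative $β$ gives a value below the sample mean (risk-averse) and positive $β$ gives a value above it (risk-seeking). The exponential inside the expectation is the source of the exponential dependence on $|β| H$ that the abstract refers to.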
33
MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
Hsing-Huan Chung, Shijun Li, Yoav Wald et al. (6 authors)
📅 2026-05-13
Multimodal irregular time series (MITS) consist of asynchronous and irregularly sampled observations from heterogeneous numerical and textual channels. In healthcare, for example, patients' electronic health records (EHR) include irregular lab measurements and clinical notes. The irregular timing and channel patterns of observations carry predictive signal alongside the numerical values and...
34
Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety
Qian Shen, Fanghua Cao, Min Yao et al. (6 authors)
📅 2026-05-13
Large Language Models (LLMs) are widely applied in educational practices, such as for generating children's stories. However, the generated stories are often too difficult for children to read, and the operational cost of LLMs hinders their widespread adoption in educational settings. We used an existing expert-designed children's reading curriculum and its corresponding generated...
35
DisAgg: Distributed Aggregators for Efficient Secure Aggregation in Federated Learning
Haaris Mehmood, Giorgos Tatsis, Dimitrios Alexopoulos et al. (7 authors)
📅 2026-05-13
Federated learning enables collaborative model training across distributed clients, yet vanilla FL exposes client updates to the central server. Secure-aggregation schemes protect privacy against an honest-but-curious server, but existing approaches often suffer from many communication rounds, heavy public-key operations, or difficulty handling client dropouts. Recent methods like One-Shot...
36
Polyhedral Instability Governs Regret in Online Learning
Yuetai Li, Fengqing Jiang, Yichen Feng et al. (9 authors)
📅 2026-05-13
Many online decision problems over combinatorial actions are addressed via convex relaxations, leading to online convex optimization with piecewise linear objectives and induced polyhedral structure. We show that regret in such problems is governed by polyhedral instability: the number of changes of the active region. Under full information feedback and fixed partition assumptions, if...
37
The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks
Fengqing Jiang, Yuetai Li, Yichen Feng et al. (9 authors)
📅 2026-05-13
Hypergraphs provide a natural framework to model higher-order interactions in scientific, social, and biological systems. Hypergraph neural networks (HGNNs) aim to learn from such data, yet it remains unclear which higher-order structures these models can represent. We show that hypergraph expressivity is governed by which small patterns an architecture can detect and count. We formalize this via...
38
MedCore: Boundary-Preserving Medical Core Pruning for MedSAM
Cenwei Zhang, Suncheng Xiang, Lei You
📅 2026-05-13
Medical segmentation foundation models such as SAM and MedSAM provide strong prompt-driven segmentation, but their image encoders are still too large for many clinical settings. Compression is also risky in medicine because a model can keep high Dice while losing boundary fidelity. We propose MedCore, a structured pruning framework for MedSAM. The main idea is to preserve two kinds of structures:...
39
A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning
Jason Gaitonde, Frederic Koehler, Elchanan Mossel et al. (5 authors)
📅 2026-05-13
We introduce a family of synthetic languages with hierarchical structure, generated by a broadcast process on trees, for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an exact $k$-gram ansatz in place of transformers with context length $k$, a substitution we then validate empirically....
40
Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale
Shashaank Aiyer, Yishay Mansour, Shay Moran et al. (5 authors)
📅 2026-05-13
We study the optimal scale at which real-valued function classes exhibit uniform convergence and learnability. Our main result establishes a scale-sensitive generalization of the fundamental theorem of PAC learning: for every bounded real-valued class and every $γ>0$, uniform convergence at scale $γ$, agnostic learnability at scale $γ/2$, and finiteness of the fat-shattering dimension at every...
41
Sampling from Flow Language Models via Marginal-Conditioned Bridges
Iskander Azangulov, Leo Zhang
📅 2026-05-13
Flow Language Models (FLMs) are a recently introduced class of language models which adapt continuous flow matching for one-hot encoded token sequences. Their denoisers have a special structure absent from generic continuous diffusion models: each block of the denoising mean is a posterior marginal distribution over the clean token at that position. Standard DDPM-style samplers collapse these...
42
Three-Stage Learning Unlocks Strong Performance in Simple Models for Long-Term Time Series Forecasting
Zhenan Yu, Guangxin Jiang, Jin Yang
📅 2026-05-13
Recent studies on long-term time series forecasting have shown that simple linear models and MLP-based predictors can achieve strong performance without increasingly complex architectures. However, many competitive baselines still rely on structural priors such as frequency-domain modeling, explicit decomposition, multi-scale mixing, or sophisticated cross-variable interaction modules, while...
43
Characterizing Universal Object Representations Across Vision Models
Florian P. Mahner, Johannes Roth, Ka Chun Lam et al. (6 authors)
📅 2026-05-13
Deep neural networks trained with different architectures, objectives, and datasets have been reported to converge on similar visual representations. However, what remains unknown is which visual properties models actually converge on and which factors may underlie this convergence. To address this, we decompose the object similarity structure of 162 diverse vision models into a small set of...
44
Graph Neural Networks with Triangle-Based Messages for the Multicut Problem
Jannik Irmai, Lucas Fabian Naumann, Bjoern Andres
📅 2026-05-13
The multicut problem is an NP-hard combinatorial optimization problem with diverse applications in fields such as bioinformatics, data mining and computer vision. Graph neural networks have been defined for the multicut problem but can be adapted further to its specific objective function and constraints. In this article, we introduce such an adapted graph neural network architecture in which...
45
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
Namrata Shivagunde, Vijeta Deshpande, Sherin Muckatira et al. (4 authors)
📅 2026-05-13
Pre-training large language models is dominated by the memory cost of storing full-rank weights, gradients, and optimizer states. Low-rank pre-training has emerged to address this, and the space of methods has grown rapidly. A central question remains open: do low-rank methods produce models that generalize comparably to full-rank training, or does the rank constraint fundamentally alter the...
46
Conformal Anomaly Detection in Python: Moving Beyond Heuristic Thresholds with 'nonconform'
Oliver Hennhöfer, Maximilian Kirsch, Christine Preisach
📅 2026-05-13
Most anomaly detection systems output scores rather than calibrated decisions, leaving practitioners to choose thresholds heuristically and without clear statistical interpretation. Conformal anomaly detection addresses this limitation by converting anomaly scores into calibrated p-values that are valid under the statistical assumption of data exchangeability, with a growing literature extending...
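The conversion from a raw score to a calibrated p-value can be sketched with the standard split-conformal formula. This is a generic illustration and not the API of the 'nonconform' package; the function name is hypothetical, and it assumes that higher scores mean more anomalous and that calibration scores come from held-out normal data.

```python
def conformal_p_value(calibration_scores, test_score):
    """Split-conformal p-value: rank of the test score among calibration scores.

    Assumes higher scores indicate more anomalous points and that the
    calibration scores were computed on held-out normal data.
    """
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


calibration = [0.1, 0.2, 0.3, 0.4]  # hypothetical scores on normal data
print(conformal_p_value(calibration, 0.35))
print(conformal_p_value(calibration, 0.90))
```

A small p-value means few calibration points look as anomalous as the test point. Under exchangeability, flagging points with p-value at most $α$ bounds the false alarm rate on normal data by $α$, which replaces a hand-picked score threshold with a statistically interpretable one.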
47
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
Yang Bai, Kaiyuan Liu, Ziyuan Zhuang et al. (8 authors)
📅 2026-05-13
Complex reinforcement learning environments frequently employ multi-task and mixed-reward formulations. In these settings, heterogeneous reward distributions and correlated reward dimensions often destabilize the construction of scalar advantages. To address these challenges, we propose Reward-Decorrelated Policy Optimization (RDPO), a reward-processing method designed to explicitly target both...
48
Achieving $ε^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions
Ishaq Hamza, Zaiwei Chen
📅 2026-05-13
In this paper, we establish last-iterate convergence rates for off-policy actor-critic methods in reinforcement learning. In particular, under a single-loop, single-timescale implementation and a broad class of policy updates, including approximate policy iteration and natural policy gradient methods, we prove the first $\tilde{\mathcal{O}}(ε^{-2})$ sample complexity guarantee for finding an...
49
CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem
Ankit Kulshrestha, Xiaoyuan Liu
📅 2026-05-13
A quantum compiler is a critical piece in the quantum computing pipeline since it allows an abstract quantum circuit to be run on a physical quantum computer. One extremely important subproblem in quantum compilation is the generation of a logical to physical qubit mapping. Typically in quantum compilers this step is either implemented as a random or a heuristic based assignment that aims to...
50
Multimodal Graph-based Classification of Esophageal Motility Disorders
Alexander Geiger, Lars Wagner, Daniel Rueckert et al. (6 authors)
📅 2026-05-13
Diagnosing esophageal motility disorders poses significant challenges due to the complexity of high-resolution impedance manometry (HRIM) data and variability in clinical interpretation. This work explores the feasibility of a multimodal Machine Learning (ML)-based classification approach that combines HRIM recordings with patient-specific information and incorporates a graph-based modeling of...