
Reinforcement Learning

Research on Reinforcement Learning and Human Feedback

📊 50 Papers 📅 Updated: 2026-03-18
1
Efficient Reasoning on the Edge
Yelysei Bondarenko, Thomas Hehn, Rob Hesselink et al. (18 authors)
📅 2026-03-17
Large language models (LLMs) with chain-of-thought reasoning achieve state-of-the-art performance across complex problem-solving tasks, but their verbose reasoning traces and large context requirements make them impractical for edge deployment. These challenges include high token generation costs, large KV-cache footprints, and inefficiencies when distilling reasoning capabilities into smaller...
2
ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K
Kaixuan Wang, Tianxing Chen, Jiawei Liu et al. (16 authors)
📅 2026-03-17
Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm often suffers from a lack of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into...
3
Long-Horizon Traffic Forecasting via Incident-Aware Conformal Spatio-Temporal Transformers
Mayur Patil, Qadeer Ahmed, Shawn Midlam-Mohler et al. (7 authors)
📅 2026-03-17
Reliable multi-horizon traffic forecasting is challenging because network conditions are stochastic, incident disruptions are intermittent, and effective spatial dependencies vary across time-of-day patterns. This study is conducted on the Ohio Department of Transportation (ODOT) traffic count data and corresponding ODOT crash records. This work utilizes a Spatio-Temporal Transformer (STT) model...
4
GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators
Mattia Rigotti, Nicholas Thumiger, Thomas Frick
📅 2026-03-17
Adapting transformer positional encoding to meshes and graph-structured data presents significant computational challenges: exact spectral methods require cubic-complexity eigendecomposition and can inadvertently break gauge invariance through numerical solver artifacts, while efficient approximate methods sacrifice gauge symmetry by design. Both failure modes cause catastrophic generalization in...
5
Dynamic Meta-Layer Aggregation for Byzantine-Robust Federated Learning
Reek Das, Biplab Kanti Sen
📅 2026-03-17
Federated Learning (FL) is increasingly applied in sectors like healthcare, finance, and IoT, enabling collaborative model training while safeguarding user privacy. However, FL systems are susceptible to Byzantine adversaries that inject malicious updates, which can severely compromise global model performance. Existing defenses tend to focus on specific attack types and fail against untargeted...
6
Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning
Jello Zhou, Vudtiwat Ngampruetikorn, David J. Schwab
📅 2026-03-17
Stochastic resetting, where a dynamical process is intermittently returned to a fixed reference state, has emerged as a powerful mechanism for optimizing first-passage properties. Existing theory largely treats static, non-learning processes. Here we ask how stochastic resetting interacts with reinforcement learning, where the underlying dynamics adapt through experience. In tabular grid...
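The resetting mechanism described above is easy to prototype. The sketch below (illustrative only, not the paper's code; all environment and hyperparameter choices are assumptions) applies stochastic resetting to tabular Q-learning on a short 1-D chain, returning the agent to the start state with a fixed probability at every step:

```python
import random

def q_learning_with_resetting(n_states=6, goal=5, reset_prob=0.05,
                              episodes=300, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning on a 1-D chain with stochastic resetting:
    at every step the agent is returned to the start state with
    probability reset_prob. (Illustrative setup, not the paper's.)"""
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _ in range(200):                   # step cap per episode
            if random.random() < reset_prob:
                s = 0                          # stochastic reset to reference state
            if random.random() < eps or Q[s][0] == Q[s][1]:
                a = random.randrange(2)        # explore / break ties randomly
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = min(n_states - 1, max(0, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == goal else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == goal:
                break
    return Q
```

Varying `reset_prob` in such a toy environment is one way to probe how resetting interacts with the learned value function, as the abstract suggests.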
7
Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing
Saksham Jain, Alex Luedtke
📅 2026-03-17
Beyond conditional average treatment effects, treatments may impact the entire outcome distribution in covariate-dependent ways, for example, by altering the variance or tail risks for specific subpopulations. We propose a novel estimand to capture such conditional distributional treatment effects, and develop a doubly robust estimator that is minimax optimal in the local asymptotic sense. Using...
8
Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar et al. (5 authors)
📅 2026-03-17
Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as potential ways to address this limitation. While RAG aims to ground responses in retrieved evidence, it provides no statistical guarantee that the final output is correct. Conformal factuality filtering...
9
RaDAR: Relation-aware Diffusion-Asymmetric Graph Contrastive Learning for Recommendation
Yixuan Huang, Jiawei Chen, Shengfan Zhang et al. (4 authors)
📅 2026-03-17
Collaborative filtering (CF) recommendation has been significantly advanced by integrating Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL). However, (i) random edge perturbations often distort critical structural signals and degrade semantic consistency across augmented views, and (ii) data sparsity hampers the propagation of collaborative signals, limiting generalization. To...
10
High-Dimensional Gaussian Mean Estimation under Realizable Contamination
Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas
📅 2026-03-17
We study mean estimation for a Gaussian distribution with identity covariance in $\mathbb{R}^d$ under a missing data scheme termed the realizable $\epsilon$-contamination model. In this model an adversary can choose a function $r(x)$ between 0 and $\epsilon$ and each sample $x$ goes missing with probability $r(x)$. Recent work (Ma et al., 2024) proposed this model as an intermediate-strength setting between Missing...
11
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Christian Belardi, Justin Lovelace, Kilian Q. Weinberger et al. (4 authors)
📅 2026-03-17
Guided diffusion sampling relies on approximating often intractable likelihood scores, which introduces significant noise into the sampling dynamics. We propose using adaptive moment estimation to stabilize these noisy likelihood scores during sampling. Despite its simplicity, our approach achieves state-of-the-art results on image restoration and class-conditional generation tasks, outperforming...
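Adaptive moment estimation is the mechanism behind the Adam optimizer, and applying it to guidance signals can be sketched in a few lines. The class below (a generic illustration; the names, hyperparameters, and scalar setting are assumptions, not the paper's implementation) smooths a stream of noisy guidance gradients with Adam-style first/second-moment EMAs:

```python
class AdaptiveMomentSmoother:
    """EMA-based smoothing of noisy guidance gradients, in the spirit of
    Adam's first/second moment estimates. Illustrative, not the paper's code."""
    def __init__(self, beta1=0.9, beta2=0.999, eps=1e-8):
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = 0.0   # first-moment (mean) estimate
        self.v = 0.0   # second-moment (uncentered variance) estimate
        self.t = 0

    def step(self, g):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return m_hat / (v_hat ** 0.5 + self.eps)      # normalized direction
```

In a sampler, the returned smoothed direction would replace the raw likelihood-score estimate at each denoising step; in practice `g` would be a tensor and the same update applies elementwise.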
12
Conservative Continuous-Time Treatment Optimization
Nora Schneider, Georg Manten, Niki Kilbertus
📅 2026-03-17
We develop a conservative continuous-time stochastic control framework for treatment optimization from irregularly sampled patient trajectories. The unknown patient dynamics are modeled as a controlled stochastic differential equation with treatment as a continuous-time control. Naive model-based optimization can exploit model errors and propose out-of-support controls, so optimizing the...
13
SOMP: Scalable Gradient Inversion for Large Language Models via Subspace-Guided Orthogonal Matching Pursuit
Yibo Li, Qiongxiu Li
📅 2026-03-17
Gradient inversion attacks reveal that private training text can be reconstructed from shared gradients, posing a privacy risk to large language models (LLMs). While prior methods perform well in small-batch settings, scaling to larger batch sizes and longer sequences remains challenging due to severe signal mixing, high computational cost, and degraded fidelity. We present SOMP (Subspace-Guided...
14
pADAM: A Plug-and-Play All-in-One Diffusion Architecture for Multi-Physics Learning
Amirhossein Mollaali, Bongseok Kim, Christian Moya et al. (4 authors)
📅 2026-03-17
Generalizing across disparate physical laws remains a fundamental challenge for artificial intelligence in science. Existing deep-learning solvers are largely confined to single-equation settings, limiting transfer across physical regimes and inference tasks. Here we introduce pADAM, a unified generative framework that learns a shared probabilistic prior across heterogeneous partial differential...
15
A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems
Wei Min Loh, Sajib Kumer Sinha, Ankur Agarwal et al. (4 authors)
📅 2026-03-17
Contextual bandits are incredibly useful in many practical problems. We go one step further by devising a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits where reward distributions change over time but the degree of correlation is maintained. This formulation lends itself to a wider...
16
Finding Common Ground in a Sea of Alternatives
Jay Chooi, Paul Gölz, Ariel D. Procaccia et al. (5 authors)
📅 2026-03-17
We study the problem of selecting a statement that finds common ground across diverse population preferences. Generative AI is uniquely suited for this task because it can access a practically infinite set of statements, but AI systems like the Habermas machine leave the choice of generated statement to a voting rule. What it means for this rule to find common ground, however, is not...
17
Probing Cultural Signals in Large Language Models through Author Profiling
Valentin Lafargue, Ariel Guerra-Adames, Emmanuelle Claeys et al. (5 authors)
📅 2026-03-17
Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gender and ethnicity without task-specific fine-tuning. Across several open-source models...
18
Data-driven forced response analysis with min-max representations of nonlinear restoring forces
Akira Saito, Hiromu Fujita
📅 2026-03-17
This paper discusses a novel data-driven nonlinearity identification method for mechanical systems with nonlinear restoring forces such as polynomial, piecewise-linear, and general displacement-dependent nonlinearities. The proposed method is built upon the universal approximation theorem that states that a nonlinear function can be approximated by a linear combination of activation functions in...
19
Bayesian Inference of Psychometric Variables From Brain and Behavior in Implicit Association Tests
Christian A. Kothe, Sean Mullen, Michael V. Bronstein et al. (10 authors)
📅 2026-03-17
Objective. We establish a principled method for inferring mental health related psychometric variables from neural and behavioral data using the Implicit Association Test (IAT) as the data generation engine, aiming to overcome the limited predictive performance (typically under 0.7 AUC) of the gold-standard D-score method, which relies solely on reaction times. Approach. We propose a sparse...
20
SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding
D. Darankoum, C. Habermacher, J. Volle et al. (4 authors)
📅 2026-03-17
Decoding the orchestration of neural activity in electroencephalography (EEG) signals is a central challenge in bridging neuroscience with artificial intelligence. Foundation models have made strides in generalized EEG decoding, yet many existing frameworks primarily rely on separate temporal and spectral masking of raw signals during self-supervised pretraining. Such strategies often tend to...
21
Understanding Quantization of Optimizer States in LLM Pre-training: Dynamics of State Staleness and Effectiveness of State Resets
Kristi Topollai, Anna Choromanska
📅 2026-03-17
Quantizing optimizer states is becoming an important ingredient of memory-efficient large-scale pre-training, but the resulting optimizer dynamics remain only partially understood. We study low-precision exponential moving average (EMA) optimizer states and show how quantization can cause many nominal updates to round back to the same stored value, making the state effectively stale and slowing...
22
GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems
Jia Ming Li, Anupriya, Daniel J. Graham
📅 2026-03-17
Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as...
23
The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models
Robert Welch, Emir Konuk, Kevin Smith
📅 2026-03-17
Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently...
24
Federated Learning with Multi-Partner OneFlorida+ Consortium Data for Predicting Major Postoperative Complications
Yuanfang Ren, Varun Sai Vemuri, Zhenhong Hu et al. (9 authors)
📅 2026-03-17
Background: This study aims to develop and validate federated learning models for predicting major postoperative complications and mortality using a large multicenter dataset from the OneFlorida Data Trust. We hypothesize that federated learning models will offer robust generalizability while preserving data privacy and security. Methods: This retrospective, longitudinal, multicenter cohort study...
25
Novelty-Driven Target-Space Discovery in Automated Electron and Scanning Probe Microscopy
Utkarsh Pratiush, Kamyar Barakati, Boris N. Slautin et al. (7 authors)
📅 2026-03-17
Modern automated microscopy faces a fundamental discovery challenge: in many systems, the most important scientific information does not reside in the immediately visible image features, but in the target space of sequentially acquired spectra or functional responses, making it essential to develop strategies that can actively search for new behaviors rather than simply optimize known objectives....
26
High-dimensional estimation with missing data: Statistical and computational limits
Kabir Aladin Verchand, Ankit Pensia, Saminul Haque et al. (4 authors)
📅 2026-03-17
We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an $\epsilon$ fraction of the observations are subject to an arbitrary (and unknown) missing not at random (MNAR) mechanism. When the true data is Gaussian, we provide evidence...
27
Learning Lineage-guided Geodesics with Finsler Geometry
Aaron Zweig, Mingxuan Zhang, David A. Knowles et al. (4 authors)
📅 2026-03-17
Trajectory inference investigates how to interpolate paths between observed timepoints of dynamical systems, such as temporally resolved population distributions, with the goal of inferring trajectories at unseen times and better understanding system dynamics. Previous work has focused on continuous geometric priors, utilizing data-dependent spatial features to define a Riemannian metric. In many...
28
Cost Trade-offs in Matrix Inversion Updates for Streaming Outlier Detection
Florian Grivet, Louise Travé-Massuyès
📅 2026-03-17
Outlier detection identifies data points that deviate significantly from expected patterns, revealing anomalies that may require special attention. Incorporating online learning further improves accuracy by continuously updating the model to reflect the most recent data. When employing the Christoffel function as an outlier score, online learning requires updating the inverse of a matrix...
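The cost trade-off mentioned above typically hinges on rank-1 inverse updates. The Sherman-Morrison identity (a standard technique; this sketch is illustrative and not the paper's code) refreshes $(A + uv^\top)^{-1}$ in $O(n^2)$ instead of recomputing a fresh $O(n^3)$ inversion:

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Rank-1 update of a matrix inverse: returns (A + u v^T)^{-1} given
    A_inv = A^{-1}, in O(n^2) time via the Sherman-Morrison identity.
    Assumes 1 + v^T A^{-1} u != 0 (otherwise the update is singular)."""
    Au = A_inv @ u                 # A^{-1} u
    vA = v @ A_inv                 # v^T A^{-1}
    denom = 1.0 + v @ Au
    return A_inv - np.outer(Au, vA) / denom
```

In a streaming setting, each new data point contributes one such rank-1 correction, which is where the inversion-versus-update cost trade-off arises.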
29
Grid-World Representations in Transformers Reflect Predictive Geometry
Sasha Brenner, Thomas R. Knösche, Nico Scherf
📅 2026-03-17
Next-token predictors often appear to develop internal representations of the latent world and its rules. The probabilistic nature of these models suggests a deep connection between the structure of the world and the geometry of probability distributions. In order to understand this link more precisely, we use a minimal stochastic process as a controlled setting: constrained random walks on a...
30
When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making
Jun Liu, Pu Zhao, Zhenglun Kong et al. (15 authors)
📅 2026-03-17
Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decision-making during interactions with the environment. However, invoking LLM reasoning introduces substantial computational latency and resource overhead, which can interrupt action execution and reduce system reliability. Excessive reasoning may delay actions,...
31
Self-Aware Markov Models for Discrete Reasoning
Gregor Kornhardt, Jannis Chemseddine, Christian Wald et al. (4 authors)
📅 2026-03-17
Standard masked discrete diffusion models face limitations in reasoning tasks due to their inability to correct their own mistakes on the masking path. Since they rely on a fixed number of denoising steps, they are unable to adjust their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is...
32
Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models
Xiaojie Gu, Sherry T. Tong, Aosong Feng et al. (11 authors)
📅 2026-03-17
Reasoning-focused large language models (LLMs) have advanced in many NLP tasks, yet their evaluation remains challenging: final answers alone do not expose the intermediate reasoning steps, making it difficult to determine whether a model truly reasons correctly and where failures occur, while existing multi-hop QA benchmarks lack step-level annotations for diagnosing reasoning failures. To...
33
Simplex-to-Euclidean Bijection for Conjugate and Calibrated Multiclass Gaussian Process
Bernardo Williams, Harsha Vardhan Tetali, Arto Klami et al. (4 authors)
📅 2026-03-17
We propose a conjugate and calibrated Gaussian process (GP) model for multi-class classification by exploiting the geometry of the probability simplex. Our approach uses Aitchison geometry to map simplex-valued class probabilities to an unconstrained Euclidean representation, turning classification into a GP regression problem with fewer latent dimensions than standard multi-class GP classifiers....
34
Trajectory-Optimized Time Reparameterization for Learning-Compatible Reduced-Order Modeling of Stiff Dynamical Systems
Joe Standridge, Daniel Livescu, Paul Cizmas
📅 2026-03-17
Stiff dynamical systems present a challenge for machine-learning reduced-order models (ML-ROMs), as explicit time integration becomes unstable in stiff regimes while implicit integration within learning loops is computationally expensive and often degrades training efficiency. Time reparameterization (TR) offers an alternative by transforming the independent variable so that rapid physical-time...
35
When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective
Zelin Zhang, Fei Cheng, Chenhui Chu
📅 2026-03-17
Although outcome-based reinforcement learning (RL) significantly advances the mathematical reasoning capabilities of Large Language Models (LLMs), its reliance on computationally expensive ground-truth annotations imposes a severe scalability bottleneck. Unsupervised RL guided by intrinsic rewards offers a scalable alternative, yet it suffers from opaque training dynamics and catastrophic...
36
REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models
Yong Zou, Haoran Li, Fanxiao Li et al. (8 authors)
📅 2026-03-17
Recent progress in image generation models (IGMs) enables high-fidelity content creation but also amplifies risks, including the reproduction of copyrighted content and the generation of offensive content. Image Generation Model Unlearning (IGMU) mitigates these risks by removing harmful concepts without full retraining. Despite growing attention, the robustness under adversarial inputs,...
37
Deep Tabular Representation Corrector
Hangting Ye, Peng Wang, Wei Fan et al. (7 authors)
📅 2026-03-17
Tabular data play a critically important role in diverse real-world fields, such as healthcare, engineering, finance, etc. The recent success of deep learning has fostered many deep-network-based (e.g., Transformer, ResNet) tabular learning methods. Generally, existing deep tabular machine learning methods follow one of two paradigms, i.e., in-learning and pre-learning. In-learning...
38
Manifold-Matching Autoencoders
Laurent Cheret, Vincent Létourneau, Isar Nejadgholi et al. (6 authors)
📅 2026-03-17
We study a simple unsupervised regularization scheme for autoencoders called Manifold-Matching (MMAE): we align the pairwise distances in the latent space to those of the input data space by minimizing mean squared error. Because alignment occurs on pairwise distances rather than coordinates, it can also be extended to a lower-dimensional representation of the data, adding flexibility to the...
39
Bridging the Simulation-to-Reality Gap in Electron Microscope Calibration via VAE-EM Estimation
Jilles S. van Hulst, W. P. M. H. Heemels et al. (4 authors)
📅 2026-03-17
Electron microscopy has enabled many scientific breakthroughs across multiple fields. A key challenge is the tuning of microscope parameters based on images to overcome optical aberrations that deteriorate image quality. This calibration problem is challenging due to the high-dimensional and noisy nature of the diagnostic images, and the fact that optimal parameters cannot be identified from a...
40
SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds
Viktor Stein, Wuchen Li, Gabriele Steidl
📅 2026-03-17
Transformers owe much of their empirical success in natural language processing to the self-attention blocks. Recent perspectives interpret attention blocks as interacting particle systems, whose mean-field limits correspond to gradient flows of interaction energy functionals on probability density spaces equipped with Wasserstein-$2$-type metrics. We extend this viewpoint by introducing...
41
An approximate graph elicits detonation lattice
Vansh Sharma, Venkat Raman
📅 2026-03-17
This study presents a novel algorithm based on graph theory for the precise segmentation and measurement of detonation cells from 3D pressure traces, termed detonation lattices, addressing the limitations of manual and primitive 2D edge detection methods prevalent in the field. Using a segmentation model, the proposed training-free algorithm is designed to accurately extract cellular patterns, a...
42
FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
Zhenghang Song, Tang Qian, Lu Chen et al. (10 authors)
📅 2026-03-17
Structured data is foundational to healthcare, finance, e-commerce, and scientific data management. Large structured-data models (LDMs) extend the foundation model paradigm to unify heterogeneous datasets for tasks such as classification, regression, and decision support. However, existing LDMs face major limitations. First, most rely on sample-wise self-attention, whose O(N^2) complexity limits...
43
From the Inside Out: Progressive Distribution Refinement for Confidence Calibration
Xizhong Yang, Yinan Xia, Huiming Wang et al. (4 authors)
📅 2026-03-17
Leveraging the model's internal information as the self-reward signal in Reinforcement Learning (RL) has received extensive attention due to its label-free nature. While prior works have made significant progress in applying the Test-Time Scaling (TTS) strategies to RL, the discrepancy in internal information between test and training remains inadequately addressed. Moreover, Test-Time...
44
Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models
Subina Khanal, Seshu Tirupathi, Merim Dzaferagic et al. (5 authors)
📅 2026-03-17
Time series foundation models (TSFMs) require diverse, real-world datasets to adapt across varying domains and temporal frequencies. However, current large-scale datasets predominantly focus on low-frequency time series with sampling intervals, i.e., time resolution, in the range of seconds to years, hindering their ability to capture the nuances of high-frequency time series data. To address...
45
Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function
Amon Lahr, Anna Scampicchio, Johannes Köhler et al. (4 authors)
📅 2026-03-17
Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data, and thus a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either pose strong assumptions on the...
46
Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models
Rishaank Gupta
📅 2026-03-17
Large language model compression has made substantial progress through pruning, quantization, and low-rank decomposition, yet a fundamental limitation persists across all existing methods: compression budgets are allocated without any representation of what individual model components functionally encode. We term this the capability-blind compression problem and argue it is a root cause of two...
47
DISCOVER: A Solver for Distributional Counterfactual Explanations
Yikai Gu, Lele Cao, Bo Zhao et al. (5 authors)
📅 2026-03-17
Counterfactual explanations (CE) explain model decisions by identifying input modifications that lead to different predictions. Most existing methods operate at the instance level. Distributional Counterfactual Explanations (DCE) extend this setting by optimizing an optimal transport objective that balances proximity to a factual input distribution and alignment to a target output distribution,...
48
IRIS: A Real-World Benchmark for Inverse Recovery and Identification of Physical Dynamic Systems from Monocular Video
Rasul Khanbayov, Mohamed Rayan Barhdadi, Erchin Serpedin et al. (4 authors)
📅 2026-03-17
Unsupervised physical parameter estimation from video lacks a common benchmark: existing methods evaluate on non-overlapping synthetic data, the sole real-world dataset is restricted to single-body systems, and no established protocol addresses governing-equation identification. This work introduces IRIS, a high-fidelity benchmark comprising 220 real-world videos captured at 4K resolution and...
49
Trained Persistent Memory for Frozen Encoder-Decoder LLMs: Six Architectural Methods
Hong Jeong
📅 2026-03-17
Frozen encoder-decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a proof-of-concept pilot study showing that persistent memory in the continuous latent space of a frozen LLM is feasible, even under severe resource constraints (a single frozen Flan-T5-XL...
50
Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement
Yusuke Nishii, Hiroaki Kawashima
📅 2026-03-17
This study investigates a method to guide and control fish schools using virtual fish trained with reinforcement learning. We utilize 2D virtual fish displayed on a screen to overcome technical challenges such as durability and movement constraints inherent in physical robotic agents. To address the lack of detailed behavioral models for real fish, we adopt a model-free reinforcement learning...