Generative Recommendation

Research on Generative Recommender Systems

📊 50 Papers 📅 Updated: 2026-05-14
1
Benchmarking the Open Science Data Federation services to develop XRootD best practices
Fabio Andrijauskas, Igor Sfiligoi, Frank Würthwein
📅 2026-05-13
Research has become dependent on processing power and storage, one crucial aspect being data sharing. The Open Science Data Federation (OSDF) project aims to create a scientific global data distribution network based on the Pelican Platform. OSDF relies on the XRootD and Pelican projects. Nevertheless, OSDF must understand the XRootD limits under various configuration options, including transfer...
2
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
Chengzhi Shen, Weixiang Shen, Tobias Susetzky et al. (11 authors)
📅 2026-05-13
Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU benchmarks typically treat historical clinician actions as ground truth. However, these actions are made under incomplete information and limited temporal...
3
Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models
Xinye Wanyan, Chenglong Ma, Danula Hettiachchi et al. (5 authors)
📅 2026-05-13
Large Language Model (LLM)-based agent simulation has emerged as a promising approach to meet the increasing demand for real-time and rigorous evaluation in modern recommender systems. A typical LLM-driven simulation framework comprises three essential components: the profile module, memory module, and action module. However, existing studies have primarily concentrated on enhancing the memory...
4
A Standardized Re-evaluation of Conversational Recommender Systems on the ReDial Dataset
Ivica Kostric, Krisztian Balog
📅 2026-05-13
Recent years have seen a surge of research into conversational recommender systems (CRS). Among existing datasets, ReDial is the most widely used benchmark, cited in hundreds of studies. However, variations in how the dataset is preprocessed and used in experiments, particularly in the definition of ground-truth items, make it difficult to compare results across studies. These comparisons are...
5
EcoGEO: Trajectory-Aware Evidence Ecosystems for Web-Enabled LLM Search Agents
Hengwei Ye, Jiasheng Mao, Zhenhan Guan et al. (4 authors)
📅 2026-05-13
Web-enabled LLM agents are changing how online information influences search outcomes. Existing Generative Engine Optimization (GEO) studies mainly focus on individual webpages. However, agentic web search is not a single-document setting: an agent may issue queries, crawl pages, follow links, reformulate searches, and synthesize evidence across multiple browsing steps. Influence therefore...
6
Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety
Muhammad Bilal, Jon Crowcroft, Ruizhi Wang et al. (5 authors)
📅 2026-05-12
Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking...
7
Revealing Interpretable Failure Modes of VLMs
Isha Chaudhary, Vedaant V Jain, Kavya Sachdeva et al. (5 authors)
📅 2026-05-12
Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to generalize with minimal task-specific engineering. Despite these advantages, they can exhibit catastrophic failures in specific real-world situations, constituting failure modes. We introduce REVELIO, a framework for systematically uncovering...
8
MLPs are Efficient Distilled Generative Recommenders
Zitian Guo, Yupeng Hou, Clark Mingxuan Ju et al. (5 authors)
📅 2026-05-12
Generative recommendation models employing Semantic IDs (SIDs) exhibit strong potential, yet their practical deployment is bottlenecked by the high inference latency of beam-expanded autoregressive decoding. In this work, we identify that standard attention-heavy Transformer decoders represent a structural overkill for this task: the hierarchical nature of SIDs makes prediction difficulty drop...
9
NARA: Anchor-Conditioned Relation-Aware Contextualization of Heterogeneous Geoentities
Jina Kim, Gengchen Mai, Lingyi Zhao et al. (5 authors)
📅 2026-05-12
Geospatial foundation models have primarily focused on raster data such as satellite imagery, where self-supervised learning has been widely studied. Vector geospatial data instead represent the world as discrete geoentities with explicit geometry, semantics, and structured spatial relations, including metric proximity and topological relationships. These relations jointly determine how entities...
10
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems
Jiazhou Liang, Armin Toroghi, Yifan Simon Liu et al. (6 authors)
📅 2026-05-12
LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in external memory modules and performing retrieval from them, their effectiveness in answering challenging questions (e.g., multi-hop, commonsense) ultimately depends on the...
11
BoolXLLM: LLM-Assisted Explainability for Boolean Models
Du Cheng, Serdar Kadioglu, Xin Wang
📅 2026-05-12
Interpretable machine learning aims to provide transparent models whose decision-making processes can be readily understood by humans. Recent advances in rule-based approaches, such as expressive Boolean formulas (BoolXAI), offer faithful and compact representations of model behavior. However, for non-technical stakeholders, main challenges remain in practice: (i) selecting semantically...
12
To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands
Fangyi Yu, Nabeel Seedat, Jonathan Richard Schwarz et al. (4 authors)
📅 2026-05-12
Language models deployed in high-stakes professional settings face conflicting demands from users, institutional authorities, and professional norms. How models act when these demands conflict reveals a principal hierarchy -- an implicit ordering over competing stakeholders that determines, for instance, whether a medical AI receiving a cost-reduction directive from a hospital administrator...
13
Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems
Josh Rosen, Seth Rosen
📅 2026-05-12
Many AI systems are organized around loops in which models reason, call tools, observe results, and continue until a task is complete. These systems often produce final artifacts such as memos, plans, recommendations, and analyses, while the intermediate work that shaped those outputs remains ephemeral. For multi-step, revisable AI work, final artifacts are often lossy projections over upstream...
14
RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems
Wenwen Zeng, Jinhui Zhang, Hao Chen et al. (13 authors)
📅 2026-05-12
The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for the optimization of these agents in recommendation tasks. However, current methodologies remain limited by a reliance on single dimensional outcome-based...
15
Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation
Qiuyu Ding, Heng-Da Xu, Wei Zhang et al. (7 authors)
📅 2026-05-12
Generative point-of-interest (POI) recommendation models based on large language models (LLMs) have shown promising results by formulating next POI prediction as a sequence generation task. However, the knowledge encoded in these models remains fixed after training, making them unable to perceive evolving real-world conditions that shape user mobility decisions, such as local events and cultural...
16
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
Jiarui Jin, Zexuan Yan, Shijian Wang et al. (5 authors)
📅 2026-05-12
In this paper, we present AgentDisCo, a novel Disentangled and Collaborative agentic architecture that formulates deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches that conflate these two processes into a single module, AgentDisCo employs a critic agent to evaluate generated outlines and refine search queries, and a...
17
Quality-Aware Collaborative Multi-Positive Contrastive Learning for Sequential Recommendation
Wei Wang
📅 2026-05-12
The effectiveness of contrastive learning in sequential recommendation hinges on the construction of contrastive views, which ideally should be both semantically consistent and diverse. However, most existing CL-based methods rely on heuristic augmentations that are prone to removing crucial items or disrupting transition patterns, leading to semantic drift. While a few studies have explored...
18
Persistent and Conversational Multi-Method Explainability for Trustworthy Financial AI
Georgios Makridis, Georgios Fatouros, John Soldatos et al. (5 authors)
📅 2026-05-12
Financial institutions increasingly require AI explanations that are persistent, cross-validated across methods, and conversationally accessible to human decision-makers. We present an architecture for human-centered explainable AI in financial sentiment analysis that combines three contributions. First, we treat XAI artifacts -- LIME feature attributions, occlusion-based word importance scores,...
19
HSUGA: LLM-Enhanced Recommendation with Hierarchical Semantic Understanding and Group-Aware Alignment
Guorui Li, Dugang Liu, Lei Li et al. (5 authors)
📅 2026-05-12
Large language model (LLM)-enhanced sequential recommendation typically aims to improve two core components: user semantic embedding extraction and utilization. Despite promising results, existing methods still have two limitations: 1) In the extraction stage, most methods directly input long interaction sequence fragments into LLM for preference summarization. However, excessively long sequences...
20
TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning
Shiteng Cao, Kaian Jiang, Yunlong Gong et al. (4 authors)
📅 2026-05-12
Generative recommendation with Semantic IDs (SIDs) has emerged as a promising paradigm, yet existing methods apply a fixed inference strategy, either fast direct generation or slow chain-of-thought reasoning, uniformly across all user histories. This approach creates a trade-off: a fast recommendation model produces suboptimal accuracy on hard samples, while always invoking slow reasoning incurs...
21
Conditional Memory Enhanced Item Representation for Generative Recommendation
Ziwei Liu, Yejing Wang, Shengyu Zhou et al. (5 authors)
📅 2026-05-12
Generative recommendation (GR) has emerged as a promising paradigm that predicts target items by autoregressively generating their semantic identifiers (SID). Most GR methods follow a quantization-representation-generation pipeline, first assigning each item a SID, then constructing input representations from SID-token embeddings, and finally predicting the target SID through autoregressive...
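The first two stages of the pipeline named in this abstract (assigning each item a SID, then mapping the SID to input tokens) can be illustrated with a minimal sketch. It assumes residual quantization over small hand-written codebooks, which is one common way to build SIDs and is not confirmed as this paper's method. The names `assign_sid`, `sid_to_tokens`, and `CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared Euclidean)."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def assign_sid(item_vec: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantization stage (assumed residual quantization): one code per level.

    Each level encodes what the previous levels left unexplained, so the
    resulting SID is hierarchical: coarse code first, finer codes after.
    """
    residual = list(item_vec)
    sid = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid


def sid_to_tokens(sid: Sequence[int], codebook_size: int) -> List[int]:
    """Representation stage: give each level its own disjoint token-id range."""
    return [level * codebook_size + code for level, code in enumerate(sid)]


# Hypothetical toy codebooks: two levels, two 2-d codes each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],      # level 0: coarse
    [[0.0, 0.0], [0.25, -0.25]],   # level 1: fine correction
]

sid = assign_sid([1.2, 0.8], CODEBOOKS)
tokens = sid_to_tokens(sid, codebook_size=2)
```

In a full system the token ids would index an embedding table, and an autoregressive decoder would predict the target item's SID one level at a time. That generation stage is omitted here.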
22
FedMM: Federated Collaborative Signal Quantization for Multi-Market CTR Prediction
Jun Zhang, Dugang Liu, Xing Tang et al. (5 authors)
📅 2026-05-12
Online platforms such as Amazon and Netflix serve users across multiple countries and regions, underscoring the importance of multi-market recommendation (MMR). Most MMR methods adopt a pre-training and fine-tuning paradigm, in which a unified model is first trained on centralized, global data and subsequently adapted to specific markets. However, this approach ignores the privacy of market data....
23
Causal Algorithmic Recourse: Foundations and Methods
Drago Plecko, Collin Wang, Elias Bareinboim
📅 2026-05-12
The trustworthiness of AI decision-making systems is increasingly important. A key feature of such systems is the ability to provide recommendations for how an individual may reverse a negative decision, a problem known as algorithmic recourse. Existing approaches treat recourse outcomes as counterfactuals of a fixed unit, ignoring that real-world recourse involves repeated decisions on the same...
24
Much of Geospatial Web Search Is Beyond Traditional GIS
Ilya Ilyankou, Stefano Cavazzi, James Haworth
📅 2026-05-11
Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior...
25
Debiasing Message Passing to Mitigate Popularity Bias in GNN-based Collaborative Filtering
Md Aminul Islam, Ahmed Sayeed Faruk, Sourav Medya et al. (4 authors)
📅 2026-05-11
Collaborative filtering (CF) models based on graph neural networks (GNNs) achieve strong performance in recommender systems by propagating user-item signals over interaction graphs. However, they are highly susceptible to popularity bias, since skewed interaction distributions and repeated message passing across high-order neighborhoods amplify the influence of popular items while suppressing...
26
A Cascaded Generative Approach for e-Commerce Recommendations
Moein Hasani, Hamidreza Shahidi, Trace Levinson et al. (7 authors)
📅 2026-05-11
Personalized storefronts in large e-commerce marketplaces are often assembled from many independent components: static themes per page section ("placement"), retrieval systems to fetch eligible products per placement, and pointwise rankers to order content. While effective in optimizing for aggregate preferences, this paradigm is rigid and can limit personalization and semantic cohesion...
27
ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder
Shubhankit Singh, Hassan Shaikh, Kuldeep Raghuwanshi et al. (4 authors)
📅 2026-05-11
Automated ASD screening tools remain limited by single-architecture evaluations, axis-restricted assessment, and near-exclusive focus on adult cohorts, obscuring age-specific diagnostic patterns critical for early intervention. We introduce ASD-Bench, a systematic tabular benchmark evaluating ML, deep learning, and foundation model configurations across three age cohorts (children 1-11 yr,...
28
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
Liang Luo, Yinbin Ma, Quanyu Zhu et al. (23 authors)
📅 2026-05-11
Recent GPU generations deliver significantly higher FLOPs using lower-precision arithmetic, such as FP8. While successfully applied to large language models (LLMs), its adoption in large recommendation models (LRMs) has been limited. This is because LRMs are numerically sensitive, dominated by small matrix multiplications (GEMMs) followed by normalization, and trained in communication-intensive...
29
The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents
Xinrun Wang, Chang Yang, He Zhao et al. (5 authors)
📅 2026-05-11
LLM-based foundation agents that perceive, reason, and act across thousands of reasoning steps are rapidly becoming the dominant paradigm for deploying artificial intelligence in open-ended, long-horizon complex tasks. Despite this significance, the field remains overwhelmingly engineering-driven. Engineering practice has converged on useful primitives (tool loops, memory banks, harnesses,...
30
AgentGR: Semantic-aware Agentic Group Decision-Making Simulator for Group Recommendation
Yangtao Zhou, Wenhao You, Hua Chu et al. (7 authors)
📅 2026-05-11
Group Recommendation (GR) aims to suggest items to a group of users, which has become a critical component of modern social platforms. Existing GR methods focus on aggregating individual user preferences with advanced neural networks to infer group preferences. Despite effectiveness, they essentially treat group preference learning as a simple preference aggregation process, failing to capture...
31
Every Preference Has Its Strength: Injecting Ordinal Semantics into LLM-Based Recommenders
Jiwon Jeong, Donghee Han, Sungrae Hong et al. (5 authors)
📅 2026-05-11
Recent work has shown that large language models (LLMs) can enhance recommender systems by integrating collaborative filtering (CF) signals through hybrid prompting. However, most existing CF-LLM frameworks collapse explicit ratings into implicit or positive-only feedback, discarding the ordinal structure that conveys fine-grained preference strength. As a result, these models struggle to exploit...
32
IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs
Songlin Bai, Xintong Wang, Linlin Yu et al. (15 authors)
📅 2026-05-11
In industrial procurement, an LLM answer is useful only if it survives a standards check: recommended material must match operating condition, every parameter must respect a regulated threshold, and no procedure may contradict a safety clause. Partial correctness can mask safety-critical contradictions that aggregate LLM benchmarks rarely capture. We introduce IndustryBench, a 2,049-item...
33
LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
Yiwen Chen, Fuwei Zhang, Zehao Chen et al. (11 authors)
📅 2026-05-11
Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective paradigm in LLMs, performing multi-step inference in a continuous hidden-state space to achieve stronger...
34
CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs
Yuchen Huang, Baiteng Ma, Yiping Sun et al. (9 authors)
📅 2026-05-11
Vector approximate nearest neighbor search (ANNS) underpins search engines, recommendation systems, and advertising services. Recent advances in ANNS indexes make CPU a cost-effective choice for serving million-scale, in-memory vector search, yet per-core throughput remains constrained by memory access latency of vector reading and the compute intensity of distance evaluations in production...
35
Loom: Hybrid Retrieval-Scoring Outfit Recommendation with Semantic Material Compatibility and Occasion-Aware Embedding Priors
Anushree Berlia
📅 2026-05-11
We present Loom, an outfit recommendation system that combines neural embedding retrieval with structured domain scoring to generate complete, coherent outfits from fashion catalogs. Given an anchor clothing item, Loom retrieves complementary pieces via slot-constrained approximate nearest neighbor search over FashionCLIP embeddings, then scores candidate outfits using a multi-objective function...
36
Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
Anushree Berlia
📅 2026-05-11
We present Fashion Florence, a Florence-2 vision-language model fine-tuned with LoRA to extract structured fashion attributes from clothing images. Given a single photograph, the model generates a JSON object containing category, color, material, style tags, and occasion tags, structured output suitable for direct programmatic consumption by downstream recommendation and retrieval systems....
37
Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics
Chao Zhou
📅 2026-05-10
Behavioral curve modeling -- fitting parametric functions to engagement-versus-exposure data -- is standard practice in recommendation, advertising, and clinical dosing. We show that aggregation introduces a systematic distortion: Simpson's paradox in behavioral curves. On Goodreads (3.3M users, 9 genres), individual users peak at n* approximately 11 exposures while the aggregate peaks at n*...
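A toy sketch can show how an aggregate engagement curve may peak at a different exposure count than every individual curve. All data here is synthetic. The mechanism is also an assumption chosen for illustration (heavier users have higher baseline engagement and are observed over more exposures) and is not taken from the paper.

```python
from collections import defaultdict


def engagement(base: float, n: int, peak: int = 3) -> float:
    """Synthetic individual curve: every user peaks at the same exposure count."""
    return base - 0.1 * (n - peak) ** 2


# Hypothetical population: (baseline engagement, max exposures observed, users).
GROUPS = [
    (1.0, 4, 9),   # many light users, observed for 4 exposures
    (5.0, 8, 1),   # one heavy user, observed for 8 exposures
]


def individual_peaks() -> list:
    """Exposure count at which each group's individual curve is highest."""
    return [
        max(range(1, max_n + 1), key=lambda n: engagement(base, n))
        for base, max_n, _ in GROUPS
    ]


def aggregate_curve() -> dict:
    """Mean engagement per exposure count over the users still observed there."""
    total = defaultdict(float)
    count = defaultdict(int)
    for base, max_n, users in GROUPS:
        for n in range(1, max_n + 1):
            total[n] += users * engagement(base, n)
            count[n] += users
    return {n: total[n] / count[n] for n in total}


curve = aggregate_curve()
aggregate_peak = max(curve, key=curve.get)
```

Here every individual curve peaks at 3 exposures, while the pooled mean peaks at 5, because only the high-baseline user remains in the average beyond 4 exposures. Fitting a parametric curve to the pooled points would therefore recover a peak that no user exhibits.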
38
EpiGraph: Building Generalists for Evidence-Intensive Epilepsy Reasoning in the Wild
Yuyang Dai, Zheng Chen, Jathurshan Pradeepkumar et al. (7 authors)
📅 2026-05-10
Epilepsy diagnosis and treatment require evidence-intensive reasoning across heterogeneous clinical knowledge, including biosignal patterns, genetic mechanisms, pharmacogenomics, treatment strategies, and patient outcomes. In this work, we present EpiGraph, a large-scale epilepsy knowledge graph and benchmark for evaluating knowledge-augmented clinical reasoning. EpiGraph...
39
SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making
Zeyu Li, Lei Li
📅 2026-05-10
Decision making in large-scale complaint handling systems increasingly relies on heterogeneous evidence, including complaint narratives, screenshots, order metadata, historical interactions, and platform policies. Existing complaint understanding systems mainly perform shallow classification or template matching over isolated modalities, while underutilizing explicit scene structure, rule...
40
A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems
Yiming Zhu, Xu Liu, Ziyun Xu et al. (12 authors)
📅 2026-05-10
Conventional recommendation systems frequently fail to fully exploit the high-dimensional semantic signals inherent in multimedia content, thereby limiting the fidelity of user preference modeling. While Multimodal Large Language Models (MM-LLMs) offer robust mechanisms for interpreting such complex data, their integration into latency-constrained, industrial-scale architectures remains a...
41
OpenIIR: An Open Simulation Platform for Information Retrieval Research
Saber Zerhoudi
📅 2026-05-10
OpenIIR runs hundreds of LLM-driven personas as parameterised, reproducible IR research experiments. Researchers configure agents across four kinds of multi-agent study (deliberative panels, social platforms, curated recommender feeds, and evolutionary co-evolution between content producers and credibility detectors) under many priors, rounds, and constraints. Persona budgets, retrieval policies,...
42
Towards Conversational Medical AI with Eyes, Ears and a Voice
Meet Shah, Jason Gusdorf, Anil Palepu et al. (53 authors)
📅 2026-05-10
The practice of medicine relies not only upon skillful dialogue but also on the nuanced exchange and interpretation of rich auditory and visual cues between doctors and patients. Building on the low-latency voice and video processing capabilities of Gemini, we introduce AI co-clinician, a first-of-its-kind conversational AI system utilizing continuous streams of audio-visual data from live...
43
Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation
Haven Kim, Julian McAuley
📅 2026-05-09
Conversational music recommendation (CMR) research currently faces a tradeoff between authentic dialogue corpora that are limited in scale and synthesized corpora that scale up but whose conversations are artificially constructed rather than naturally observed. In this paper, we introduce Reddit2Deezer, a reality-grounded CMR resource derived from 190k unique {thread, leaf-comment} pairs. We...
44
Agentic AI Scientists Are Not Built For Autonomous Scientific Discovery
Harshit Bisht, Vinay Kumar, Kevin Maik Jablonka et al. (5 authors)
📅 2026-05-09
A growing body of work pursues AI scientists capable of end-to-end autonomous scientific discovery. This position paper argues that although they already function as co-scientists, agentic AI scientists are not built for autonomous scientific discovery. We identify the following challenges in building and deploying autonomous AI scientists: (1) Problem selection is influenced by the McNamara...
45
Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation
Yang Xiao, Huiyuan Chen, Kaiyuan Deng et al. (8 authors)
📅 2026-05-09
We propose Compressed Video Aggregator (CVA), a lightweight micro-video recommendation module that decouples video information from preference learning. It aggregates frozen VFM embeddings, and uses latent reasoning without cross-attention projection, producing compact video embeddings for recommenders. Due to the redundancy in the frame count of the original benchmark and its overly coarse...
46
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
Devin Yasith De Silva, Dhaval Patel, Christodoulos Constantinides et al. (10 authors)
📅 2026-05-09
Monitoring complex industrial assets relies on engineer-authored symbolic rules that trigger based on sensor conditions and prompt technicians to perform corrective actions. The bottleneck is not detection but response: translating rules into maintenance steps requires asset-specific knowledge gained through years of practice. We investigate whether LLMs can serve as decision support for this...
47
Log analysis is necessary for credible evaluation of AI agents
Peter Kirgis, Sayash Kapoor, Stephan Rabanser et al. (11 authors)
📅 2026-05-08
Agent benchmarks typically report only final outcomes: pass or fail. This threatens evaluation credibility in three ways. First, scores may be inflated or deflated by shortcuts and benchmark artifacts, misrepresenting capability. Second, benchmark performance may fail to predict real-world utility due to scaffold limitations and recurring failure modes. Finally, capability scores may conceal...
48
Multi-Level Graph Attention Network Contrastive Learning for Knowledge-Aware Recommendation
Zhifei Hu, Feng Xia
📅 2026-05-08
In recent years, the use of edge information provided by knowledge graphs together with the advantages of higher-order connectivity in graph neural networks for recommendation systems has become an important research direction. However, existing approaches are often limited by sparse labels, insufficient graph structure learning, and noisy entities in the knowledge graph, which reduce...
49
AI-Care: A Conversational Agentic System for Task Coordination in Alzheimer's Disease Care
Preyash Yadav, Michelle Cohn, Priyanka Koppolu et al. (8 authors)
📅 2026-05-08
Individuals with Alzheimer's disease (AD) and Alzheimer's disease-related dementia (ADRD) experience memory and thinking changes that impact their ability to use digital daily management tools. For example, adding an event to a digital calendar requires multiple steps that may act as barriers to independent use for individuals with AD/ADRD. This paper presents AI-Care, a conversational...
50
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models
Dongdong Wang, Deepak Balakrishnan, Ravi Srinivasan et al. (4 authors)
📅 2026-05-08
This work investigates the use of large language models (LLMs) for tasks in smart cities. The core idea is to leverage remote sensing imagery to characterize the built environment, including design suggestions, constructability assessment, landuse patterns, and risk identification. We examine remote sensing imagery at multiple spatial scales as inputs for multimodal language modeling and evaluate...