AAAI2026 Paper Notes TODO

Total: 1570 papers | Completed: 1570 | Pending updates: 0

10 Open Challenges Steering the Future of Vision-Language-Action Models | arXiv: 2511.05936v1
3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition | arXiv: 2511.07040
3d-free meets 3d priors novel view synthesis from a single image with pretrained | arXiv: 2408.06157
3d4d an interactive editable 4d world model via 3d video generation | arXiv: 2511.08536
3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation | arXiv: 2512.11557
4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation | arXiv: 2511.07241
A Closer Look at Knowledge Distillation in Spiking Neural Network Training | arXiv: 2511.06902
A Coherence-Based Measure of AGI | arXiv: 2510.20784
A Computable Game-Theoretic Framework for Multi-Agent Theory of Mind | arXiv: 2511.22536
A Content-Preserving Secure Linguistic Steganography | arXiv: 2511.12565
A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs | arXiv: 2505.23816
a data-driven model predictive control framework for multi-aircraft tma routing | arXiv: 2511.19452
A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation | arXiv: 2511.12259
A Distributed Asynchronous Generalized Momentum Algorithm Without Delay Bounds | arXiv: 2508.08218
A Fast Heuristic Search Approach for Energy-Optimal Profile Routing for Electric Vehicles | arXiv: 2512.01331
A Graph-Theoretical Perspective on Law Design for Multiagent Systems | arXiv: 2511.06361
A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge | arXiv: 2507.10913
a mind cannot be smeared across time | arXiv: 2601.11620
A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses | arXiv: 2501.01849
A Multi-Agent LLM Framework for Multi-Domain Low-Resource In-Context NER via Knowledge Retrieval, Disambiguation and Reflective Analysis | arXiv: 2511.19083
a new strategy for verifying reach-avoid specifications in neural feedback syste | arXiv: 2601.08065
A Phase Transition for Opinion Dynamics with Competing Biases | arXiv: 2511.09434
A Principle-Driven Adaptive Policy for Group Cognitive Stimulation Dialogue for Elderly with Cognitive Impairment | arXiv: 2603.10034
A Reasoning Paradigm for Named Entity Recognition | arXiv: 2511.11978
a superpersuasive autonomous policy debating system | arXiv: 2511.17854
A Switching Framework for Online Interval Scheduling with Predictions | arXiv: 2511.16194
a theoretical analysis of detecting large model-generated time series | arXiv: 2511.07104
A Topological Rewriting of Tarski's Mereogeometry | arXiv: 2511.12727
A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication | arXiv: 2511.11560
A Unified Shape-Aware Foundation Model for Time Series Classification | arXiv: 2601.06429v1
A2Flow: Automating Agentic Workflow Generation via Self-Adaptive Abstraction Operators | arXiv: 2511.20693
AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs | arXiv: 2601.02771v1
ActiShade: Activating Overshadowed Knowledge to Guide Multi-Hop Reasoning in Large Language Models | arXiv: 2601.07260
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward | arXiv: 2508.11143
AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization | arXiv: 2603.11873v1
Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models | arXiv: 2511.15311v2
Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval | arXiv: 2512.00953v1
Adaptive Fidelity Estimation for Quantum Programs with Graph-Guided Noise Awareness | arXiv: 2601.14713v1
adaptive initial residual connections for gnns with theoretical guarantees | arXiv: 2511.06598
Adaptive Morph-Patch Transformer for Aortic Vessel Segmentation | arXiv: 2511.06897
Adaptive Riemannian Graph Neural Networks | arXiv: 2508.02600
Adaptive Theory of Mind for LLM-based Multi-Agent Coordination | arXiv: 2603.16264
Advancing Safe Mechanical Ventilation Using Offline RL With Hybrid Actions and Clinically Aligned Rewards | arXiv: 2506.14375v2
AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-Reconstruction | arXiv: 2507.18988v2
AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios | arXiv: 2511.21053v2
Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation | arXiv: 2511.06240
Agent-SAMA: State-Aware Mobile Assistant | arXiv: 2505.23596v3
AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation | arXiv: 2512.00602v1
AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments | arXiv: 2506.11773v4
agentswift efficient llm agent design via value-guided hierarchical search | arXiv: 2506.06017
Aggregating Diverse Cue Experts for AI-Generated Image Detection | arXiv: 2601.08790v1
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions | arXiv: 2509.01787v3
AHAN: Asymmetric Hierarchical Attention Network for Identical Twin Face Verification | arXiv: 2602.21503
ai-based traffic modeling for network security and privacy challenges ahead | arXiv: 2503.22161
airdde multifactor neural delay differential equations for air quality forecasti | arXiv: 2603.17529
Align to Structure: Aligning Large Language Models with Structural Information | arXiv: 2504.03622v2
Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration | arXiv: 2602.20104v1
aligning generative music ai with human preferences methods and challenges | arXiv: 2511.15038
Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping | arXiv: 2511.11551v3
Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | arXiv: 2603.05566v1
AlignTree: Efficient Defense Against LLM Jailbreak Attacks | arXiv: 2511.12217v1
Align³GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation | arXiv: 2511.11255v2
ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs | arXiv: 2603.01792v1
Alternative Fairness and Accuracy Optimization in Criminal Justice | arXiv: 2511.04505v4
AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment | arXiv: 2511.09385v2
Ambiguity-aware Truncated Flow Matching for Ambiguous Medical Image Segmentation | arXiv: 2511.06857v2
AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design | arXiv: 2512.21613v1
An Epistemic Perspective on Agent Awareness | arXiv: 2511.05977v1
An Improved Privacy and Utility Analysis of Differentially Private SGD with Bounded Domain and Smooth Losses | arXiv: 2502.17772v4
An Information Theoretic Evaluation Metric for Strong Unlearning | arXiv: 2405.17878v3
An Invariant Latent Space Perspective on Language Model Inversion | arXiv: 2511.19569v1
An LLM-Based Simulation Framework for Embodied Conversational Agents in Psychological Counseling | arXiv: 2410.22041v3
an overall real-time mechanism for classification and quality evaluation of rice | arXiv: 2502.13764
AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation | arXiv: 2511.11692v1
AnchorHOI: Zero-shot Generation of 4D Human-Object Interaction via Anchor-based Prior Distillation | arXiv: 2512.14095v1
Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks | arXiv: 2511.12985v2
Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation | arXiv: 2601.09212v1
AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer | arXiv: 2511.06687v1
Answering the Unanswerable Is to Err Knowingly: Analyzing and Mitigating Abstention Failures in Large Reasoning Models | arXiv: 2508.18760v3
Anti-adversarial Learning: Desensitizing Prompts for Large Language Models | arXiv: 2505.01273v2
anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding | arXiv: 2506.00942v2
Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models | arXiv: 2511.14559v1
Approximation Algorithm for Constrained k-Center Clustering: A Local Search Approach | arXiv: 2601.11883
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval | arXiv: 2506.04953v3
AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture | arXiv: 2511.15870v1
Arbitrary-Scale 3D Gaussian Super-Resolution | arXiv: 2508.16467v2
ARCANE: A Multi-Agent Framework for Interpretable and Configurable Alignment | arXiv: 2512.06196
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction | arXiv: 2511.12485
Are Graph Transformers Necessary? Efficient Long-Range Message Passing with Fractal Nodes in MPNNs | arXiv: 2511.13010
are we done yet a vision-based judge for autonomous task completion of computer | arXiv: 2511.20067
Area-Optimal Control Strategies for Heterogeneous Multi-Agent Pursuit | arXiv: 2511.15036v2
Argumentative Debates for Transparent Bias Detection (ABIDE) | arXiv: 2508.04511v2
as eastern powers i will veto an investigation of nation-level bias of large lan | arXiv: 2511.10695
Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation | arXiv: 2507.18224
assessing llms for serendipity discovery in knowledge graphs a case for drug rep | arXiv: 2511.12472
assist-3d adapted scene synthesis for class-agnostic 3d instance segmentation | arXiv: 2512.09364
AStar: Boosting Multimodal Reasoning with Automated Structured Thinking | arXiv: 2502.02339
asymmetric cross-modal knowledge distillation bridging modalities with weak sema | arXiv: 2511.08901
Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning | arXiv: 2512.14709
attention gathers mlps compose a causal analysis of an action-outcome circuit in | arXiv: 2603.11142
attention retention for continual learning with vision transformers | arXiv: 2602.05454
authority backdoor a certifiable backdoor mechanism for authoring dnns | arXiv: 2512.10600
auto-pre an automatic and cost-efficient peer-review framework for language gene | arXiv: 2410.12265
automaldesc large-scale script analysis for cyber threat research | arXiv: 2511.13333
automated reproducibility has a problem statement problem | arXiv: 2601.04226
automating complex document workflows via stepwise and rollback-enabled operatio | arXiv: 2512.04445
autonomous concept drift threshold determination | arXiv: 2511.09953
autopp towards automated product poster generation and optimization | arXiv: 2512.21921
AutoTool: Efficient Tool Selection for Large Language Model Agents | arXiv: 2511.14650v1
AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models | arXiv: 2511.11299
axis-aligned document dewarping | arXiv: 2507.15000
A²LC: Active and Automated Label Correction for Semantic Segmentation | arXiv: 2506.11599
backdoor attacks on open vocabulary object detectors via multi-modal prompt tuni | arXiv: 2511.12735
backdoors in conditional diffusion threats to responsible synthetic data pipelin | arXiv: 2507.04726
badthink triggered overthinking attacks on chain-of-thought reasoning in large l | arXiv: 2511.10714
baid a benchmark for bias assessment of ai detectors | arXiv: 2512.11505
balancing multimodal domain generalization via gradient modulation and projectio | arXiv: 2603.14175
BAMAS: Structuring Budget-Aware Multi-Agent Systems | arXiv: 2511.21572
bandit learning in housing markets | arXiv: 2511.12629
bat learning event-based optical flow with bidirectional adaptive temporal corre | arXiv: 2503.03256
BayesAgent: Bayesian Agentic Reasoning Under Uncertainty via Verbalized Probabilistic Graphical Modeling | arXiv: 2406.05516
bayesian meta-analyses could be more a case study in trial of labor after a cesa | arXiv: 2601.10089
bayesian network structural consensus via greedy min-cut analysis | arXiv: 2504.00467
bce3s binary cross-entropy based tripartite synergistic learning for long-tailed | arXiv: 2511.14097
bcwildfire a long-term multi-factor dataset and deep learning benchmark for bore | arXiv: 2511.17597
bd-net has depth-wise convolution ever been applied in binary neural networks | arXiv: 2511.17633
beautiful images toxic words understanding and addressing offensive text in gene | arXiv: 2502.05066
beerna tertiary structure-based rna inverse folding using artificial bee colony | arXiv: 2511.21781
behavior tokens speak louder disentangled explainable recommendation with behavi | arXiv: 2512.15614
behaviour policy optimization provably lower variance return estimates for off-p | arXiv: 2511.10843
benchmarking llms for political science a united nations perspective | arXiv: 2502.14122
Beta Distribution Learning for Reliable Roadway Crash Risk Assessment | arXiv: 2511.04886
Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents | arXiv: 2601.20412
Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection | arXiv: 2511.07301
beyond cosine similarity magnitude-aware clip for no-reference image quality ass | arXiv: 2511.09948
beyond detection exploring evidence-based multi-agent debate for misinformation | arXiv: 2511.07267
beyond fact retrieval episodic memory for rag with generative semantic workspace | arXiv: 2511.07587
beyond fixed depth adaptive graph neural networks for node classification under | arXiv: 2511.06608
beyond hallucinations a composite score for measuring reliability in open-source | arXiv: 2512.24058
beyond monotonicity revisiting factorization principles in multi-agent q-learnin | arXiv: 2511.09792
beyond observations reconstruction error-guided irregularly sampled time series | arXiv: 2511.06854
beyond perplexity let the reader select retrieval summaries via spectrum project | arXiv: 2508.05909
Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning | arXiv: 2511.10037
beyond semantic features pixel-level mapping for generalized ai-generated image | arXiv: 2512.17350
beyond sharpness a flatness decomposition framework for efficient continual lear | arXiv: 2601.07636
beyond superficial forgetting thorough unlearning through knowledge density esti | arXiv: 2511.11667
beyond the lower bound bridging regret minimization and best arm identification | arXiv: 2511.05802
beyond the mean fisher-orthogonal projection for natural gradient descent in lar | arXiv: 2508.13898
beyond world models rethinking understanding in ai models | arXiv: 2511.12239
bi-level contextual bandits for individualized resource allocation under delayed | arXiv: 2511.10572
bias association discovery framework for open-ended llm generations | arXiv: 2508.01412
biasjailbreak analyzing ethical biases and jailbreak vulnerabilities in large lan | arXiv: 2410.13334
bica effective biomedical dense retrieval with citation-aware hard negatives | arXiv: 2511.08029
bid farewell to seesaw towards accurate long-tail session-based recommendation v | arXiv: 2511.08378
bidirectional channel-selective semantic interaction for semi-supervised medical | arXiv: 2601.05855
Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning | arXiv: 2508.08385
bipartite mode matching for vision training set search from a hierarchical data | arXiv: 2601.09531
biprompt bilateral prompt optimization for visual and textual debiasing in visio | arXiv: 2601.02147
BLM-Guard: Explainable Multimodal Ad Moderation with Chain-of-Thought and Policy-Aligned Rewards | arXiv: 2602.18193
blue teaming function-calling agents | arXiv: 2601.09292
blur-robust detection via feature restoration an end-to-end framework for prior- | arXiv: 2511.14371
BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning | arXiv: 2511.11421
boosting adversarial transferability via ensemble non-attention | arXiv: 2511.08937
Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models | arXiv: 2506.12409
break the tie learning cluster-customized category relationships for categorical | arXiv: 2511.09049
breaking the adversarial robustness-performance trade-off in text classification | arXiv: 2511.07888
breaking the dyadic barrier rethinking fairness in link prediction beyond demogr | arXiv: 2511.06568
breaking the modality barrier generative modeling for accurate molecule retrieva | arXiv: 2511.06259
breaking the stealth-potency trade-off in clean-image backdoors with generative | arXiv: 2511.07210
Bridging Day and Night: Target-Class Hallucination Suppression in Unpaired Image Translation | arXiv: 2602.15383
bridging granularity gaps hierarchical semantic learning for cross-domain few-sh | arXiv: 2511.12200
Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation (BriMPR) | arXiv: 2511.22862
bridging synthetic and real routing problems via llm-guided instance generation | arXiv: 2511.10233
Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content? | arXiv: 2512.21871
bridging the multilingual safety divide efficient culturally-aware alignment for | arXiv: 2602.13867
bridging the skills gap a course model for modern generative ai education | arXiv: 2511.11757
bridging vision and language for robust context-aware surgical point tracking th | arXiv: 2511.12026
bugsweeper function-level detection of smart contract vulnerabilities using grap | arXiv: 2512.09385
c3rl rethinking the combination of channel-independence and channel-mixing from | arXiv: 2507.17454
c3tg conflict-aware composite and collaborative controlled text generation | arXiv: 2511.09292
cad-vae leveraging correlation-aware latents for comprehensive fair disentanglem | arXiv: 2503.07938
CAE: Hierarchical Semantic Alignment for Image Clustering | arXiv: 2512.00904
CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis | arXiv: 2508.02322
can editing llms inject harm | arXiv: 2407.20224
can llms truly embody human personality analyzing ai and human behavior alignmen | arXiv: 2602.07414
can protective watermarking safeguard the copyright of 3d gaussian splatting | arXiv: 2511.22262
can you tell the difference contrastive explanations for abox entailments | arXiv: 2511.11281
cash flow underwriting with bank transaction data advancing msme financial inclu | arXiv: 2510.16066
casl curvature-augmented self-supervised learning for 3d anomaly detection | arXiv: 2511.12909
cat-net a cross-attention tone network for cross-subject eeg-emg fusion tone dec | arXiv: 2511.10935
catastrophic forgetting in kolmogorov-arnold networks | arXiv: 2511.12828
catformer causal temporal transformer with dynamic contextual fusion for driving | arXiv: 2507.13425
catformer when continual learning meets spiking transformers with dynamic thresh | arXiv: 2603.15184
Causal Inference Under Threshold Manipulation: Bayesian Mixture Modeling and Heterogeneous Treatment Effects | arXiv: 2509.19814
causal structure learning for dynamical systems with theoretical score analysis | arXiv: 2512.14361
Causal-Tune: Mining Causal Factors from Vision Foundation Models for Domain Generalized Semantic Segmentation | arXiv: 2512.16567
causalclip causally-informed feature disentanglement and filtering for generaliz | arXiv: 2512.13285
causality matters how temporal information emerges in video language models | arXiv: 2508.11576
Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs | arXiv: 2511.09018
causaltrace a neurosymbolic causal analysis agent for smart manufacturing | arXiv: 2510.12033
ccfqa a benchmark for cross-lingual and cross-modal speech and text factuality e | arXiv: 2508.07295
cd-dpe dual-prompt expert network based on convolutional dictionary feature deco | arXiv: 2511.14014
cellstream dynamical optimal transport informed embeddings for reconstructing ce | arXiv: 2511.13786
center-outward q-dominance a sample-computable proxy for strong stochastic domin | arXiv: 2511.12545
certified branch-and-bound maxsat solving extended version | arXiv: 2511.10273
certified but fooled breaking certified defences with ghost certificates | arXiv: 2511.14003
chain-of-thought driven adversarial scenario extrapolation for robust language m | arXiv: 2505.17089
characterizing ai manipulation risks in brazilian youtube climate discourse | arXiv: 2511.06091
charteditor a reinforcement learning framework for robust chart editing | arXiv: 2511.15266
chatsparent an interactive system for detecting and mitigating cognitive fatigue | arXiv: 2601.11526
chdp cooperative hybrid diffusion policies for reinforcement learning in paramet | arXiv: 2601.05675
cheating stereo matching in full-scale physical adversarial attack against binoc | arXiv: 2511.14386
class-partitioned vq-vae and latent flow matching for point cloud scene generati | arXiv: 2601.12391
Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration | arXiv: 2505.16479
clearair a human-visual-perception-inspired all-in-one image restoration | arXiv: 2601.02763
clicare grounding large language models in clinical guidelines for decision supp | arXiv: 2507.22533
clinician-in-the-loop smart home system to detect urinary tract infection flare- | arXiv: 2511.18334
clip-fti fine-grained face template inversion via clip-driven attribute conditio | arXiv: 2512.15433
clippan adapting clip as a supervisor for unsupervised pansharpening | arXiv: 2511.10896
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation | arXiv: 2503.05255
Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents | arXiv: 2511.10705
co-layout llm-driven co-optimization for interior layout | arXiv: 2511.12474
coach collaborative agents for contextual highlighting -- a multi-agent framewor | arXiv: 2512.01853
coarse-to-fine open-set graph node classification with large language models | arXiv: 2512.16244
cocolit controlnet-conditioned latent image translation for mri to amyloid pet s | arXiv: 2508.01292
coevo continual evolution of symbolic solutions using large language models | arXiv: 2412.18890
cog-rag cognitive-inspired dual-hypergraph with theme alignment retrieval-augmen | arXiv: 2511.13201
coherent multi-agent trajectory forecasting in team sports with causaltraj | arXiv: 2511.18248
collaborative llm numerical reasoning with local data protection | arXiv: 2504.00299
cometnet contextual motif-guided long-term time series forecasting | arXiv: 2511.08049
comlq benchmarking complex logical queries in information retrieval | arXiv: 2511.12004
commonality in few few-shot multimodal anomaly detection via hypergraph-enhanced | arXiv: 2511.05966
comorag a cognitive-inspired memory-organized rag for stateful long narrative re | arXiv: 2508.10419
compensating distribution drifts in class-incremental learning of pre-trained vi | arXiv: 2511.09926
CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking (Oral) | arXiv: 2511.15580v3
Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models | arXiv: 2511.11751
concepts from representations post-hoc concept bottleneck models via sparse deco | arXiv: 2601.12303
condensed data expansion using model inversion for knowledge distillation | arXiv: 2408.13850
Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition | arXiv: 2511.13137
conditional information bottleneck for multimodal fusion overcoming shortcut lea | arXiv: 2508.10644
coninstruct evaluating large language models on conflict detection and resolutio | arXiv: 2511.14342
Connecting the Dots: Training-Free Visual Grounding via Agentic Reasoning | arXiv: 2511.19516
connectivity-guided sparsification of 2-fwl gnns preserving full expressivity wi | arXiv: 2511.12838
consensus-aligned neuron efficient fine-tuning large language models for multi-d | arXiv: 2602.05694
Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments | arXiv: 2505.19361
constrained and robust policy synthesis with satisfiability-modulo-probabilistic | arXiv: 2511.08078
constrained best arm identification with tests for feasibility | arXiv: 2511.09808
constrained particle seeking solving diffusion inverse problems with just forwar | arXiv: 2603.01837
consurv multimodal continual learning for survival analysis | arXiv: 2511.09853
continuous degradation modeling via latent flow matching for real-world super-re | arXiv: 2602.04193
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning | arXiv: 2511.14396
control illusion the failure of instruction hierarchies in large language models | arXiv: 2502.15851
controllable financial market generation with diffusion guided meta agent | arXiv: 2408.12991
conversational learning diagnosis via reasoning multi-turn interactive learning | arXiv: 2603.03236
convex clustering redefined robust learning with the median of means estimator | arXiv: 2511.14784
convmix a mixed-criteria data augmentation framework for conversational dense re | arXiv: 2508.04001
Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution | arXiv: 2511.19430v1
coordar one-reference 6d pose estimation of novel objects via autoregressive coo | arXiv: 2511.12919
coordinated humanoid robot locomotion with symmetry equivariant reinforcement le | arXiv: 2508.01247
copyright infringement detection in text-to-image diffusion models via different | arXiv: 2509.23022
core-fed bridging collaborative and representation fairness via federated embedd | arXiv: 2602.00647
correcting false alarms from unseen adapting graph anomaly detectors at test tim | arXiv: 2511.07023
cost-free neutrality for the river method | arXiv: 2512.14409
cost-minimized label-flipping poisoning attack to llm alignment | arXiv: 2511.09105
counterfactual explainable ai xai method for deep learning-based multivariate ti | arXiv: 2511.13237
countsteer steering attention for object counting in diffusion models | arXiv: 2511.11253
COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control | arXiv: 2601.06122
creating blank canvas against ai-enabled image forgery | arXiv: 2511.22237
crebench human-aligned creativity evaluation from idea to process to product | arXiv: 2511.13626
credal ensemble distillation for uncertainty quantification | arXiv: 2511.13766
crops improving dense retrieval with cross-perspective positive samples in short | arXiv: 2511.15443
cross modal fine-grained alignment via granularity-aware and region-uncertain mo | arXiv: 2511.07710
cross-modal prompting for balanced incomplete multi-modal emotion recognition | arXiv: 2512.11239
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models | arXiv: 2601.08476
Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models | arXiv: 2511.06793
cross-sample augmented test-time adaptation for personalized intraoperative hypo | arXiv: 2512.15762
cross-space synergy a unified framework for multimodal emotion recognition in co | arXiv: 2512.03521
CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution | arXiv: 2511.21717
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models | arXiv: 2511.12263
ctpd cross tokenizer preference distillation | arXiv: 2601.11865
ctrlfuse mask-prompt guided controllable infrared and visible image fusion | arXiv: 2601.08619
D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies | arXiv: 2511.16590
dance density-agnostic and class-aware network for point cloud completion | arXiv: 2511.07978
dapointmamba domain adaptive point mamba for point cloud completion | arXiv: 2511.20278
data complexity of querying description logic knowledge bases under cost-based s | arXiv: 2511.07095
data heterogeneity and forgotten labels in split federated learning | arXiv: 2511.09736
data verification is the future of quantum computing copilots | arXiv: 2602.04072
data whitening improves sparse autoencoder learning | arXiv: 2511.13981
dcmatch unsupervised multi-shape matching with dual-level consistency | arXiv: 2509.01204
deadline-aware energy-efficient control of domestic immersion hot water heater | arXiv: 2601.18123
debiased dual-invariant defense for adversarially robust person re-identificatio | arXiv: 2511.09933
debiasing diffusion priors via 3d attention for consistent gaussian splatting | arXiv: 2512.07345
Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data | arXiv: 2508.01341
decoding with structured awareness integrating directional frequency-spatial and | arXiv: 2512.05494
decomposition and preprocessing of ternary constraint networks | arXiv: 2511.11872
decor deep embedding clustering with orientation robustness | arXiv: 2510.03328
DeCoRL: Decoupling Reasoning Chains via Parallel Sub-Step Generation and Cascaded Reinforcement for Interpretable and Scalable RLHF | arXiv: 2511.19097
decoupling scene perception and ego status a multi-context fusion approach for e | arXiv: 2511.13079
Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning | arXiv: 2507.10007
deep incomplete multi-view clustering via hierarchical imputation and alignment | arXiv: 2601.09051
deep predictive discounted counterfactual regret minimization | arXiv: 2511.08174
deepboots dual-stream residual boosting for drift-resilient time-series forecast | arXiv: 2511.06893
deepgb-tb a risk-balanced cross-attention gradient-boosted convolutional network | arXiv: 2508.02741
deepprooflog efficient proving in deep stochastic logic programs | arXiv: 2511.08581
deepraht learning predictive raht for point cloud attribute compression | arXiv: 2601.12255
deeprwcap neural-guided random-walk capacitance solver for ic design | arXiv: 2511.06831
deeptracer tracing stolen model via deep coupled watermarks | arXiv: 2511.08985
deformtrace a deformable state space model with relay tokens for temporal forger | arXiv: 2603.04882
deig detail-enhanced instance generation with fine-grained semantic control | arXiv: 2602.18282
democratizing llm efficiency from hyperscale optimizations to universal deployab | arXiv: 2511.20662
denas-vit data efficient nas-optimized vision transformer for ultrasound image s | arXiv: 2407.04203
DEPO: Dual-Efficiency Preference Optimization for LLM Agents | arXiv: 2511.15392
depth-synergized mamba meets memory experts for all-day image reflection separat | arXiv: 2601.00322
description logics with two types of definite descriptions complexity expressive | arXiv: 2512.06604
designing incident reporting systems for harms from general-purpose ai | arXiv: 2511.05914
designing truthful mechanisms for asymptotic fair division | arXiv: 2512.10892
detect all-type deepfake audio wavelet prompt tuning for enhanced auditory perce | arXiv: 2504.06753
detecting the future all-at-once event sequence forecasting with horizon matchin | arXiv: 2408.13131
detonation decoupled torch network-aware training on interlinked online nodes | arXiv: 2502.06728
deviation dynamics in cardinal hedonic games | arXiv: 2511.11531
dexterous manipulation transfer via progressive kinematic-dynamic alignment | arXiv: 2511.10987
dfdt dynamic fast decision tree for iot data stream mining on edge devices | arXiv: 2502.14011
dia-gnostic vlvae disentangled alignment-constrained vision language variational | arXiv: 2511.05968
dicap distribution-calibrated pseudo-labeling for semi-supervised multi-label le | arXiv: 2511.20225
DICE: Distilling Classifier-Free Guidance into Text Embeddings | arXiv: 2502.03726
diff-v2m a hierarchical conditional diffusion model with explicit rhythmic model | arXiv: 2511.09090
diffa large language diffusion models can listen and understand | arXiv: 2507.18452
DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation | arXiv: 2601.03178
Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models | arXiv: 2511.09973
differentiable semantic meta-learning framework for long-tail motion forecasting | arXiv: 2511.06649
differentiated directional intervention a framework for evading llm safety align | arXiv: 2511.06852
Difficulty Controlled Diffusion Model for Synthesizing Effective Training Data | arXiv: 2411.18109
difficulty-aware label-guided denoising for monocular 3d object detection | arXiv: 2511.13195
diffmm efficient method for accurate noisy and sparse trajectory map matching vi | arXiv: 2601.08482
diffop reinforcement learning of optimization-based control policies via implici | arXiv: 2411.07484
diffrefiner coarse to fine trajectory planning via diffusion refinement with sem | arXiv: 2511.17150
diffusion reconstruction-based data likelihood estimation for core-set selection | arXiv: 2511.19274
discode distribution-aware score decoder for robust automatic evaluation of imag | arXiv: 2512.14420
discounted cuts a stackelberg approach to network disruption | arXiv: 2511.10804
Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers | arXiv: 2511.06848
Distilling Cross-Modal Knowledge via Feature Disentanglement | arXiv: 2511.19887
distilling deep reinforcement learning into interpretable fuzzy rules an explain | arXiv: 2603.13257
distilling future temporal knowledge with masked feature reconstruction for 3d o | arXiv: 2512.08247
distribution-based feature attribution for explaining the predictions of any cla | arXiv: 2511.09332
distributional priors guided diffusion for generating 3d molecules in low data r | arXiv: 2404.00962
distributionally robust online markov game with linear function approximation | arXiv: 2511.07831
diversifying counterattacks orthogonal exploration for robust clip inference | arXiv: 2511.09064
divide conquer and unite hierarchical style-recalibrated prototype alignment for | arXiv: 2511.10945
do it for her first-order temporal logic reward specification in reinforcement l | arXiv: 2602.06227
do large language models think like the brain sentence-level evidences from laye | arXiv: 2505.22563
do llms feel teaching emotion recognition with prompts retrieval and curriculum | arXiv: 2511.07061
do llms really struggle at nl-fol translation revealing their strengths via a no | arXiv: 2511.11816
do not merge my model safeguarding open-source llms against unauthorized model m | arXiv: 2511.10712
do retrieval augmented language models know when they dont know | arXiv: 2509.01476
do we need perfect data leveraging noise for domain generalized segmentation | arXiv: 2511.22948
does less hallucination mean less creativity an empirical investigation in llms | arXiv: 2512.11509
Does Self-Evaluation Enable Wireheading in Language Models? | arXiv: 2511.23092
dogfit domain-guided fine-tuning for efficient transfer learning of diffusion mo | arXiv: 2508.05685
domain generalized stereo matching with uncertainty-guided data augmentation | arXiv: 2508.01303
dont start over a cost-effective framework for migrating personalized prompts be | arXiv: 2601.12034
dos distilling observable softmaps of zipfian prototypes for self-supervised poi | arXiv: 2512.11465
DOS: Directional Object Separation in Text Embeddings for Multi-Object Image Generation | arXiv: 2510.14376
dp-geng differentially private dataset distillation guided by dp-generated data | arXiv: 2511.09876
DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation | arXiv: 2411.16657
drexperts differential refinement of distortion-aware experts for blind image qu | arXiv: 2602.09531
drive as you like strategy-level motion planning based on a multi-head diffusion | arXiv: 2508.16947
driveflow rectified flow adaptation for robust 3d object detection in autonomous | arXiv: 2511.18713
drivesuprim towards precise trajectory selection for end-to-end planning | arXiv: 2506.06659
drmd deep reinforcement learning for malware detection under concept drift | arXiv: 2508.18839
dropouts in confidence moral uncertainty in human-llm alignment | arXiv: 2511.13290
ds-atgo dual-stage synergistic learning via forward adaptive threshold and backw | arXiv: 2511.13050
dual-branch spatial-temporal self-supervised representation for enhanced road ne | arXiv: 2511.06633
dual-path knowledge-augmented contrastive alignment network for spatially resolv | arXiv: 2511.17685
dualfete revisiting teacher-student interactions from a feedback perspective for | arXiv: 2511.09319
dualspeechlm towards unified speech understanding and generation via dual speech | arXiv: 2508.08961
dw-dgat dynamically weighted dual graph attention network for neurodegenerative | arXiv: 2601.10001
dynamic gaussian scene reconstruction from unsynchronized videos | arXiv: 2511.11175
DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression | arXiv: 2511.07903
eagle episodic appearance- and geometry-aware memory for unified 2d-3d visual qu | arXiv: 2511.08007
earth-adapter bridge the geospatial domain gaps with mixture of frequency adapta | arXiv: 2504.06220
ease practical and efficient safety alignment for small language models | arXiv: 2511.06512
easy to learn yet hard to forget towards robust unlearning under bias | arXiv: 2602.21773
echogen cycle-consistent learning for unified layout-image generation and unders | arXiv: 2603.18001
echoless label-based pre-computation for memory-efficient heterogeneous graph le | arXiv: 2511.11081
EcoAgent: An Efficient Device-Cloud Collaborative Multi-Agent Framework for Mobile Automation | arXiv: 2505.05440
ecpv2 fast efficient and scalable global optimization of lipschitz functions | arXiv: 2511.16575
eeg-dlite dataset distillation for efficient large eeg model training | arXiv: 2512.12210
efficient and reliable hitting-set computations for the implicit hitting set app | arXiv: 2508.07015
efficient chromosome parallelization for precision medicine genomic workflows | arXiv: 2511.15977
efficient multiagent planning via shared action suggestions | arXiv: 2412.11430
efficient reasoning for large reasoning language models via certainty-guided ref | arXiv: 2508.05337
efficient thought space exploration through strategic intervention | arXiv: 2511.10038
efficientflow efficient equivariant flow policy learning for embodied ai | arXiv: 2512.02020
efficientfsl enhancing few-shot classification via query-only tuning in vision t | arXiv: 2601.08499
efx and po allocation exists for two types of goods | arXiv: 2601.03438
egoems a high-fidelity multimodal egocentric dataset for cognitive assistance in | arXiv: 2511.09894
elementarynet a non-strategic neural network for predicting human behavior in no | arXiv: 2503.05925
elspr evaluator llm training data self-purification on non-transitive preference | arXiv: 2505.17691
EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens | arXiv: 2511.21106
emergent persuasion will llms persuade without being prompted | arXiv: 2512.22201
emovid a multimodal emotion video dataset for emotion-centric video understandin | arXiv: 2511.11002
empowering dino representations for underwater instance segmentation via aligner | arXiv: 2511.08334
empowering semantic-sensitive underwater image enhancement with vlm | arXiv: 2603.12773
end-to-end contrastive language-speech pretraining model for long-form spoken qu | arXiv: 2511.09282
enhancing binary encoded crime linkage analysis using siamese network | arXiv: 2511.07651
enhancing control policy smoothness by aligning actions with predictions from pr | arXiv: 2601.18479
enhancing dpsgd via per-sample momentum and low-pass filtering | arXiv: 2511.08841
enhancing generalization of depth estimation foundation model via weakly-supervi | arXiv: 2511.14238
enhancing logical expressiveness in graph neural networks via path-neighbor aggr | arXiv: 2511.07994
enhancing multimodal misinformation detection by replaying the whole story from | arXiv: 2511.06284
enhancing noise resilience in face clustering via sparse differential transforme | arXiv: 2512.22612
enhancing robustness of offline reinforcement learning under data corruption via | arXiv: 2511.17568
enhancing rotation-invariant 3d learning with global pose awareness and attentio | arXiv: 2511.08833
enhancing uncertainty estimation in llms with expectation of aggregated internal | arXiv: 2509.01564
epo diverse and realistic protein ensemble generation via energy preference opti | arXiv: 2511.10165
epsegfz efficient point cloud semantic segmentation for few- and zero-shot scena | arXiv: 2511.11700
equacode a multi-strategy jailbreak approach for large language models via equat | arXiv: 2512.23173
error correction in radiology reports a knowledge distillation-based multi-stage | arXiv: 2406.15045
esg-bench benchmarking long-context esg reports for hallucination mitigation | arXiv: 2603.13154
evaluating llms for police decision-making a framework based on police action sc | arXiv: 2601.03553
evaluating synthesizing and enhancing for customer support conversation | arXiv: 2508.04423
EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer | arXiv: 2509.12718
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding | arXiv: 2503.09143
expandable and differentiable dual memories with orthogonal regularization for e | arXiv: 2511.09871
experience with single domain generalization in real world medical imaging deplo | arXiv: 2601.16359
expert-guided prompting and retrieval-augmented generation for emergency medical | arXiv: 2511.10900
expertad enhancing autonomous driving systems with mixture of experts | arXiv: 2511.11740
explainable melanoma diagnosis with contrastive learning and llm-based report ge | arXiv: 2512.06105
explaining decentralized multi-agent reinforcement learning policies | arXiv: 2511.10409
explanation-preserving augmentation for semi-supervised graph representation lea | arXiv: 2410.12657
explicit temporal-semantic modeling for dense video captioning via context-aware | arXiv: 2511.10134
explore and establish synergistic effects between weight pruning and coreset sel | arXiv: 2511.09901
Explore How to Inject Beneficial Noise in MLLMs | arXiv: 2511.12917
exploring llms for scientific information extraction using the sciex framework | arXiv: 2512.10004
exploring surround-view fisheye camera 3d object detection | arXiv: 2511.18695
Exploring the Effects of Alignment on Numerical Bias in Large Language Models | arXiv: 2601.16444
exposing deepfakes via hyperspectral domain mapping | arXiv: 2511.11732
exposing the cracks vulnerabilities of retrieval-augmented llm-based machine tra | arXiv: 2510.00829
expressive temporal specifications for reward monitoring | arXiv: 2511.12808
extendattack attacking servers of lrms via extending reasoning | arXiv: 2506.13737
Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction | arXiv: 2511.13118
Extreme Value Monte Carlo Tree Search for Classical Planning | arXiv: 2405.18248
facial-r1 aligning reasoning and recognition for facial emotion analysis | arXiv: 2511.10254
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System | arXiv: 2508.06059
factguard event-centric and commonsense-guided fake news detection | arXiv: 2511.10281
factorut controlling untrusted ai by monitoring their plans | arXiv: 2512.14745
failures to surface harmful contents in video large language models | arXiv: 2508.10974
fair model-based clustering | arXiv: 2602.21509
fairgse fairness-aware graph neural network without high false positive rates | arXiv: 2511.12132
fane towards fine-grained cross-modal contrast with false-negative reduction and | arXiv: 2511.12215
fantasystyle controllable stylized distillation for 3d gaussian splatting | arXiv: 2508.08136
fast 3d surrogate modeling for data center thermal management | arXiv: 2511.11722
FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning | arXiv: 2507.23318
faster certified symmetry breaking using orders with auxiliary variables | arXiv: 2511.16637
fdp a frequency-decomposition preprocessing pipeline for unsupervised anomaly de | arXiv: 2511.12899
feature-centric unsupervised node representation learning without homophily assu | arXiv: 2512.15112
fedalt federated fine-tuning through adaptive local training with rest-of-world | arXiv: 2503.11880
federated clip for resource-efficient heterogeneous medical image classification | arXiv: 2511.07929
fedgrpo privately optimizing foundation models with group-relative rewards from | arXiv: 2602.12014
fedp2eft federated learning to personalize peft for multilingual llms | arXiv: 2502.04387
fedpm federated learning using second-order optimization with preconditioned mix | arXiv: 2511.09100
few-shot precise event spotting via unified multi-entity graph and distillation | arXiv: 2511.14186
fgm-hd boosting generation diversity of fractal generative models through hausdo | arXiv: 2511.08945
fia-edit frequency-interactive attention for efficient and high-fidelity inversi | arXiv: 2511.12151
filmweaver weaving consistent multi-shot videos with cache-guided autoregressive | arXiv: 2512.11274
Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration | arXiv: 2411.17686
finding diverse solutions parameterized by cliquewidth | arXiv: 2405.20931
finding the translation switch discovering and exploiting the task-initiation fe | arXiv: 2601.11019
finding time series anomalies using granular-ball vector data description | arXiv: 2511.12147
fine-grained dino tuning with dual supervision for face forgery detection | arXiv: 2511.12107
fine-grained representation for lane topology reasoning | arXiv: 2511.12590
fine-tuned llms know they dont know a parameter-efficient approach to recovering | arXiv: 2511.12991
finetec fine-grained action recognition under temporal corruption via skeleton d | arXiv: 2512.25067
finevau a novel human-aligned benchmark for fine-grained video anomaly understan | arXiv: 2601.17258
finextrol controllable motion generation via fine-grained text | arXiv: 2511.18927
finmmdocr benchmarking financial multimodal reasoning with scenario awareness do | arXiv: 2512.24903
finrpt dataset evaluation system and llm-based multi-agent framework for equity | arXiv: 2511.07322
first-order error matters accurate compensation for quantized large language mod | arXiv: 2507.11017
first-order representation languages for goal-conditioned rl | arXiv: 2512.19355
flashkat understanding and addressing performance bottlenecks in the kolmogorov- | arXiv: 2505.13813
flexible concept bottleneck model | arXiv: 2511.06678
flowing backwards improving normalizing flows via reverse representation alignme | arXiv: 2511.22345
focusing on language revealing and exploiting language attention heads in multil | arXiv: 2511.07498
forest vs tree the n k trade-off in reproducible ml evaluation | arXiv: 2508.03663
forget less by learning from parents through hierarchical relationships | arXiv: 2601.01892
formal abductive latent explanations for prototype-based networks | arXiv: 2511.16588
formal verification of diffusion auctions | arXiv: 2511.08765
format as a prior quantifying and analyzing bias in llms for heterogeneous data | arXiv: 2508.15793
format matters the robustness of multimodal llms in reviewing evidence from tabl | arXiv: 2511.10075
FoundationSLAM: Unleashing the Potential of Depth Foundation Models in End-to-End Dense Visual SLAM | arXiv: 2512.25008v2
FourierPET: Deep Fourier-based Unrolled Network for Low-count PET Reconstruction | arXiv: 2601.11680
FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection | arXiv: 2502.15488
free-form scene editor enabling multi-round object manipulation like in a 3d eng | arXiv: 2511.13713
freeinpaint tuning-free prompt alignment and visual rationality enhancement in i | arXiv: 2512.21104
freqcycle a multi-scale time-frequency analysis method for time series forecasti | arXiv: 2603.09661
FreqRec: Exploiting Inter-Session Information with Frequency-enhanced Dual-Path Networks for Sequential Recommendation | arXiv: 2511.06285
from attribution to action jointly aligning predictions and explanations | arXiv: 2511.06944
from biased chatbots to biased agents examining role assignment effects on llm a | arXiv: 2602.12285
from classification to ranking enhancing llm reasoning capabilities for mbti per | arXiv: 2601.18582
from decision trees to boolean logic a fast and unified shap algorithm | arXiv: 2511.09376
from ids to semantics a generative framework for cross-domain recommendation wit | arXiv: 2511.08006
from imitation to discrimination toward a generalized curriculum advantage mecha | arXiv: 2512.02580
From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging | arXiv: 2511.10943
from passive perception to active memory a weakly supervised image manipulation | arXiv: 2511.20359
from policy to logic for efficient and interpretable coverage assessment | arXiv: 2601.01266
from pretrain to pain adversarial vulnerability of video foundation models witho | arXiv: 2511.07049
from sequential to recursive enhancing decision-focused learning with bidirectio | arXiv: 2511.08035
from single to societal analyzing persona-induced bias in multi-agent interactio | arXiv: 2511.11789
from theory of mind to theory of environment counterfactual simulation of latent | arXiv: 2601.01599
from woofs to words towards intelligent robotic guide dogs with verbal communica | arXiv: 2603.12574
ft-ncfm an influence-aware data distillation framework for efficient vla models | arXiv: 2511.16233
funkan functional kolmogorov-arnold network for medical image enhancement and se | arXiv: 2509.13508
g-ubs towards robust understanding of implicit feedback via group-aware user beh | arXiv: 2508.05709
g2l from giga-scale to cancer-specific large-scale pathology foundation models vi | arXiv: 2510.11176
gaico a deployed and extensible framework for evaluating diverse and multimodal | arXiv: 2508.16753
gaming the answer matcher examining the impact of text manipulation on automated | arXiv: 2601.08849
gatera token-aware modulation for parameter-efficient fine-tuning | arXiv: 2511.17582
gaussian blending rethinking alpha blending in 3d gaussian splatting | arXiv: 2511.15102
gaussianimage boosted image representation and compression with 2d gaussian spla | arXiv: 2512.19108
gazeinterpreter parsing eye gaze to generate eye-body-coordinated narrations | arXiv: 2511.16245
gcl-ot graph contrastive learning with optimal transport for heterophilic text-a | arXiv: 2511.16778
gdba revisited unleashing the power of guided local search for distributed const | arXiv: 2508.06899
gem generative entropy-guided preference modeling for few-shot alignment of llms | arXiv: 2511.13007
gender bias in emotion recognition by large language models | arXiv: 2511.19785
gene incremental learning for single-cell transcriptomics | arXiv: 2511.13762
genepheno interpretable gene knockout-induced phenotype abnormality prediction f | arXiv: 2511.09512
generalising traffic forecasting to regions without traffic observations | arXiv: 2508.08947
generalizable slum detection from satellite imagery with mixture-of-experts | arXiv: 2511.10300
generalization bounds for semi-supervised matrix completion with distributional | arXiv: 2511.13049
generalized geometry encoding volume for real-time stereo matching | arXiv: 2512.06793
generalizing analogical inference from boolean to continuous domains | arXiv: 2511.10416
generalizing fair clustering to multiple groups algorithms and applications | arXiv: 2511.11539
generating attribute-aware human motions from textual prompt | arXiv: 2506.21912
genvidbench a 6-million benchmark for ai-generated video detection | arXiv: 2501.11340
geometry meets light leveraging geometric priors for universal photometric stere | arXiv: 2511.13015
gewdiff geometric enhanced wavelet-based diffusion model for hyperspectral image | arXiv: 2511.07103
ghost in the transformer detecting model reuse with invariant spectral signature | arXiv: 2511.06390
ghost solving the traveling salesman problem on graphs of convex sets | arXiv: 2511.06471
giim graph-based learning of inter- and intra-view dependencies for multi-view m | arXiv: 2603.09446
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models | arXiv: 2501.05179
global-lens transformers adaptive token mixing for dynamic link prediction | arXiv: 2511.12442
gloctm cross-lingual topic modeling via a global context space | arXiv: 2601.11872
GOAL: Geometrically Optimal Alignment for Continual Generalized Category Discovery | arXiv: 2602.19872
gompsnr reflourish the signal-to-noise ratio metric for audio generation tasks | arXiv: 2601.13758
good-for-mdp state reduction for stochastic ltl planning | arXiv: 2511.09073
gp-molformer-sim test time molecular optimization through contextual similarity | arXiv: 2506.05628
gram-r2 self-training generative foundation reward models for reward reasoning | arXiv: 2509.02492
granalign granularity-aware alignment framework for zero-shot video moment retri | arXiv: 2601.00584
graph of verification structured verification of llm reasoning with directed acy | arXiv: 2506.12509
graph out-of-distribution detection via test-time calibration with dual dynamic | arXiv: 2511.13541
graph smoothing for enhanced local geometry learning in point cloud analysis | arXiv: 2601.11102
Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting | arXiv: 2603.06663
graph-theoretic consistency for robust and topology-aware semi-supervised histop | arXiv: 2509.22689
graphtextack a realistic black-box node injection attack on llm-enhanced gnns | arXiv: 2511.12423
griffin aerial-ground cooperative detection and tracking dataset and benchmark | arXiv: 2503.06983
grim task-oriented grasping with conditioning on generative examples | arXiv: 2506.15607
ground what you see hallucination-resistant mllms via caption feedback diversity | arXiv: 2601.06224
group orthogonal low-rank adaptation for rgb-t tracking | arXiv: 2512.05359
grover graph-guided representation of omics and vision with expert regulation fo | arXiv: 2511.11730
gsap-ere fine-grained scholarly entity and relation extraction focused on machin | arXiv: 2511.09411
gt-snt a linear-time transformer for large-scale graphs via spiking node tokeniz | arXiv: 2504.11840
gt2-gs geometry-aware texture transfer for gaussian splatting | arXiv: 2505.15208
Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs | arXiv: 2508.02573
guided perturbation sensitivity gps detecting adversarial text via embedding sta | arXiv: 2508.11667
guidegen a text-guided framework for paired full-torso anatomy and ct volume gen | arXiv: 2403.07247
guideline-consistent segmentation via multi-agent refinement | arXiv: 2509.04687
h-gar a hierarchical interaction framework via goal-driven observation-action re | arXiv: 2511.17079
HACK: Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling | arXiv: 2504.09261
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models | arXiv: 2511.17170
hallucination stations on some basic limitations of transformer-based language m | arXiv: 2507.07505
hard vs noise resolving hard-noisy sample confusion in recommender systems via l | arXiv: 2511.07295
harmonic dataset distillation for time series forecasting | arXiv: 2603.03760
harnessing textual semantic priors for knowledge transfer and refinement in clip | arXiv: 2508.01579
Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models | arXiv: 2504.08202
harnessing vision-language models for time series anomaly detection | arXiv: 2506.06836
hashed watermark as a filter defeating forging and overwriting attacks in weight | arXiv: 2507.11137
hcf hierarchical cascade framework for distributed multi-stage image compression | arXiv: 2508.02051
hcpo hierarchical conductor-based policy optimization in multi-agent reinforceme | arXiv: 2511.12123
hd2-ssc high-dimension high-density semantic scene completion for autonomous dri | arXiv: 2511.07925
HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection | arXiv: 2512.17601v2
healsplit towards self-healing through adversarial distillation in split federat | arXiv: 2511.11240
hearing more with less multi-modal retrieval-and-selection augmented conversatio | arXiv: 2508.01166
heterogeneous uncertainty-guided composed image retrieval with fine-grained prob | arXiv: 2601.11393
hierarchical direction perception via atomic dot-product operators for rotation- | arXiv: 2511.08240
hierarchical pedagogical oversight a multi-agent adversarial framework for relia | arXiv: 2512.22496
hierarchical prompt learning for image- and text-based person re-identification | arXiv: 2511.13575
hierarchical schedule optimization for fast and robust diffusion model sampling | arXiv: 2511.11688
hierarchicalprune position-aware compression for large-scale diffusion models | arXiv: 2508.04663
hifusion hierarchical intra-spot alignment and regional context fusion for spati | arXiv: 2511.12969
higher-order responsibility | arXiv: 2506.01003
hilomix robust high- and low-frequency graph learning framework for mixing addre | arXiv: 2511.07759
HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment | arXiv: 2511.06653
History-Aware Reasoning for GUI Agents | arXiv: 2511.09127
hn-mvts hypernetwork-based multivariate time series forecasting | arXiv: 2511.08340
how bias binds measuring hidden associations for bias control in text-to-image c | arXiv: 2511.07091
How Does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective | arXiv: 2505.21505
how hard is it to explain preferences using few boolean attributes | arXiv: 2511.13445
how hard is it to rig a tournament when few players can beat or be beaten by the | arXiv: 2601.08530
how many experts are enough towards optimal semantic specialization for mixture- | arXiv: 2512.19765
how to marginalize in causal structure learning | arXiv: 2511.14001
how wide and how deep mitigating over-squashing of gnns via channel capacity con | arXiv: 2511.06443
hpsu a benchmark for human-level perception in real-world spoken speech understa | arXiv: 2511.23178
hq-svc towards high-quality zero-shot singing voice conversion in low-resource s | arXiv: 2511.08496
hskbenchmark modeling and benchmarking chinese second language acquisition in la | arXiv: 2511.15574
human cognition inspired rag with knowledge graph for complex problem solving | arXiv: 2503.06567
human cognitive biases in explanation-based interaction the case of within and b | arXiv: 2512.04764
human-centric open-future task discovery formulation benchmark and scalable tree | arXiv: 2511.18929
human-in-the-loop interactive report generation for chronic disease adherence | arXiv: 2601.06364
hybrid-dmkg a hybrid reasoning framework over dynamic multimodal knowledge graph | arXiv: 2512.00881
hybridla hybrid generation for document layout analysis | arXiv: 2511.19919
hydrodcm hydrological domain-conditioned modulation for cross-reservoir inflow p | arXiv: 2512.03300
hymoerec hybrid mixture-of-experts for sequential recommendation | arXiv: 2511.06388
hyperbolic continuous structural entropy for hierarchical clustering | arXiv: 2512.00524
hyperbolic hierarchical alignment reasoning network for text-3d retrieval | arXiv: 2511.11045
hypershap shapley values and interactions for explaining hyperparameter optimiza | arXiv: 2502.01276
hypothesis generation via llm-automated language bias for ilp | arXiv: 2505.21486
i-cam-uv integrating causal graphs over non-identical variable sets using causal | arXiv: 2603.03207
i-inr iterative implicit neural representations | arXiv: 2504.17364
i2e real-time image-to-event conversion for high-performance spiking neural netw | arXiv: 2511.08065
icl-router in-context learned model representations for llm routing | arXiv: 2510.09719
ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement | arXiv: 2511.13607
idealtsf can non-ideal data contribute to enhancing the performance of time seri | arXiv: 2512.05442
identifying and analyzing performance-critical tokens in large language models | arXiv: 2401.11323
ie-srgs an internal-external knowledge fusion framework for high-fidelity 3d gau | arXiv: 2511.22233
iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference | arXiv: 2511.11306
imagebinddc compressing multi-modal data with imagebind-based condensation | arXiv: 2511.08263
importance-aware data selection for efficient llm instruction tuning | arXiv: 2511.07074
improved differentially private algorithms for rank aggregation | arXiv: 2511.11319
improved masked image generation with knowledge-augmented token representations | arXiv: 2511.12032
improved runtime guarantees for the spea2 multi-objective optimizer | arXiv: 2511.07150
improving multimodal sentiment analysis via modality optimization and dynamic pr | arXiv: 2511.06328
improving region representation learning from urban imagery with noisy long-capt | arXiv: 2511.07062
improving sparse imu-based motion capture with motion label smoothing | arXiv: 2511.22288
improving sustainability of adversarial examples in class-incremental learning | arXiv: 2511.09088
improving the convergence rate of ray search optimization for query-efficient ha | arXiv: 2512.21241
Improving Value-based Process Verifier via Low-Cost Variance Reduction | arXiv: 2508.10539
in-token rationality optimization towards accurate and concise llm reasoning via | arXiv: 2511.09865
Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement | arXiv: 2511.16331
incremental maintenance of datalogmtl materialisations | arXiv: 2511.12169
induce align predict zero-shot stance detection via cognitive inductive reasonin | arXiv: 2506.13470
inductive generative recommendation via retrieval-based speculation | arXiv: 2410.02939
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration | arXiv: 2512.02981
Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models | arXiv: 2508.10030
infigui-g1 advancing gui grounding with adaptive exploration policy optimization | arXiv: 2508.05731
Infinite-Story: A Training-Free Consistent Text-to-Image Generation | arXiv: 2511.13002
InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer | arXiv: 2511.15967
infocom kilobyte-scale communication-efficient collaborative perception with inf | arXiv: 2512.10305
infodecom decomposing information for defending against privacy leakage in split | arXiv: 2511.13365
information theoretic optimal surveillance for epidemic prevalence in networks | arXiv: 2601.04267
instance generation for meta-black-box optimization through latent space reverse | arXiv: 2509.15810
intention chain-of-thought prompting with dynamic routing for code generation | arXiv: 2512.14048
intention-guided cognitive reasoning for egocentric long-term action anticipatio | arXiv: 2508.01742
intermediate n-gramming deterministic and fast n-grams for large n and large dat | arXiv: 2511.14955
intermoe individual-specific 3d human interaction generation via dynamic tempora | arXiv: 2511.13488
interpretable reward model via sparse autoencoder | arXiv: 2508.08746
interpreting fedspeak with confidence a llm-based uncertainty-aware framework gu | arXiv: 2508.08001
intervention efficiency and perturbation validation framework capacity-aware and | arXiv: 2511.14317
intrinsic barriers and practical pathways for human-ai alignment an agreement-ba | arXiv: 2502.05934
investigating data pruning for pretraining biological foundation models at scale | arXiv: 2512.12932
invisible triggers visible threats road-style adversarial creation attack for vi | arXiv: 2511.08015
irote human-like traits elicitation of large language model via in-context self- | arXiv: 2508.08719
is the information bottleneck robust enough towards label-noise resistant inform | arXiv: 2512.10573
ISEAL: Encrypted Fingerprinting for Reliable LLM Ownership Verification | arXiv: 2511.08905
jodiffusion jointly diffusing image with pixel-level annotations for semantic se | arXiv: 2512.13014
Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction | arXiv: 2509.10798
judging by the rules compliance-aligned framework for modern slavery statement m | arXiv: 2511.07803
Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search | arXiv: 2509.09245
just few states are enough randomized sparse feedback for stability of dynamical | arXiv: 2511.13870
kernelized edge attention addressing semantic attention blurring in temporal gra | arXiv: 2602.00596
kinest a kinematics-guided spatiotemporal state space model for human motion tra | arXiv: 2512.16791
know your trajectory -- trustworthy reinforcement learning deployment through im | arXiv: 2512.06917
knowledge completes the vision a multimodal entity-aware retrieval-augmented gen | arXiv: 2511.21002
knowledge-guided masked autoencoder with linear spectral mixing and spectral-ang | arXiv: 2512.12445
ktcf actionable recourse in knowledge tracing via counterfactual explanations fo | arXiv: 2601.09156
KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache | arXiv: 2506.08018
L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention | arXiv: 2511.17910
laf-grpo in-situ navigation instruction generation for the visually impaired via | arXiv: 2506.04070
lamp learning universal adversarial perturbations for multi-image tasks via pre- | arXiv: 2601.21220
lampq towards accurate layer-wise mixed precision quantization for vision transf | arXiv: 2511.10004
language model distillation a temporal difference imitation learning perspective | arXiv: 2505.20335
Language Models and Logic Programs for Trustworthy Tax Reasoning | arXiv: 2508.21051
large language models meet extreme multi-label classification scaling and multi- | arXiv: 2511.13189
Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers | arXiv: 2511.07934
leanrag knowledge-graph-based generation with semantic aggregation and hierarchi | arXiv: 2508.10391
Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling | arXiv: 2511.21120v1
learning compact latent space for representing neural signed distance functions | arXiv: 2511.14539
learning conjugate direction fields for planar quadrilateral mesh generation | arXiv: 2511.11865
learning fair representations with kolmogorov-arnold networks | arXiv: 2511.11767
learning from the undesirable robust adaptation of language models without forge | arXiv: 2511.13052
learning network dismantling without handcrafted inputs | arXiv: 2508.00706
learning procedural-aware video representations through state-grounded hierarchy | arXiv: 2511.20073
learning spatial decay for vision transformers | arXiv: 2508.09525
learning subgroups with maximum treatment effects without causal heuristics | arXiv: 2511.20189
learning time in static classifiers | arXiv: 2511.12321
learning to collaborate an orchestrated-decentralized framework for peer-to-peer | arXiv: 2601.17133
learning to generate and extract a multi-agent collaboration framework for zero- | arXiv: 2603.02909
learning to tell apart weakly supervised video anomaly detection via disentangle | arXiv: 2511.10334
learning topology-driven multi-subspace fusion for grassmannian deep network | arXiv: 2511.08628
learning with preserving for continual multitask learning | arXiv: 2511.11676
length-adaptive interest network for balancing long and short sequence modeling | arXiv: 2601.19142
Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition | arXiv: 2512.17946
let the void be void robust open-set semi-supervised learning via selective non- | arXiv: 2504.12569
leveraging textual compositional reasoning for robust change captioning | arXiv: 2511.22903
lexchronos an agentic framework for structured event timeline extraction in indi | arXiv: 2603.01651
lidar-gs improving lidar gaussian reconstruction via diffusion priors | arXiv: 2511.12304
lidarcrafter dynamic 4d world modeling from lidar sequences | arXiv: 2508.03692
liecraft a multi-agent framework for evaluating deceptive capabilities in langua | arXiv: 2603.06874
life machine learning and the search for habitability predicting biosignature fl | arXiv: 2601.12557
lifelong domain adaptive 3d human pose estimation | arXiv: 2512.23860
lightweight optimal-transport harmonization on edge devices | arXiv: 2511.12785
lilad learning in-context lyapunov-stable adaptive dynamics models | arXiv: 2511.21846
linext revisiting lidar completion with efficient non-diffusion architectures | arXiv: 2511.10209
listen like a teacher mitigating whisper hallucinations using adaptive layer att | arXiv: 2511.14219
listening between the frames bridging temporal gaps in large audio-language mode | arXiv: 2511.11039
livibench an omnimodal benchmark for interactive livestream video understanding | arXiv: 2601.15016
llandmark a multi-agent framework for landmark-aware multimodal interactive vide | arXiv: 2603.02888
llm targeted underperformance disproportionately impacts vulnerable users | arXiv: 2406.17737
llm-as-a-judge for scalable test coverage evaluation accuracy operational reliab | arXiv: 2512.01232
LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction | arXiv: 2512.18623
llmc benchmarking vision-language model compression with a plug-and-play toolkit | arXiv: 2508.09981
llms for game theory entropy-guided in-context learning and adaptive cot reasoni | arXiv: 2601.10775
llmtm benchmarking and optimizing llms for temporal motif analysis in dynamic gr | arXiv: 2512.22266
local guidance for configuration-based multi-agent pathfinding | arXiv: 2510.19072
logical characterizations of gnns with mean aggregation | arXiv: 2507.18145
loki low-damage knowledge implanting of large language models | arXiv: 2505.22120
longllada unlocking long context capabilities in diffusion llms | arXiv: 2506.14429
longt2ibench a benchmark for evaluating long text-to-image generation with graph | arXiv: 2512.09271
loom personalized learning informed by daily llm conversations toward long-term | arXiv: 2511.21037
loopllm transferable energy-latency attacks in llms via repetitive generation | arXiv: 2511.07876
loretta a low resource framework to poison continuous time dynamic graphs | arXiv: 2511.07379
loss-guided auxiliary agents for overcoming mode collapse in gflownets | arXiv: 2505.15251
lost in benchmarks rethinking large language model benchmarking with item respon | arXiv: 2505.15055
lost in time a meta-learning framework for time-shift-tolerant physiological sig | arXiv: 2511.21500
lost in translation a comparative study on the cross-lingual transfer of composi | arXiv: 2602.07963
low-rank curvature for zeroth-order optimization in llm fine-tuning | arXiv: 2511.07971
lucid learning-enabled uncertainty-aware certification of stochastic dynamical s | arXiv: 2512.11750
lungnoduleagent a collaborative multi-agent system for precision diagnosis of lu | arXiv: 2511.21042
lwganet addressing spatial and channel redundancy in remote sensing visual tasks | arXiv: 2501.10040
m2fmoe multi-resolution multi-view frequency mixture-of-experts for extreme-adap | arXiv: 2601.08631
m3sr multi-scale multi-perceptual mamba for efficient spectral reconstruction | arXiv: 2601.08293
Machine Learning for Sustainable Rice Production: Region-Scale Monitoring of Water-Saving Practices in Punjab, India | arXiv: 2507.08605
macprompt macaronic-guided jailbreak against text-to-image models | arXiv: 2601.07141
macs multi-source audio-to-image generation with contextual significance and sem | arXiv: 2503.10287
macvqa adaptive memory allocation and global noise filtering for continual visua | arXiv: 2601.01926
Magnitude Matters: A Superior Class of Similarity Metrics for Holistic Semantic Understanding | arXiv: 2509.19323
magnitude-modulated equivariant adapter for parameter-efficient fine-tuning of e | arXiv: 2511.06696
maisi-v2 accelerated 3d high-resolution medical image synthesis with rectified f | arXiv: 2508.05772
mama-memeia multi-aspect multi-agent collaboration for depressive symptoms ident | arXiv: 2512.25015
MambaMia: State-Space Hierarchical Compression for Hour-Long Video Understanding in Large Multimodal Models | arXiv: 2506.13564
MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation | arXiv: 2512.24243
manilong-shot interaction-aware one-shot imitation learning for long-horizon man | arXiv: 2512.16302
mapi-gnn multi-activation plane interaction graph neural network for multimodal | arXiv: 2512.20026
MAPS: Multi-Agent Personality Shaping for Collaborative Reasoning | arXiv: 2503.16905
margin-aware preference optimization for aligning diffusion models without refer | arXiv: 2406.06424
mars a meta-adaptive reinforcement learning framework for risk-aware multi-agent | arXiv: 2508.01173
MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization | arXiv: 2503.16874
mask the redundancy evolving masking representation learning for multivariate ti | arXiv: 2511.17008
mask2iv interaction-centric video generation via mask trajectories | arXiv: 2510.03135
mass concept erasure in diffusion models with concept hierarchy | arXiv: 2601.03305
mathsmith towards extremely hard mathematical reasoning by forging synthetic pro | arXiv: 2508.05592
matrix-free two-to-infinity and one-to-two norms estimation | arXiv: 2508.04444
mavis a benchmark for multimodal source attribution in long-form visual question | arXiv: 2511.12142
mcmoe completing missing modalities with mixture of experts for incomplete multi | arXiv: 2511.17397
mcts-sql light-weight llms can master the text-to-sql through monte carlo tree s | arXiv: 2501.16607
mctsr-zero self-reflective psychological counseling dialogues generation via pri | arXiv: 2505.23229
mdaif robust one-stop multi-degradation-aware image fusion with language-driven | arXiv: 2511.12525
mdiff4str mask diffusion model for scene text recognition | arXiv: 2512.01422
measuring model performance in the presence of an intervention | arXiv: 2511.05805
measuring stability beyond accuracy in small open-source medical large language | arXiv: 2601.11567
medeyes learning dynamic visual focus for medical progressive diagnosis | arXiv: 2511.22018
MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models | arXiv: 2509.23725
melodia training-free music editing guided by attention probing in diffusion mod | arXiv: 2511.08252
mem-pal towards memory-based personalized dialogue assistants for long-term user | arXiv: 2511.13410
mergedna context-aware genome modeling with dynamic tokenization through token m | arXiv: 2511.14806
MeSHA: Efficient Path Planning with Motion Primitives | arXiv: 2412.10320
meshsplat generalizable sparse-view surface reconstruction via gaussian splattin | arXiv: 2508.17811
meta dynamic graph for traffic flow prediction | arXiv: 2601.10328
metagdpo alleviating catastrophic forgetting with metacognitive knowledge throug | arXiv: 2511.12113
mf-speech achieving fine-grained and compositional control in speech generation | arXiv: 2511.12074
mfmamba a multi-function network for panchromatic image resolution restoration b | arXiv: 2511.18888
microevoeval a systematic evaluation framework for image-based microstructure ev | arXiv: 2511.08955
midb multilingual instruction data booster for enhancing cultural equality in mu | arXiv: 2505.17671
mindcross fast new subject adaptation with limited data for cross-subject video | arXiv: 2511.14196
mindvote when ai meets the wild west of social media opinion | arXiv: 2505.14422
minimizing inequity in facility location games | arXiv: 2602.01048
minimum-cost network flow with dual predictions | arXiv: 2601.20203
mirage scaling test-time inference with parallel graph-retrieval-augmented reaso | arXiv: 2508.18260
mirnet integrating constrained graph-based reasoning with pre-training for diagn | arXiv: 2511.10013
mitigating content effects on reasoning in language models through fine-grained | arXiv: 2505.12189
mitigating error accumulation in co-speech motion generation via global rotation | arXiv: 2511.10076
mixture of ranks with degradation-aware routing for one-step real-world image su | arXiv: 2511.16024
MMhops-R1: Multimodal Multi-hop Reasoning | arXiv: 2512.13573
mmpred radar-based human motion prediction in the dark | arXiv: 2512.00345
moba a material-oriented backdoor attack against lidar-based 3d object detection | arXiv: 2511.09999
mobgs motion deblurring dynamic 3d gaussian splatting for blurry monocular video | arXiv: 2504.15122
modality-aware bias mitigation and invariance learning for unsupervised visible- | arXiv: 2512.07760
model change for description logic concepts | arXiv: 2603.05562
model counting for dependency quantified boolean formulas | arXiv: 2511.07337
model editing as a double-edged sword steering agent ethical behavior toward ben | arXiv: 2506.20606
modelling the effects of hearing loss on neural coding in the auditory midbrain | arXiv: 2506.03088
moetta test-time adaptation under mixed distribution shifts with moe-layernorm | arXiv: 2511.13760
mofu scale-aware modulation and fourier fusion for multi-subject video generatio | arXiv: 2512.22310
monoclue object-aware clustering enhances monocular 3d object detection | arXiv: 2511.07862
moral change or noise on problems of aligning ai with temporally unstable human | arXiv: 2511.10032
moralreason generalizable moral decision alignment for llm agents using reasonin | arXiv: 2511.12271
more than irrational modeling belief-biased agents | arXiv: 2511.12359
mose hierarchical self-distillation enhances early layer embeddings | arXiv: 2503.03008
motif multi-strategy optimization via turn-based interactive framework | arXiv: 2508.03929
motioncharacter fine-grained motion controllable human video generation | arXiv: 2411.18281
motorec sparse-regularized multimodal tokenization for cold-start recommendation | arXiv: 2602.11062
movsemcl movement-semantics contrastive learning for trajectory similarity exten | arXiv: 2511.12061
mp1 meanflow tames policy learning in 1-step for robotic manipulation | arXiv: 2507.10543
mpa multimodal prototype augmentation for few-shot learning | arXiv: 2602.10143
mpd-sgr robust spiking neural networks with membrane potential distribution-driv | arXiv: 2511.12199
mr-cosmo visual-text memory recall and direct cross-modal alignment method for q | arXiv: 2506.20991
mug meta-path-aware universal heterogeneous graph pre-training | arXiv: 2602.22645
MUG: Multi-agent Undercover Gaming — Hallucination Removal via Counterfactual Test for Multimodal Reasoning | arXiv: 2511.11182
multi-agent vlms guided self-training with pnu loss for low-resource offensive c | arXiv: 2511.13759
multi-aspect cross-modal quantization for generative recommendation | arXiv: 2511.15122
multi-faceted attack exposing cross-model vulnerabilities in defense-equipped vi | arXiv: 2511.16110
multi-granularity interactive attention framework for residual hierarchical pron | arXiv: 2601.01745
multi-metric preference alignment for generative speech restoration | arXiv: 2508.17229
multi-modal assistance for unsupervised domain adaptation on point cloud 3d obje | arXiv: 2511.07966
multi-modal dynamic proxy learning for personalized multiple clustering | arXiv: 2511.07274
multigranular evaluation for brain visual decoding | arXiv: 2507.07993
multimodal data fusion to capture dynamic interactions between built environment | arXiv: 2601.11545
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework | arXiv: 2506.02454
multiplicative orthogonal sequential editing for language models | arXiv: 2601.07873
multitab a scalable foundation for multitask learning on tabular data | arXiv: 2511.09970
multivariate gaussian representation learning for medical action evaluation | arXiv: 2511.10060
mvgd-net a novel motion-aware video glass surface detection network | arXiv: 2601.13715
mygram modality-aware graph transformer with global distribution for multi-modal | arXiv: 2601.11885
n2n-gqa noise-to-narrative for graph-based table-text question answering using l | arXiv: 2601.06603
nadir differential attention flow for non-autoregressive transliteration in indi | arXiv: 2601.12389
neighbor-aware instance refining with noisy labels for cross-modal retrieval | arXiv: 2512.24064
nestr a neuro-symbolic abductive framework for temporal reasoning in large langu | arXiv: 2512.07218
neural bandit based optimal llm selection for a pipeline of tasks | arXiv: 2508.09958
Neural Graph Navigation for Intelligent Subgraph Matching | arXiv: 2511.17939
neurobridge bio-inspired self-supervised eeg-to-image decoding via cognitive pri | arXiv: 2511.06836
new synthetic goldmine hand joint angle-driven emg data generation framework for | arXiv: 2509.23359
no-regret strategy solving in imperfect-information games via pre-trained embedd | arXiv: 2511.12083
notam-evolve a knowledge-guided self-evolving optimization framework with llms f | arXiv: 2511.07982
note2chat improving llms for multi-turn clinical history taking using medical no | arXiv: 2601.21551
ntsformer a self-teaching graph transformer for multimodal isolated cold-start n | arXiv: 2507.04870
nurbgen high-fidelity text-to-cad generation through llm-driven nurbs modeling | arXiv: 2511.06194
nutriscreener retrieval-augmented multi-pose graph attention network for malnour | arXiv: 2511.16566
o3slm open weight open data and open vocabulary sketch-language model | arXiv: 2511.14368
oad-promoter enhancing zero-shot vqa using large language models with object att | arXiv: 2511.12131
object-centric latent action learning | arXiv: 2502.09680
object-centric world models for causality-aware reinforcement learning | arXiv: 2511.14262
oceansplat object-aware gaussian splatting with trinocular view consistency for | arXiv: 2601.04984
oida-qa a multimodal benchmark for analyzing the opioid industry documents archi | arXiv: 2511.09914
omnipt unleashing the potential of large vision language models for pedestrian t | arXiv: 2511.17053
omnivdiff omni controllable video diffusion for generation and understanding | arXiv: 2504.10825
on stealing graph neural network models | arXiv: 2511.07170
on the edge of core non-emptiness an automated reasoning approach to approval-ba | arXiv: 2512.16895
on the exponential convergence for offline rlhf with pairwise comparisons | arXiv: 2406.12205
on the information processing of one-dimensional wasserstein distances with fini | arXiv: 2511.12881
On the Learning Dynamics of Two-Layer Linear Networks with Label Noise SGD | arXiv: 2603.10397
on the variability of concept activation vectors | arXiv: 2509.24058
one-step generative policies with q-learning a reformulation of meanflow | arXiv: 2511.13035
online linear regression with paid stochastic features | arXiv: 2511.08073
open-world 3d scene graph generation for retrieval-augmented reasoning | arXiv: 2511.05894
open-world object counting in videos | arXiv: 2506.15368
openscan a benchmark for generalized open-vocabulary 3d scene understanding | arXiv: 2408.11030
opera a reinforcement learning--enhanced orchestrated planner-executor architect | arXiv: 2508.16438
opt3dgs optimizing 3d gaussian splatting with adaptive exploration and curvature | arXiv: 2511.13571
optimal look-back horizon for time series forecasting in federated learning | arXiv: 2511.12791
optimal welfare in noncooperative network formation under attack | arXiv: 2511.10845
optimized algorithms for text clustering with llm-generated constraints | arXiv: 2601.11118
optscale probabilistic optimality for inference-time scaling | arXiv: 2506.22376
or-r1 automating modeling and solving of operations research optimization proble | arXiv: 2511.09092
orvit near-optimal online distributionally robust reinforcement learning | arXiv: 2508.03768
otter mitigating background distractions of wide-angle few-shot action recogniti | arXiv: 2511.06741
pa-fas towards interpretable and generalizable multimodal face anti-spoofing via | arXiv: 2511.17927
padiff predictive and adaptive diffusion policies for ad hoc teamwork | arXiv: 2511.07260
pairing-free group-level knowledge distillation for robust gastrointestinal lesi | arXiv: 2601.09209
panda -- patch and distribution-aware augmentation for long-tailed exemplar-free | arXiv: 2511.09791
panda test-time adaptation with negative data augmentation | arXiv: 2511.10481
panfoma a lightweight foundation model and benchmark for pan-cancer | arXiv: 2512.03111
panonav mapless zero-shot object navigation with panoramic scene parsing and dyn | arXiv: 2511.06840
parallelism meets adaptiveness scalable documents understanding in multi-agent l | arXiv: 2507.17061
parameta towards learning disentangled paralinguistic speaking styles representa | arXiv: 2601.12289
parameter-free fine-tuning via redundancy elimination for vision foundation mode | arXiv: 2504.08915
parameterized approximation algorithms for tsp on non-metric graphs | arXiv: 2503.03642
parametric pareto set learning for expensive multi-objective optimization | arXiv: 2511.05815
parametrized multi-agent routing via deep attention models | arXiv: 2507.22338
pararevsnn a parallel reversible spiking neural network for efficient training a | arXiv: 2508.01223
pareto-grid-guided large language models for fast and high-quality heuristics de | arXiv: 2507.20923
paretohqd fast offline multiobjective alignment of large language models using p | arXiv: 2504.16628
partial action replacement tackling distribution shift in offline marl | arXiv: 2511.07629
partially shared concept bottleneck models | arXiv: 2511.22170
pase leveraging the phonological prior of wavlm for low-hallucination generative | arXiv: 2511.13300
pase prototype-aligned calibration and shapley-based equilibrium for multimodal | arXiv: 2511.17585
pathmind a retrieve-prioritize-reason framework for knowledge graph reasoning wi | arXiv: 2511.14256
patientvlm meets docvlm pre-consultation dialogue between vision-language models | arXiv: 2601.10945
pb4u-gnet resolution-adaptive garment simulation via propagation-before-update g | arXiv: 2601.15110
pcokg personality-aware commonsense reasoning with debate | arXiv: 2601.06234
peoat personalization-guided evolutionary question assembly for one-shot adaptiv | arXiv: 2512.00439
perceive act and correct confidence is not enough for hyperspectral classificati | arXiv: 2511.10068
persistent instability in llms personality measurements effects of scale reasoni | arXiv: 2508.04826
personality-guided public-private domain disentangled hypergraph-former network | arXiv: 2511.12460
personalization of large foundation models for health interventions | arXiv: 2601.03482
personalized federated learning with bidirectional communication compression via | arXiv: 2511.13144
perspective from a broader context can room style knowledge help visual floorpla | arXiv: 2508.01216
pertouch vlm-driven agent for personalized and semantic image retouching | arXiv: 2511.12998
perturb your data paraphrase-guided training data watermarking | arXiv: 2512.17075
perturbing best responses in zero-sum games | arXiv: 2511.12523
pet2rep towards vision-language model-drived automated radiology report generati | arXiv: 2508.04062
pfavatar pose-fusion 3d personalized avatar reconstruction from real-world outfi | arXiv: 2511.12935
phantom menace exploring and enhancing the robustness of vla models against phys | arXiv: 2511.10008
pharos-esg a framework for multimodal parsing contextual narration and hierarchi | arXiv: 2511.16417
phased one-step adversarial equilibrium for video diffusion models | arXiv: 2508.21019
phys-liquid a physics-informed dataset for estimating 3d geometry and volume of | arXiv: 2511.11077
physics-informed autonomous llm agents for explainable power electronics modulat | arXiv: 2411.14214
physics-informed deformable gaussian splatting towards unified constitutive laws | arXiv: 2511.06299
physicscorrect a training-free approach for stable neural pde simulations | arXiv: 2507.02227
pimrl physics-informed multi-scale recurrent learning for burst-sampled spatiote | arXiv: 2503.10253
pings-x physics-informed normalized gaussian splatting with axes alignment for e | arXiv: 2511.11048
piphen physical interaction prediction with hamiltonian energy networks | arXiv: 2511.16200
planttraitnet an uncertainty-aware multimodal framework for global-scale plant t | arXiv: 2511.06943
playmate2 training-free multi-character audio-driven animation via diffusion tra | arXiv: 2510.12089
plug-and-play clarifier a zero-shot multimodal framework for egocentric intent d | arXiv: 2511.08971
plug-and-play parameter-efficient tuning of embeddings for federated recommendat | arXiv: 2512.13734
plugtrack multi-perceptive motion analysis for adaptive fusion in multi-object t | arXiv: 2511.13105
pocketllm ultimate compression of large language models via meta networks | arXiv: 2511.17637
point cloud quantization through multimodal prompting for 3d understanding | arXiv: 2511.12079
point-sra self-representation alignment for 3d representation learning | arXiv: 2601.01746
position on llm-assisted peer review addressing reviewer gap through mentoring a | arXiv: 2601.09182
positional bias in multimodal embedding models do they favor the beginning the m | arXiv: 2511.11216
post training quantization for efficient dataset condensation | arXiv: 2603.13346
posterior label smoothing for node classification | arXiv: 2406.00410
pragworld a benchmark evaluating llms local world model under minimal linguistic | arXiv: 2511.13021
precise reducing the bias of llm evaluations using prediction-powered ranking es | arXiv: 2601.18777
predict and resist long-term accident anticipation under sensor noise | arXiv: 2511.08640
predicting the future by retrieving the past | arXiv: 2511.05859
predicting video slot attention queries from random slot-feature pairs | arXiv: 2508.01345
preference is more than comparisons rethinking dueling bandits with augmented hu | arXiv: 2511.09047
prefixgpt prefix adder optimization by a generative pre-trained transformer | arXiv: 2511.19472
presstrack-hmr pressure-based top-down multi-person global human mesh recovery | arXiv: 2511.09147
prime planning and retrieval-integrated memory for enhanced reasoning | arXiv: 2509.22315
principles2plan llm-guided system for operationalising ethical principles into p | arXiv: 2512.08536
PriorDrive: Enhancing Online HD Mapping with Unified Vector Priors | arXiv: 2409.05352
priorrg prior-guided contrastive pre-training and coarse-to-fine decoding for ch | arXiv: 2508.05353
prism privacy-aware routing for adaptive cloud-edge llm inference via semantic s | arXiv: 2511.22788
privacy auditing of multi-domain graph pre-trained model under membership infere | arXiv: 2511.17989
privacy on the fly a predictive adversarial transformation network for mobile se | arXiv: 2511.07242
privacy-protected retrieval-augmented generation for knowledge graph question an | arXiv: 2508.08785
private frequency estimation via residue number systems | arXiv: 2511.11569
probabilistic hash embeddings for online learning of categorical features | arXiv: 2511.20893
ProBench: Benchmarking GUI Agents with Accurate Process Information | arXiv: 2511.09157
probfm probabilistic time series foundation model with uncertainty decomposition | arXiv: 2601.10591
Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models | arXiv: 2511.12464
problog4fairness a neurosymbolic approach to modeling and mitigating bias | arXiv: 2511.09768
procache constraint-aware feature caching with selective computation for diffusi | arXiv: 2512.17298
profuser progressive fusion of large language models | arXiv: 2408.04998
promoting sustainable web agents benchmarking and estimating energy consumption | arXiv: 2511.04481
promptmoe generalizable zero-shot anomaly detection via visually-guided prompt m | arXiv: 2511.18116
propl universal semi-supervised ultrasound image segmentation via prompt-guided | arXiv: 2511.15057
prototype-based semantic consistency alignment for domain adaptive retrieval | arXiv: 2512.04524
ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Aligned Sparse Autoencoders | arXiv: 2509.05309
provably data-driven projection method for quadratic programming | arXiv: 2509.04524
provably efficient multi-objective bandit algorithms under preference-centric cu | arXiv: 2502.13457
provably minimum-length conformal prediction sets for ordinal classification | arXiv: 2511.16845
Prune4Web: DOM Tree Pruning Programming for Web Agent | arXiv: 2511.21398
psa-mf personality-sentiment aligned multi-level fusion for multimodal sentiment | arXiv: 2512.01442
psm prompt sensitivity minimization via llm-guided black-box optimization | arXiv: 2511.16209
pulsemind a multi-modal medical model for real-world clinical diagnosis | arXiv: 2601.07344
put the space of lora initialization to the extreme to preserve pre-trained know | arXiv: 2503.02659
q-fsru quantum-augmented frequency-spectral fusion for medical visual question a | arXiv: 2508.12036
qa-flora data-free query-adaptive fusion of loras for llms | arXiv: 2512.11366
qgshap quantum acceleration for faithful gnn explanations | arXiv: 2512.03099
qimeng-kernel macro-thinking micro-coding paradigm for llm-based high-performanc | arXiv: 2511.20100
quantifying conversational reliability of large language models under multi-turn | arXiv: 2603.01423
quantvsr low-bit post-training quantization for real-world video super-resolutio | arXiv: 2508.04485
quept quantized elastic precision transformers with one-shot calibration for mul | arXiv: 2602.12609
quiet feature learning in algorithmic tasks | arXiv: 2505.03997
r-avst empowering video-llms with fine-grained spatio-temporal reasoning in comp | arXiv: 2511.16901
racketvision a multiple racket sports benchmark for unified ball and racket anal | arXiv: 2511.17045
radar-aplanc unsupervised radar-based heartbeat sensing via augmented pseudo-lab | arXiv: 2511.08071
radarllm empowering large language models to understand human motion from millim | arXiv: 2504.09862
radarmp motion perception for 4d mmwave radar in autonomous driving | arXiv: 2511.12117
radiation-preserving selective imaging for pediatric hip dysplasia a cross-modal | arXiv: 2511.18457
ragfort dual-path defense against proprietary knowledge base extraction in retri | arXiv: 2511.10128
rast a retrieval augmented spatio-temporal framework for traffic prediction | arXiv: 2508.16623
rcae recursive reconstruction framework for unsupervised industrial anomaly dete | arXiv: 2512.11284
real-time 3d object detection with inference-aligned learning | arXiv: 2511.16140
real-time trust verification for safe agentic actions using trustbench | arXiv: 2603.09157
realign text-to-motion generation via step-aware reward-guided alignment | arXiv: 2511.19217
realism control one-step diffusion for real-world image super-resolution | arXiv: 2509.10122
realistic curriculum reinforcement learning for autonomous and sustainable marin | arXiv: 2601.10911
realistic face reconstruction from facial embeddings via diffusion models | arXiv: 2602.13168
realistic synthetic household data generation at scale | arXiv: 2602.07243
reap enhancing rag with recursive evaluation and adaptive planning for multi-hop | arXiv: 2511.09966
reason reinforced causal search with information bottleneck for video understand | arXiv: 2511.12530
reasoning about the unsaid misinformation detection with omission-aware graph in | arXiv: 2512.01728
reasoning or memorization unreliable results of reinforcement learning due to da | arXiv: 2507.10532
reasoning with exploration an entropy perspective | arXiv: 2506.14758
recad reinforcement learning enhanced parametric cad model generation with visio | arXiv: 2512.06328
recast reliability-aware codebook assisted lightweight time series forecasting | arXiv: 2511.11991
recode updating code api knowledge with reinforcement learning | arXiv: 2506.20495
recon-ipsundrum an inspectable recurrent persistence loop agent with affect-coup | arXiv: 2602.23232
Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts | arXiv: 2512.18718
rectified noise a generative model using positive-incentive noise | arXiv: 2511.07911
rectom a benchmark for evaluating machine theory of mind in llm-based conversati | arXiv: 2511.22275
recursive visual imagination and adaptive linguistic grounding for vision langua | arXiv: 2507.21450
reducing the scope of language models | arXiv: 2410.21597
redundant queries in detr-based 3d detection methods unnecessary and prunable | arXiv: 2412.02054
ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting | arXiv: 2603.01417
reference recommendation based membership inference attack against hybrid-based | arXiv: 2512.09442
refidiff progressive refinement diffusion for efficient missing data imputation | arXiv: 2505.14451
refine and align confidence calibration through multi-agent interaction in vqa | arXiv: 2511.11169
refinevad semantic-guided feature recalibration for weakly supervised video anom | arXiv: 2511.13204
reflection-driven control for trustworthy code agents | arXiv: 2512.21354
ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-Lateral-Acceleration Autonomous Driving | arXiv: 2601.09377
regal a first look at ppo-based legal ai for judgment prediction and summarizati | arXiv: 2512.18014
regionmarker a region-triggered semantic watermarking framework for embedding-as | arXiv: 2511.13329
regular games -- an automata-based general game playing language | arXiv: 2511.10593
reimagining anomalies what if anomalies were normal | arXiv: 2402.14469
reina regularized entropy information-based loss for efficient simultaneous spee | arXiv: 2508.04946
reinforced rate control for neural video compression via inter-frame rate-distor | arXiv: 2601.19293
relactrl relevance-guided efficient control for diffusion transformers | arXiv: 2502.14377
relation-r1 progressively cognitive chain-of-thought guided reinforcement learni | arXiv: 2504.14642
Relink: Constructing Query-Driven Evidence Graph On-the-Fly for GraphRAG | arXiv: 2601.07192
remember me bridging the long-range gap in lvlms with three-step inference-only | arXiv: 2511.09868
RENEW: Risk- and Energy-Aware Navigation in Dynamic Waterways | arXiv: 2601.16424
renormalization group guided tensor network structure search | arXiv: 2512.24663
resource efficient sleep staging via multi-level masking and prompt learning | arXiv: 2511.06785
rethinking bias in generative data augmentation for medical ai a frequency recal | arXiv: 2511.12301
rethinking direct preference optimization in diffusion models | arXiv: 2505.18736
rethinking flow and diffusion bridge models for speech enhancement | arXiv: 2602.18355
rethinking long-tailed dataset distillation a uni-level framework with unbiased | arXiv: 2511.18858
rethinking multimodal point cloud completion a completion-by-correction perspect | arXiv: 2511.12170
rethinking progression of memory state in robotic manipulation an object-centric | arXiv: 2511.11478
rethinking rainy 3d scene reconstruction via perspective transforming and bright | arXiv: 2511.06734
rethinking surgical smoke a smoke-type-aware laparoscopic video desmoking method | arXiv: 2512.02780
rethinking target label conditioning in adversarial attacks a 2d tensor-guided g | arXiv: 2504.14137
rethinking the spatio-temporal alignment of end-to-end 3d perception | arXiv: 2512.23635
rethinking visual token reduction in lvlms under cross-modal misalignment | arXiv: 2506.22283
retrieving objects from 3d scenes with box-guided open-vocabulary instance segme | arXiv: 2512.19088
retrysql text-to-sql training with retry data for self-correcting query generati | arXiv: 2507.02529
revealing pomdps qualitative and quantitative analysis for parity objectives | arXiv: 2511.13134
revisiting the data sampling in multimodal post-training from a difficulty-disti | arXiv: 2511.06722
revisiting unfairness in recourse by minimizing worst-case social burden | arXiv: 2509.04128
revitalizing canonical pre-alignment for irregular multivariate time series fore | arXiv: 2508.01971
reward redistribution via gaussian process likelihood estimation | arXiv: 2503.17409
rexo indoor multi-view radar object detection via 3d bounding box diffusion | arXiv: 2511.17806
RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA | arXiv: 2512.15219
right looks wrong reasons compositional fidelity in text-to-image generation | arXiv: 2511.10136
risk-sensitive exponential actor critic | arXiv: 2602.07202
rlslm a hybrid reinforcement learning framework aligning rule-based social locom | arXiv: 2511.11323
RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models (Oral) | arXiv: 2512.06811
roadscenevqa benchmarking visual question answering in roadside perception syste | arXiv: 2511.18286
robust long-term test-time adaptation for 3d human pose estimation through motio | arXiv: 2511.18851
Robust Out-of-Order Retrieval for Grid-Based Storage at Maximum Capacity | arXiv: 2601.19144
robust tabular foundation models | arXiv: 2512.03307
robust watermarking on gradient boosting decision trees | arXiv: 2511.09822
rpm-mcts knowledge-retrieval as process reward model with monte carlo tree searc | arXiv: 2511.19895
rrra resampling and reranking through a retriever adapter | arXiv: 2508.11670
rs2-sam2 customized sam2 for referring remote sensing image segmentation | arXiv: 2503.07266
rsvg-zeroov exploring a training-free framework for zero-shot open-vocabulary vi | arXiv: 2509.18711
rtgaze real-time 3d-aware gaze redirection from a single image | arXiv: 2511.11289
S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning | arXiv: 2511.06727
s2drug bridging protein sequence and 3d structure in contrastive representation | arXiv: 2511.07006
s5 scalable semi-supervised semantic segmentation in remote sensing | arXiv: 2508.12409
safemil learning offline safe imitation policy from non-preferred trajectories | arXiv: 2511.08136
safenlidb a privacy-preserving safety alignment framework for llm-based natural | arXiv: 2511.06778
safer-clip mitigating nsfw content in vision-language models while preserving pr | arXiv: 2511.16743
safesieve from heuristics to experience in progressive pruning for llm-based mul | arXiv: 2508.11733
saga learning signal-aligned distributions for improved text-to-image generation | arXiv: 2508.13866
sage spuriousness-aware guided prompt exploration for mitigating multimodal bias | arXiv: 2511.13005
sam-daq segment anything model with depth-guided adaptive queries for rgb-d vide | arXiv: 2511.09870
sampling control for imbalanced calibration in semi-supervised learning | arXiv: 2511.18773
saot an enhanced locality-aware spectral transformer for solving pdes | arXiv: 2511.18777
SAPO: Self-Adaptive Process Optimization Makes Small Reasoners Stronger | arXiv: 2601.20312
saq-sam semantically-aligned quantization for segment anything model | arXiv: 2503.06515
satiredecoder visual cascaded decoupling for enhancing satirical image comprehen | arXiv: 2512.00582
satisficing and optimal generalised planning via goal regression extended versio | arXiv: 2511.11095
say more with less variable-frame-rate speech tokenization via adaptive clusteri | arXiv: 2509.04685
scalable and accurate graph reasoning with llm-based multi-agents | arXiv: 2410.05130
scalable multi-objective and meta reinforcement learning via gradient estimation | arXiv: 2511.12779
scalable vision-guided crop yield estimation | arXiv: 2511.12999
SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling | arXiv: 2512.00466
scaling and transferability of annealing strategies in large language model trai | arXiv: 2512.13705
scaling equitable reflection assessment in education via large language models a | arXiv: 2511.11772
scaling llm speculative decoding non-autoregressive forecasting in large-batch s | arXiv: 2511.20340
SceneJailEval: A Scenario-Adaptive Multi-Dimensional Framework for Jailbreak Evaluation | arXiv: 2508.06194
Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction | arXiv: 2602.18403
SCoPe: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs | arXiv: 2511.07001
SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining | arXiv: 2511.17993
sdeval safety dynamic evaluation for multimodal large language models | arXiv: 2508.06142
secmoe communication-efficient secure moe inference via select-then-compute | arXiv: 2601.06790
see symbolize act grounding vlms with spatial representations for better gamepla | arXiv: 2603.11601
seeing justice clearly handwritten legal document translation with ocr and visio | arXiv: 2512.18004
Seeing the Unseen: Zooming in the Dark with Event Cameras | arXiv: 2601.02206
segment and matte anything in a unified model | arXiv: 2601.12147
segment anything across shots a method and benchmark | arXiv: 2511.13715
SELDON: Supernova Explosions Learned by Deep ODE Networks | arXiv: 2603.04392
self-adaptive graph mixture of models | arXiv: 2511.13062
self-correction distillation for structured data question answering | arXiv: 2511.07998
self-npo data-free diffusion model enhancement via truncated diffusion fine-tuni | arXiv: 2505.11777
self-supervised inductive logic programming | arXiv: 2507.16405
self-supervised multiplex consensus mamba for general image fusion | arXiv: 2512.20921
semanticvla semantic-aligned sparsification and enhancement for efficient roboti | arXiv: 2511.10518
semc structure-enhanced mixture-of-experts contrastive learning for ultrasound s | arXiv: 2511.12559
semi-supervised high dynamic range image reconstructing via bi-level uncertain a | arXiv: 2511.12939
semi-supervised synthetic data generation with fine-grained relevance control fo | arXiv: 2509.16717
sentient detecting apts via capturing indirect dependencies and behavioral logic | arXiv: 2502.06521
serl self-examining reinforcement learning on open-domain | arXiv: 2511.07922
Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems | arXiv: 2511.18467
shapbpt image feature attributions using data-aware binary partition trees | arXiv: 2602.07047
share your attention transformer weight sharing via matrix-based dictionary lear | arXiv: 2508.04581
sharp eyes and memory for videollms information-aware visual token pruning for e | arXiv: 2511.08003
sheaf graph neural networks via pac-bayes spectral optimization | arXiv: 2508.00357
shortagesim simulating drug shortages under information asymmetry | arXiv: 2509.01813
shrinking the teacher an adaptive teaching paradigm for asymmetric eeg-vision al | arXiv: 2511.11422
sign schema-induced games for naming | arXiv: 2510.21855
sim-to-real an unsupervised noise layer for screen-camera watermarking robustnes | arXiv: 2504.18906
sim4seg boosting multimodal multi-disease medical diagnosis segmentation with re | arXiv: 2511.06665
simba towards high-fidelity and geometrically-consistent point cloud completion | arXiv: 2511.16161
simdiff simpler yet better diffusion model for time series point forecasting | arXiv: 2511.19256
simrod a simple baseline for raw object detection with global and local enhancem | arXiv: 2503.07101
Sketch-HARP: Hierarchical Autoregressive Sketch Generation for Flexible Stroke-Level Drawing Manipulation | arXiv: 2511.07889
skill path unveiling language skills from circuit graphs | arXiv: 2410.01334
skipcat rank-maximized low-rank compression of large language models via shared | arXiv: 2512.13494
slidetailor personalized presentation slide generation for scientific papers | arXiv: 2512.20292
sm3det a unified model for multi-modal remote sensing object detection | arXiv: 2412.20665
small but mighty dynamic wavelet expert-guided fine-tuning of large-scale models | arXiv: 2601.09108
small language models for efficient agentic tool calling outperforming large mod | arXiv: 2512.15943
smart a surrogate model for predicting application runtime in dragonfly systems | arXiv: 2511.11111
smartsplat feature-smart gaussians for scalable compression of ultra-high-resolu | arXiv: 2512.20377
smofi step-wise momentum fusion for split federated learning on heterogeneous da | arXiv: 2511.09828
soft filtering guiding zero-shot composed image retrieval with prescriptive and | arXiv: 2512.20781
som directions are better than one multi-directional refusal suppression in lang | arXiv: 2511.08379
SoMe: A Realistic Benchmark for LLM-based Social Media Agents | arXiv: 2512.14720
sonnet spectral operator neural network for multivariable time series forecastin | arXiv: 2505.15312
soscontrol enhancing human motion generation through saliency-aware symbolic ori | arXiv: 2601.14258
spa achieving consensus in llm alignment via self-priority optimization | arXiv: 2511.06222
spacrd multimodal deep fusion of histology and spatial transcriptomics for cance | arXiv: 2603.06186
span benchmarking and improving cross-calendar temporal reasoning of large langu | arXiv: 2511.09993
SPARC: OOD Generalization for Driving 100 Unseen Vehicles with a Single Policy | arXiv: 2511.09737
spare single-pass annotation with reference-guided evaluation for automatic proc | arXiv: 2506.15498
spark query-aware unstructured sparsity with recoverable kv cache channel prunin | arXiv: 2508.15212
sparse additive model pruning for order-based causal structure learning | arXiv: 2602.15306
sparse4dgs 4d gaussian splatting for sparse-frame dynamic scene reconstruction | arXiv: 2511.07122
sparsecoop cooperative perception with kinematic-grounded queries | arXiv: 2512.06838
sparserm a lightweight preference modeling with sparse autoencoder | arXiv: 2511.07896
sparsesurf sparse-view 3d gaussian splatting for surface reconstruction | arXiv: 2511.14633
spatialactor exploring disentangled spatial representations for robust robotic m | arXiv: 2511.09555
spatio-temporal context learning with temporal difference convolution for moving | arXiv: 2511.09352
spatiotemporal difference network for video depth super-resolution | arXiv: 2508.01259
spatiotemporal-untrammelled mixture of experts for multi-person motion predictio | arXiv: 2512.21707
speakerlm end-to-end versatile speaker diarization and recognition with multimod | arXiv: 2508.06372
specdiff accelerating diffusion model inference with self-speculation | arXiv: 2509.13848
specquant spectral decomposition and adaptive truncation for ultra-low-bit llms | arXiv: 2511.11663
speculative sampling with reinforcement learning | arXiv: 2601.12212
spherediff tuning-free 360 static and dynamic panorama generation via spherical | arXiv: 2504.14396
spikcommander a high-performance spiking transformer with multi-view learning fo | arXiv: 2511.07883
spike imaging velocimetry dense motion estimation of fluids using spike cameras | arXiv: 2504.18864
spiking heterogeneous graph attention networks | arXiv: 2601.02401
spikingformer a key foundation model for spiking neural networks | arXiv: 2304.11954
splat-sap feed-forward gaussian splatting for human-centered scene with scale-aw | arXiv: 2511.22704
splats in splats robust and effective 3d steganography towards gaussian splattin | arXiv: 2412.03121
splatssc decoupled depth-guided gaussian splatting for semantic scene completion | arXiv: 2508.02261
split-layer enhancing implicit neural representation by maximizing the dimension | arXiv: 2511.10142
sproutbench a benchmark for safe and ethical large language models for youth | arXiv: 2508.11009
sr-ki scalable and real-time knowledge integration into llms via supervised atte | arXiv: 2511.06446
ssr semantic and spatial rectification for clip-based weakly supervised segmenta | arXiv: 2512.01701
stabilizing self-consuming diffusion models with latent space filtering | arXiv: 2511.12742
Stable Voting and the Splitting of Cycles | arXiv: 2512.00616
start small think big curriculum-based relative policy optimization for visual g | arXiv: 2511.13924
steering one-step diffusion model with fidelity-rich decoder for fast image comp | arXiv: 2508.04979
steering pretrained drafters during speculative decoding | arXiv: 2511.09844
stegavar privacy-preserving video action recognition via steganographic domain a | arXiv: 2512.12586
stelar-vision self-topology-aware efficient learning for aligned reasoning in vi | arXiv: 2508.08688
stellar scene text editor for low-resource languages and real-world data | arXiv: 2511.09977
stem efficient relative capability evaluation of llms through structured transit | arXiv: 2508.12096
stem faculty perspectives on generative ai in higher education | arXiv: 2603.04001
stepfun-formalizer unlocking the autoformalization potential of llms through kno | arXiv: 2508.04440
stmi segmentation-guided token modulation with cross-modal hypergraph interactio | arXiv: 2603.00695
stola self-adaptive touch-language framework with tactile commonsense reasoning | arXiv: 2505.04201
stratified knowledge-density super-network for scalable vision transformers | arXiv: 2511.11683
streaming generated gaussian process experts for online learning and control ext | arXiv: 2508.03679
streaming generation of co-speech gestures via accelerated rolling diffusion | arXiv: 2503.10488
streamstgs streaming spatial and temporal gaussian grids for real-time free-view | arXiv: 2511.06046
stride-qa visual question answering dataset for spatiotemporal reasoning in urba | arXiv: 2508.10427
structural approach to guiding a present-biased agent | arXiv: 2601.07763
structure-aware encodings of argumentation properties for clique-width | arXiv: 2511.10767
structure-based rna design by step-wise optimization of latent diffusion model | arXiv: 2601.19232
structured language generation model loss calibration and formatted decoding for | arXiv: 2402.08971
structured personalization modeling constraints as matroids for data-minimal llm | arXiv: 2512.11907
studying classifier-free guidance from a classifier-centric perspective | arXiv: 2503.10638
stylebreak revealing alignment vulnerabilities in large audio-language models vi | arXiv: 2511.10692
sugar learning skeleton representation with visual-motion knowledge for action r | arXiv: 2511.10091
surface-based visibility-guided uncertainty for continuous active 3d neural reco | arXiv: 2405.02568
svd-no learning pde solution operators with svd integral kernels | arXiv: 2511.10025
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments | arXiv: 2509.01022
symmetrical flow matching unified image generation segmentation and classificati | arXiv: 2506.10634
synweather weather observation data synthesis across multiple regions and variab | arXiv: 2511.08291
t-lora single image diffusion model customization without overfitting | arXiv: 2507.05964
t-rex-omni integrating negative visual prompt in generic object detection | arXiv: 2511.08997
t2agent a tool-augmented multimodal misinformation detection agent with monte ca | arXiv: 2505.19768
t2i-riskyprompt a benchmark for safety evaluation attack and defense on text-to- | arXiv: 2510.22300
tab-pet graph-based positional encodings for tabular transformers | arXiv: 2511.13338
tabflash efficient table understanding with progressive question conditioning an | arXiv: 2511.13283
tackling resource-constrained and data-heterogeneity in federated learning with | arXiv: 2601.01840
tadarag task adaptive retrieval-augmented generation via on-the-fly knowledge gr | arXiv: 2511.12520
taligndiff automatic tooth alignment assisted by diffusion-based transformation | arXiv: 2508.04565
talk snap complain validation-aware multimodal expert framework for fine-grained | arXiv: 2511.14693
talksketch multimodal generative ai for real-time sketch ideation with speech | arXiv: 2511.05817
TAPA: Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments | arXiv: 2508.11425
target refocusing via attention redistribution for open-vocabulary semantic segm | arXiv: 2511.16170
targeted data protection for diffusion model by matching training trajectory | arXiv: 2512.10433
task aware modulation using representation learning for upsaling of terrestrial | arXiv: 2603.09974
Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data | arXiv: 2601.07474
task-aware retrieval augmentation for dynamic recommendation | arXiv: 2511.12495
task-specific distance correlation matching for few-shot action recognition | arXiv: 2512.11340
tawpipe topology-aware weight pipeline parallelism for accelerating long-context | arXiv: 2511.09741
TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models | arXiv: 2507.10643
tdsnns competitive topographic deep spiking neural networks for visual cortex mo | arXiv: 2508.04270
teaching large language models to maintain contextual faithfulness via synthetic | arXiv: 2505.16483
temple incentivizing temporal understanding of video large language models via p | arXiv: 2503.16929
temporal inconsistency guidance for super-resolution video quality assessment | arXiv: 2412.18933
temporal object-aware vision transformer for few-shot video object detection | arXiv: 2511.13784
Test-driven Reinforcement Learning in Continuous Control | arXiv: 2511.07904
test-time diverse reasoning by riemannian activation steering | arXiv: 2511.08305
text-guided channel perturbation and pretrained knowledge integration for unifie | arXiv: 2511.12432
text-guided controllable diffusion for realistic camouflage images generation | arXiv: 2511.20218
text-routed sparse mixture-of-experts model with explanation and temporal alignm | arXiv: 2512.22741
text-to-scene with large reasoning models | arXiv: 2509.26091
textshield-r1 reinforced reasoning for tampered text detection | arXiv: 2602.19828
tg-field geometry-aware radiative gaussian fields for tomographic reconstruction | arXiv: 2602.11705
tgdd trajectory guided dataset distillation with balanced distribution | arXiv: 2512.02469
the confidence trap gender bias and predictive certainty in llms | arXiv: 2601.07806
the curious case of analogies investigating analogical reasoning in large langua | arXiv: 2511.20344
the limitations and power of np-oracle-based functional synthesis techniques | arXiv: 2512.20572
the publication choice problem | arXiv: 2511.13678
the triangle of similarity a multi-faceted framework for comparing neural networ | arXiv: 2601.17093
theoretical and empirical analysis of lehmer codes to search permutation spaces | arXiv: 2511.19089
theory of mind for explainable human-robot interaction | arXiv: 2512.23482
think how your teammates think active inference can benefit decentralized execut | arXiv: 2511.18761
think speak decide language-augmented multi-agent reinforcement learning for eco | arXiv: 2511.12876
thinker training llms in hierarchical thinking for deep search via multi-turn in | arXiv: 2511.07943
thucy an llm-based multi-agent system for claim verification across relational d | arXiv: 2512.03278
time identity and consciousness in language model agents | arXiv: 2603.09043
timebill time-budgeted inference for large language models | arXiv: 2512.21859
tinychemvl advancing chemical vision-language models via efficient visual token | arXiv: 2511.06283
tmdc a two-stage modality denoising and complementation framework for multimodal | arXiv: 2511.10325
to align or not to align strategic multimodal representation alignment for optim | arXiv: 2511.12121
ToC: Tree-of-Claims Search with Multi-Agent Language Models | arXiv: 2511.16972
tofa training-free one-shot federated adaptation for vision-language models | arXiv: 2511.16423
tokenize once recommend anywhere unified item tokenization for multi-domain llm- | arXiv: 2511.12922
TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents | arXiv: 2504.12679
tool4poi a tool-augmented llm framework for next poi recommendation | arXiv: 2511.06405
toporeformer mitigating adversarial attacks using topological purification in oc | arXiv: 2511.15807
tosc task-oriented shape completion for open-world dexterous grasp generation fr | arXiv: 2601.05499
touchformer a robust transformer-based framework for multimodal material percept | arXiv: 2511.19509
Toward Gaze Target Detection in Young Autistic Children | arXiv: 2511.11244
toward the frontiers of reliable diffusion sampling via adversarial sinkhorn att | arXiv: 2511.07499
towards 3d object-centric feature learning for semantic scene completion | arXiv: 2511.13031
towards a common framework for autoformalization | arXiv: 2509.09810
towards a foundation model for partial differential equations across physics dom | arXiv: 2511.21861
towards a rigorous understanding of the population dynamics of the nsga-iii tigh | arXiv: 2511.07125
towards affordance-aware robotic dexterous grasping with human-like priors | arXiv: 2508.08896
towards authentic movie dubbing with retrieve-augmented director-actor interacti | arXiv: 2511.14249
towards better code understanding in decoder-only models with contrastive learni | arXiv: 2406.12326
towards effective and efficient context-aware nucleus detection in histopatholog | arXiv: 2503.05678
towards effective stealthy and persistent backdoor attacks targeting graph found | arXiv: 2511.17982
towards human-ai accessibility mapping in india vlm-guided annotations and poi-c | arXiv: 2602.09216
Towards Inference-Time Scaling for Continuous Space Reasoning | arXiv: 2510.12167
towards llm-empowered knowledge tracing via llm-student hierarchical behavior al | arXiv: 2602.22879
towards long-window anchoring in vision-language model distillation | arXiv: 2512.21576
towards multiple missing values-resistant unsupervised graph anomaly detection | arXiv: 2511.09917
towards non-stationary time series forecasting with temporal stabilization and f | arXiv: 2511.08229
Towards Reinforcement Learning from Neural Feedback: Mapping fNIRS Signals to Agent Performance | arXiv: 2511.12844
towards scalable web accessibility audit with mllms as copilots | arXiv: 2511.03471
towards temporal fusion beyond the field of view for camera-based semantic scene | arXiv: 2511.12498
towards test-time efficient visual place recognition via asymmetric query proces | arXiv: 2512.13055
towards trustworthy multi-turn llm agents via behavioral guidance | arXiv: 2512.11421
towermind a tower defence game learning environment and benchmark for llm as age | arXiv: 2601.05899
trace a generalizable drift detector for streaming data-driven optimization | arXiv: 2512.07082
trace textual relevance augmentation and contextual encoding for multimodal hate | arXiv: 2504.17902
tracking and segmenting anything in any modality | arXiv: 2511.19475
tractable weighted first-order model counting with bounded treewidth binary evid | arXiv: 2511.09174
trade-offs in large reasoning models an empirical analysis of deliberative and a | arXiv: 2503.17979
training-free policy violation detection via activation-space whitening in llms | arXiv: 2512.03994
transferable backdoor attacks for code models via sharpness-aware adversarial pe | arXiv: 2602.11213
transferable hypergraph attack via injecting nodes into pivotal hyperedges | arXiv: 2511.10698
transmamba a sequence-level hybrid transformer-mamba language model | arXiv: 2503.24067
transparent networks for multivariate time series | arXiv: 2410.10535
travellama a multimodal travel assistant with large-scale dataset and structured | arXiv: 2504.16505
tri-bench stress-testing vlm reliability on spatial reasoning under camera tilt | arXiv: 2512.08860
trinitydna a bio-inspired foundational model for efficient long-sequence dna mod | arXiv: 2507.19229
truth justice and secrecy cake cutting under privacy constraints | arXiv: 2511.09882
truthfulrag resolving factual-level conflicts in retrieval-augmented generation | arXiv: 2511.10375
tsbow traffic surveillance benchmark for occluded vehicles under various weather | arXiv: 2602.05414
tsgdiff rethinking synthetic time series generation from a pure graph perspectiv | arXiv: 2511.12174
tspo temporal sampling policy optimization for long-form video language understa | arXiv: 2508.04369
TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models | arXiv: 2508.19257
tubermc tube-conditioned reconstruction with mutual constraints for weakly-super | arXiv: 2511.10241
uncertainty under the curve a sequence-level entropy area metric for reasoning l | arXiv: 2508.20384
uncovering bias paths with llm-guided causal discovery an active learning and dy | arXiv: 2506.12227
uncovering pretraining code in llms a syntax-aware attribution approach | arXiv: 2511.07033
uncovering zero-shot generalization gaps in time-series foundation models using | arXiv: 2509.26347
understanding dynamic scenes in ego centric 4d point clouds | arXiv: 2508.07251
understanding syllogistic reasoning in llms from formal and natural language per | arXiv: 2512.12620
uniabg unified adversarial view bridging and graph correspondence for unsupervis | arXiv: 2511.12054
unic-lift unified 3d instance segmentation via contrastive learning | arXiv: 2512.24763
unifit towards universal virtual try-on with mllm-guided semantic alignment | arXiv: 2511.15831
unihr hierarchical representation learning for unified knowledge graph link pred | arXiv: 2411.07019
Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation | arXiv: 2508.14031
universal safety controllers with learned prophecies | arXiv: 2511.11390
unleashing semantic and geometric priors for 3d scene completion | arXiv: 2508.13601
unleashing the potential of large language models for text-to-image generation t | arXiv: 2503.07334
unlocking efficient vehicle dynamics modeling via analytic world models | arXiv: 2502.10012
unseen enhancing dataset pruning from a generalization perspective | arXiv: 2511.12988
unsupervised feature selection through group discovery | arXiv: 2511.09166
unsupervised motion-compensated decomposition for cardiac mri reconstruction via | arXiv: 2511.11436
unsupervised multi-parameter inverse solving for reducing ring artifacts in 3d x | arXiv: 2412.05853
URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | arXiv: 2511.10552
urban incident prediction with graph neural networks integrating government rati | arXiv: 2506.08740
urbannav learning language-guided urban navigation from web-scale human trajecto | arXiv: 2512.09607
use a unified model for universal sound separation and extraction | arXiv: 2512.21215
using certifying constraint solvers for generating step-wise explanations | arXiv: 2511.10428
uvlm benchmarking video language model for underwater world understanding | arXiv: 2507.02373
variance computation for weighted model counting with knowledge compilation appr | arXiv: 2601.03523
vascular anatomy-aware self-supervised pre-training for x-ray angiogram analysis | arXiv: 2602.11536
verb mirage unveiling and assessing verb concept hallucinations in multimodal la | arXiv: 2412.04939
verification-guided context optimization for tool calling via hierarchical llms- | arXiv: 2512.13860
vggt-dp generalizable robot control via vision foundation models | arXiv: 2509.18778
vidia2std a parallel corpus and methods for low-resource vietnamese dialect-to-s | arXiv: 2603.10211
VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness | arXiv: 2601.12672
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use | arXiv: 2410.16400
vir-bench evaluating geospatial and temporal understanding of mllms via travel v | arXiv: 2509.19002
virtual multiplex staining for histological images using a marker-wise condition | arXiv: 2508.14681
vision transformers are circulant attention learners | arXiv: 2512.21542
vision-language reasoning for geolocalization a reinforcement learning approach | arXiv: 2601.00388
Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction (Oral) | arXiv: 2508.10936v2
vista scene-aware optimization for streaming video question answering under post | arXiv: 2602.08448
vitaldiagnosis ai-driven ecosystem for 24/7 vital monitoring and chronic disease | arXiv: 2601.15798
vk-det visual knowledge guided prototype learning for open-vocabulary aerial obj | arXiv: 2511.18075
vmfcoop towards equilibrium on a unified hyperspherical manifold for prompting b | arXiv: 2511.09540
voicecloak a multi-dimensional defense framework against unauthorized diffusion- | arXiv: 2505.12332
voices faces and feelings multi-modal emotion-cognition captioning for mental he | arXiv: 2603.01816
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models | arXiv: 2511.11438
vpho joint visual-physical cue learning and aggregation for hand-object pose est | arXiv: 2511.12030
vpn visual prompt navigation | arXiv: 2508.01766
vspo validating semantic pitfalls in ontology via llm-based cq generation | arXiv: 2511.07991
vtinker guided flow upsampling and texture mapping for high-resolution video fra | arXiv: 2511.16124
w2s-aligntree weak-to-strong inference-time alignment for large language models | arXiv: 2511.11518
walking further semantic-aware multimodal gait recognition under long-range cond | arXiv: 2603.14189
watermod modular token-rank partitioning for probability-balanced llm watermarki | arXiv: 2511.07863
Wavelet Enhanced Adaptive Frequency Filter for Sequential Recommendation | arXiv: 2511.07028
wdt-md wavelet diffusion transformers for microaneurysm detection in fundus imag | arXiv: 2511.08987
well begun half done reinforcement learning with prefix optimization for llm rea | arXiv: 2512.15274
when eyes and ears disagree can mllms discern audio-visual confusion | arXiv: 2511.10059
when hallucination costs millions benchmarking ai agents in high-stakes adversar | arXiv: 2510.00332
when human preferences flip an instance-dependent robust loss for rlhf | arXiv: 2512.00709
when person re-identification meets event camera a benchmark dataset and an attr | arXiv: 2507.13659
when refusals fail unstable safety mechanisms in long-context llm agents | arXiv: 2512.02445
when small models are right for wrong reasons process verification for trustwort | arXiv: 2601.00513
when top-ranked recommendations fail modeling multi-granular negative feedback f | arXiv: 2511.18700
when trackers date fish a benchmark and framework for underwater multiple fish t | arXiv: 2507.06400
where and what matters sensitivity-aware task vectors for many-shot multimodal i | arXiv: 2511.08246
where norms and references collide evaluating llms on normative reasoning | arXiv: 2602.02975
where to start alignment diffusion large language model may demand a distinct po | arXiv: 2508.12398
whispering agents an event-driven covert communication protocol for the internet | arXiv: 2508.02188
why do open-source llms struggle with data analysis a systematic empirical study | arXiv: 2506.19794
why isn't relational learning taking over the world | arXiv: 2507.13558
with great capabilities come great responsibilities introducing the agentic risk | arXiv: 2512.22211
worldrft latent world model planning with reinforcement fine-tuning for autonomo | arXiv: 2512.19133
x-mutest a multilingual benchmark for explainable hate speech detection and a no | arXiv: 2601.03194
x2edit revisiting arbitrary-instruction image editing through self-constructed d | arXiv: 2508.07607
xlinear a lightweight and accurate mlp-based model for long-term time series for | arXiv: 2601.09237
yes florence i will do better next time agentic feedback reasoning for humorous | arXiv: 2601.07232
yolo-iod towards real time incremental object detection | arXiv: 2512.22973
your ai-generated image detector can secretly achieve sota accuracy if calibrate | arXiv: 2602.01973
yours or mine overwriting attacks against neural audio watermarking | arXiv: 2509.05835
zero-reference joint low-light enhancement and deblurring via visual autoregress | arXiv: 2511.18591
cheating stereo matching in full-scale physical adversarial attack against binoc | arXiv: 2511.14386
monoclue object-aware clustering enhances monocular 3d object detection | arXiv: 2511.07862
an information theoretic evaluation metric for strong unlearning | arXiv: 2405.17878
comptrack information bottleneck-guided low-rank dynamic token compres | arXiv: 2511.15580
priordrive enhancing online hd mapping with unified vector p | arXiv: 2409.05352
reflexdiffusion reflection-enhanced trajectory planning for | arXiv: 2601.09377
diffbench meets diffagent end-to-end llm-driven diffusion ac | arXiv: 2601.03178
equacode a multi-strategy jailbreak approach for large language models via equat | arXiv: 2512.23173
extracting events like code a multi-agent programming framework for zero-shot ev | arXiv: 2511.13118
mose hierarchical self-distillation enhances early layer embeddings | arXiv: 2503.03008
recode updating code api knowledge with reinforcement learning | arXiv: 2506.20495
span benchmarking and improving cross-calendar temporal reasoning of large langu | arXiv: 2511.09993
tapas are free training-free adaptation of programmatic agen | arXiv: 2508.11425
towards better code understanding in decoder-only large language models via hie | arXiv: 2406.12326
towards better code understanding in decoder-only models with contrastive learni | arXiv: 2406.12326
emergent persuasion will llms persuade without being prompted | arXiv: 2512.22201
mctsr-zero self-reflective psychological counseling dialogues generation via pri | arXiv: 2505.23229
teaching large language models to maintain contextual faithfulness via synthetic | arXiv: 2505.16483
as eastern powers i will veto an investigation of nation-level bias of large lan | arXiv: 2511.10695
beyond perplexity let the reader select retrieval summaries via spectrum project | arXiv: 2508.05909
cog-rag cognitive-inspired dual-hypergraph with theme alignment retrieval-augmen | arXiv: 2511.13201
comlq benchmarking complex logical queries in information retrieval | arXiv: 2511.12004
comorag a cognitive-inspired memory-organized rag for stateful long narrative re | arXiv: 2508.10419
convmix a mixed-criteria data augmentation framework for conversational dense re | arXiv: 2508.04001
do retrieval augmented language models know when they don't know | arXiv: 2509.01476
does less hallucination mean less creativity an empirical investigation in llms | arXiv: 2512.11509
exposing the cracks vulnerabilities of retrieval-augmented llm-based machine tra | arXiv: 2510.00829
himo-clip modeling semantic hierarchy and monotonicity in vi | arXiv: 2511.06653
knowledge completes the vision a multimodal entity-aware retrieval-augmented gen | arXiv: 2511.21002
llms for game theory entropy-guided in-context learning and adaptive cot reasoni | arXiv: 2601.10775
magnitude matters a superior class of similarity metrics for holistic semantic u | arXiv: 2509.19323
mavis a benchmark for multimodal source attribution in long-form visual question | arXiv: 2511.12142
mem-pal towards memory-based personalized dialogue assistants for long-term user | arXiv: 2511.13410
multimodal deepresearcher generating text-chart interleaved | arXiv: 2506.02454
n2n-gqa noise-to-narrative for graph-based table-text question answering using l | arXiv: 2601.06603
neighbor-aware instance refining with noisy labels for cross-modal retrieval | arXiv: 2512.24064
oad-promoter enhancing zero-shot vqa using large language models with object att | arXiv: 2511.12131
positional bias in multimodal embedding models do they favor the beginning the m | arXiv: 2511.11216
precise reducing the bias of llm evaluations using prediction-powered ranking es | arXiv: 2601.18777
prime planning and retrieval-integrated memory for enhanced reasoning | arXiv: 2509.22315
reap enhancing rag with recursive evaluation and adaptive planning for multi-hop | arXiv: 2511.09966
refeed retrieval feedback-guided dataset construction for style-aware query rewr | arXiv: 2603.01417
rrra resampling and reranking through a retriever adapter | arXiv: 2508.11670
sr-ki scalable and real-time knowledge integration into llms via supervised atte | arXiv: 2511.06446
towards inference-time scaling for continuous space reasoning | arXiv: 2510.12167
when small models are right for wrong reasons process verification for trustwort | arXiv: 2601.00513
a closer look at knowledge distillation in spiking neural ne | arXiv: 2511.06902
a coherence-based measure of agi | arXiv: 2510.20784
adaptive evidential learning for temporal-semantic robustnes | arXiv: 2512.00953v1
attention gathers mlps compose a causal analysis of an action-outcome circuit in | arXiv: 2603.11142
beyond hallucinations a composite score for measuring reliability in open-source | arXiv: 2512.24058
concepts from representations post-hoc concept bottleneck models via sparse deco | arXiv: 2601.12303
crosscheck-bench diagnosing compositional failures in multim | arXiv: 2511.21717
data whitening improves sparse autoencoder learning | arXiv: 2511.13981
distribution-based feature attribution for explaining the predictions of any cla | arXiv: 2511.09332
drexperts differential refinement of distortion-aware experts for blind image qu | arXiv: 2602.09531
elementarynet a non-strategic neural network for predicting human behavior in no | arXiv: 2503.05925
enhancing binary encoded crime linkage analysis using siamese network | arXiv: 2511.07651
explainable melanoma diagnosis with contrastive learning and llm-based report ge | arXiv: 2512.06105
finding the translation switch discovering and exploiting the task-initiation fe | arXiv: 2601.11019
finevau a novel human-aligned benchmark for fine-grained video anomaly understan | arXiv: 2601.17258
flashkat understanding and addressing performance bottlenecks in the kolmogorov- | arXiv: 2505.13813
flexible concept bottleneck model | arXiv: 2511.06678
fourierpet deep fourier-based unrolled network for low-count pet reconstruction | arXiv: 2601.11680
gatera token-aware modulation for parameter-efficient fine-tuning | arXiv: 2511.17582
genepheno interpretable gene knockout-induced phenotype abnormality prediction f | arXiv: 2511.09512
hskbenchmark modeling and benchmarking chinese second language acquisition in la | arXiv: 2511.15574
hypothesis generation via llm-automated language bias for ilp | arXiv: 2505.21486
imad intelligent multi-agent debate for efficient and accura | arXiv: 2511.11306
induce align predict zero-shot stance detection via cognitive inductive reasonin | arXiv: 2506.13470
llm circuit analyses consistent across training and scale | arXiv: 2407.10827
probing preference representations a multi-dimensional evaluation and analysis m | arXiv: 2511.12464
quiet feature learning in algorithmic tasks | arXiv: 2505.03997
scope intrinsic semantic space control for mitigating copyright infringement in | arXiv: 2511.07001
shapbpt image feature attributions using data-aware binary partition trees | arXiv: 2602.07047
som directions are better than one multi-directional refusal suppression in lang | arXiv: 2511.08379
spark query-aware unstructured sparsity with recoverable kv cache channel prunin | arXiv: 2508.15212
toc tree-of-claims search with multi-agent language models | arXiv: 2511.16972
universal safety controllers with learned prophecies | arXiv: 2511.11390
unsupervised feature selection through group discovery | arXiv: 2511.09166
using certifying constraint solvers for generating step-wise explanations | arXiv: 2511.10428
voices faces and feelings multi-modal emotion-cognition captioning for mental he | arXiv: 2603.01816
catastrophic forgetting in kolmogorov-arnold networks | arXiv: 2511.12828
hybrid-dmkg a hybrid reasoning framework over dynamic multimodal knowledge graph | arXiv: 2512.00881
is the information bottleneck robust enough towards label-noise resistant inform | arXiv: 2512.10573
model editing as a double-edged sword steering agent ethical behavior toward ben | arXiv: 2506.20606
multiplicative orthogonal sequential editing for language models | arXiv: 2601.07873
a multi-agent conversational bandit approach to online evaluation and selection | arXiv: 2501.01849
a multi-agent llm framework for multi-domain low-resource in-context ner via kno | arXiv: 2511.19083
autoglm autonomous foundation agents for guis | arXiv: 2411.00820
autotool efficient tool selection for large language model agents | arXiv: 2511.14650
gram-r2 self-training generative foundation reward models for reward reasoning | arXiv: 2509.02492
connectivity-guided sparsification of 2-fwl gnns preserving full expressivity wi | arXiv: 2511.12838
axis-aligned document dewarping | arXiv: 2507.15000
bcwildfire a long-term multi-factor dataset and deep learning benchmark for bore | arXiv: 2511.17597
benchmarking llms for political science a united nations perspective | arXiv: 2502.14122
beyond accuracy a cognitive load framework for mapping the c | arXiv: 2601.20412
beyond cosine similarity magnitude-aware clip for no-reference image quality ass | arXiv: 2511.09948
coninstruct evaluating large language models on conflict detection and resolutio | arXiv: 2511.14342
dcmatch unsupervised multi-shape matching with dual-level consistency | arXiv: 2509.01204
dicap distribution-calibrated pseudo-labeling for semi-supervised multi-label le | arXiv: 2511.20225
gdba revisited unleashing the power of guided local search for distributed const | arXiv: 2508.06899
gene incremental learning for single-cell transcriptomics | arXiv: 2511.13762
goal geometrically optimal alignment for continual generalized category discover | arXiv: 2602.19872
granalign granularity-aware alignment framework for zero-shot video moment retri | arXiv: 2601.00584
graph out-of-distribution detection via test-time calibration with dual dynamic | arXiv: 2511.13541
hybridla hybrid generation for document layout analysis | arXiv: 2511.19919
improved runtime guarantees for the spea2 multi-objective optimizer | arXiv: 2511.07150
llm-as-a-judge for scalable test coverage evaluation accuracy operational reliab | arXiv: 2512.01232
lost in benchmarks rethinking large language model benchmarking with item respon | arXiv: 2505.15055
low-rank curvature for zeroth-order optimization in llm fine-tuning | arXiv: 2511.07971
maps multi-agent personality shaping for collaborative reaso | arXiv: 2503.16905
mcts-sql light-weight llms can master the text-to-sql through monte carlo tree s | arXiv: 2501.16607
mindvote when ai meets the wild west of social media opinion | arXiv: 2505.14422
moetta test-time adaptation under mixed distribution shifts with moe-layernorm | arXiv: 2511.13760
nestr a neuro-symbolic abductive framework for temporal reasoning in large langu | arXiv: 2512.07218
optscale probabilistic optimality for inference-time scaling | arXiv: 2506.22376
perspective from a broader context can room style knowledge help visual floorpla | arXiv: 2508.01216
refinevad semantic-guided feature recalibration for weakly supervised video anom | arXiv: 2511.13204
regular games -- an automata-based general game playing language | arXiv: 2511.10593
sampling control for imbalanced calibration in semi-supervised learning | arXiv: 2511.18773
scalable vision-guided crop yield estimation | arXiv: 2511.12999
spikcommander a high-performance spiking transformer with multi-view learning fo | arXiv: 2511.07883
streaming generated gaussian process experts for online learning and control ext | arXiv: 2508.03679
structured language generation model loss calibration and formatted decoding for | arXiv: 2402.08971
where norms and references collide evaluating llms on normative reasoning | arXiv: 2602.02975
an invariant latent space perspective on language model inve | arXiv: 2511.19569v1
from classification to ranking enhancing llm reasoning capabilities for mbti per | arXiv: 2601.18582
persistent instability in llms personality measurements effects of scale reasoni | arXiv: 2508.04826
promptmoe generalizable zero-shot anomaly detection via visually-guided prompt m | arXiv: 2511.18116
smart a surrogate model for predicting application runtime in dragonfly systems | arXiv: 2511.11111
soft filtering guiding zero-shot composed image retrieval with prescriptive and | arXiv: 2512.20781
stem efficient relative capability evaluation of llms through structured transit | arXiv: 2508.12096
temple incentivizing temporal understanding of video large language models via p | arXiv: 2503.16929
elspr evaluator llm training data self-purification on non-transitive preference | arXiv: 2505.17691
learning time in static classifiers | arXiv: 2511.12321
no-regret strategy solving in imperfect-information games via pre-trained embedd | arXiv: 2511.12083
scaling and transferability of annealing strategies in large language model trai | arXiv: 2512.13705
uncovering pretraining code in llms a syntax-aware attribution approach | arXiv: 2511.07033
catformer when continual learning meets spiking transformers with dynamic thresh | arXiv: 2603.15184
designing truthful mechanisms for asymptotic fair division | arXiv: 2512.10892
hallucination stations on some basic limitations of transformer-based language m | arXiv: 2507.07505
llm targeted underperformance disproportionately impacts vulnerable users | arXiv: 2406.17737
panda -- patch and distribution-aware augmentation for long-tailed exemplar-free | arXiv: 2511.09791
a principle-driven adaptive policy for group cognitive stimu | arXiv: 2603.10034
dp-geng differentially private dataset distillation guided by dp-generated data | arXiv: 2511.09876
earth-adapter bridge the geospatial domain gaps with mixture of frequency adapta | arXiv: 2504.06220
infocom kilobyte-scale communication-efficient collaborative perception with inf | arXiv: 2512.10305
bridging the multilingual safety divide efficient culturally-aware alignment for | arXiv: 2602.13867
consensus-aligned neuron efficient fine-tuning large language models for multi-d | arXiv: 2602.05694
focusing on language revealing and exploiting language attention heads in multil | arXiv: 2511.07498
gloctm cross-lingual topic modeling via a global context space | arXiv: 2601.11872
how does alignment enhance llms' multilingual capabilities a language neurons per | arXiv: 2505.21505
mitigating content effects on reasoning in language models through fine-grained | arXiv: 2505.12189
nadir differential attention flow for non-autoregressive transliteration in indi | arXiv: 2601.12389
stellar scene text editor for low-resource languages and real-world data | arXiv: 2511.09977
vidia2std a parallel corpus and methods for low-resource vietnamese dialect-to-s | arXiv: 2603.10211
x-mutest a multilingual benchmark for explainable hate speech detection and a no | arXiv: 2601.03194
patientvlm meets docvlm pre-consultation dialogue between vision language models | arXiv: 2601.10945
c3tg conflict-aware composite and collaborative controlled text generation | arXiv: 2511.09292
convex clustering redefined robust learning with higher order norms and beyond | arXiv: 2511.14784
a fast heuristic search approach for energy-optimal profile | arXiv: 2512.01331
a graph-theoretical perspective on law design for multiagent systems | arXiv: 2511.06361
a phase transition for opinion dynamics with competing biase | arXiv: 2511.09434
a topological rewriting of tarski's mereogeometry | arXiv: 2511.12727
a learning framework for cooperative collision avoidance of uav swarms leveragin | arXiv: 2507.10913
fedgrpo privately optimizing foundation models with group-relative rewards from | arXiv: 2602.12014
from pretrain to pain adversarial vulnerability of video foundation models witho | arXiv: 2511.07049
argumentative debates for transparent bias detection technic | arXiv: 2508.04511
beyond detection exploring evidence-based multi-agent debate for misinformation | arXiv: 2511.07267
cross-modal prompting for balanced incomplete multi-modal emotion recognition | arXiv: 2512.11239
fact2fiction targeted poisoning attack to agentic fact-check | arXiv: 2508.06059
factguard event-centric and commonsense-guided fake news detection | arXiv: 2511.10281
factorut controlling untrusted ai by monitoring their plans | arXiv: 2512.14745
multi-modal dynamic proxy learning for personalized multiple clustering | arXiv: 2511.07274
reasoning about the unsaid misinformation detection with omission-aware graph in | arXiv: 2512.01728
scenejaileval a scenario-adaptive multi-dimensional framework for jailbreak eval | arXiv: 2508.06194
t2agent a tool-augmented multimodal misinformation detection agent with monte ca | arXiv: 2505.19768
a unified shape-aware foundation model for time series class | arXiv: 2601.06429v1
3d4d an interactive editable 4d world model via 3d video generation | arXiv: 2511.08536
dreamrunner fine-grained compositional story-to-video genera | arXiv: 2411.16657
filmweaver weaving consistent multi-shot videos with cache-guided autoregressive | arXiv: 2512.11274
genvidbench a 6-million benchmark for ai-generated video detection | arXiv: 2501.11340
mask2iv interaction-centric video generation via mask trajectories | arXiv: 2510.03135
mofu scale-aware modulation and fourier fusion for multi-subject video generatio | arXiv: 2512.22310
motioncharacter fine-grained motion controllable human video generation | arXiv: 2411.18281
omnivdiff omni controllable video diffusion for generation and understanding | arXiv: 2504.10825
phased one-step adversarial equilibrium for video diffusion models | arXiv: 2508.21019
seeing the unseen zooming in the dark with event cameras | arXiv: 2601.02206
spherediff tuning-free 360 static and dynamic panorama generation via spherical | arXiv: 2504.14396