AAAI 2026 Paper Notes TODO
Total: 1570 papers | Completed: 1570 | Pending update: 0
- 10 Open Challenges Steering the Future of Vision-Language-Action Models | arXiv: 2511.05936v1
- 3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition | arXiv: 2511.07040
- 3d-free meets 3d priors novel view synthesis from a single image with pretrained | arXiv: 2408.06157
- 3d4d an interactive editable 4d world model via 3d video generation | arXiv: 2511.08536
- 3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation | arXiv: 2512.11557
- 4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation | arXiv: 2511.07241
- A Closer Look at Knowledge Distillation in Spiking Neural Network Training | arXiv: 2511.06902
- A Coherence-Based Measure of AGI | arXiv: 2510.20784
- A Computable Game-Theoretic Framework for Multi-Agent Theory of Mind | arXiv: 2511.22536
- A Content-Preserving Secure Linguistic Steganography | arXiv: 2511.12565
- A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs | arXiv: 2505.23816
- a data-driven model predictive control framework for multi-aircraft tma routing | arXiv: 2511.19452
- A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation | arXiv: 2511.12259
- A Distributed Asynchronous Generalized Momentum Algorithm Without Delay Bounds | arXiv: 2508.08218
- A Fast Heuristic Search Approach for Energy-Optimal Profile Routing for Electric Vehicles | arXiv: 2512.01331
- A Graph-Theoretical Perspective on Law Design for Multiagent Systems | arXiv: 2511.06361
- A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge | arXiv: 2507.10913
- a mind cannot be smeared across time | arXiv: 2601.11620
- A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses | arXiv: 2501.01849
- A Multi-Agent LLM Framework for Multi-Domain Low-Resource In-Context NER via Knowledge Retrieval, Disambiguation and Reflective Analysis | arXiv: 2511.19083
- a new strategy for verifying reach-avoid specifications in neural feedback syste | arXiv: 2601.08065
- A Phase Transition for Opinion Dynamics with Competing Biases | arXiv: 2511.09434
- A Principle-Driven Adaptive Policy for Group Cognitive Stimulation Dialogue for Elderly with Cognitive Impairment | arXiv: 2603.10034
- A Reasoning Paradigm for Named Entity Recognition | arXiv: 2511.11978
- a superpersuasive autonomous policy debating system | arXiv: 2511.17854
- A Switching Framework for Online Interval Scheduling with Predictions | arXiv: 2511.16194
- a theoretical analysis of detecting large model-generated time series | arXiv: 2511.07104
- A Topological Rewriting of Tarski's Mereogeometry | arXiv: 2511.12727
- A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication | arXiv: 2511.11560
- A Unified Shape-Aware Foundation Model for Time Series Classification | arXiv: 2601.06429v1
- A2Flow: Automating Agentic Workflow Generation via Self-Adaptive Abstraction Operators | arXiv: 2511.20693
- AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs | arXiv: 2601.02771v1
- ActiShade: Activating Overshadowed Knowledge to Guide Multi-Hop Reasoning in Large Language Models | arXiv: 2601.07260
- Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward | arXiv: 2508.11143
- AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization | arXiv: 2603.11873v1
- Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models | arXiv: 2511.15311v2
- Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval | arXiv: 2512.00953v1
- Adaptive Fidelity Estimation for Quantum Programs with Graph-Guided Noise Awareness | arXiv: 2601.14713v1
- adaptive initial residual connections for gnns with theoretical guarantees | arXiv: 2511.06598
- Adaptive Morph-Patch Transformer for Aortic Vessel Segmentation | arXiv: 2511.06897
- Adaptive Riemannian Graph Neural Networks | arXiv: 2508.02600
- Adaptive Theory of Mind for LLM-based Multi-Agent Coordination | arXiv: 2603.16264
- Advancing Safe Mechanical Ventilation Using Offline RL With Hybrid Actions and Clinically Aligned Rewards | arXiv: 2506.14375v2
- AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-Reconstruction | arXiv: 2507.18988v2
- AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios | arXiv: 2511.21053v2
- Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation | arXiv: 2511.06240
- Agent-SAMA: State-Aware Mobile Assistant | arXiv: 2505.23596v3
- AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation | arXiv: 2512.00602v1
- AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments | arXiv: 2506.11773v4
- agentswift efficient llm agent design via value-guided hierarchical search | arXiv: 2506.06017
- Aggregating Diverse Cue Experts for AI-Generated Image Detection | arXiv: 2601.08790v1
- AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions | arXiv: 2509.01787v3
- AHAN: Asymmetric Hierarchical Attention Network for Identical Twin Face Verification | arXiv: 2602.21503
- ai-based traffic modeling for network security and privacy challenges ahead | arXiv: 2503.22161
- airdde multifactor neural delay differential equations for air quality forecasti | arXiv: 2603.17529
- Align to Structure: Aligning Large Language Models with Structural Information | arXiv: 2504.03622v2
- Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration | arXiv: 2602.20104v1
- aligning generative music ai with human preferences methods and challenges | arXiv: 2511.15038
- Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping | arXiv: 2511.11551v3
- Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | arXiv: 2603.05566v1
- AlignTree: Efficient Defense Against LLM Jailbreak Attacks | arXiv: 2511.12217v1
- Align³GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation | arXiv: 2511.11255v2
- ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs | arXiv: 2603.01792v1
- Alternative Fairness and Accuracy Optimization in Criminal Justice | arXiv: 2511.04505v4
- AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment | arXiv: 2511.09385v2
- Ambiguity-aware Truncated Flow Matching for Ambiguous Medical Image Segmentation | arXiv: 2511.06857v2
- AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design | arXiv: 2512.21613v1
- An Epistemic Perspective on Agent Awareness | arXiv: 2511.05977v1
- An Improved Privacy and Utility Analysis of Differentially Private SGD with Bounded Domain and Smooth Losses | arXiv: 2502.17772v4
- An Information Theoretic Evaluation Metric for Strong Unlearning | arXiv: 2405.17878v3
- An Invariant Latent Space Perspective on Language Model Inversion | arXiv: 2511.19569v1
- An LLM-Based Simulation Framework for Embodied Conversational Agents in Psychological Counseling | arXiv: 2410.22041v3
- an overall real-time mechanism for classification and quality evaluation of rice | arXiv: 2502.13764
- AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation | arXiv: 2511.11692v1
- AnchorHOI: Zero-shot Generation of 4D Human-Object Interaction via Anchor-based Prior Distillation | arXiv: 2512.14095v1
- Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks | arXiv: 2511.12985v2
- Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation | arXiv: 2601.09212v1
- AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer | arXiv: 2511.06687v1
- Answering the Unanswerable Is to Err Knowingly: Analyzing and Mitigating Abstention Failures in Large Reasoning Models | arXiv: 2508.18760v3
- Anti-adversarial Learning: Desensitizing Prompts for Large Language Models | arXiv: 2505.01273v2
- anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding | arXiv: 2506.00942v2
- Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models | arXiv: 2511.14559v1
- Approximation Algorithm for Constrained k-Center Clustering: A Local Search Approach | arXiv: 2601.11883
- APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval | arXiv: 2506.04953v3
- AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture | arXiv: 2511.15870v1
- Arbitrary-Scale 3D Gaussian Super-Resolution | arXiv: 2508.16467v2
- ARCANE: A Multi-Agent Framework for Interpretable and Configurable Alignment | arXiv: 2512.06196
- ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction | arXiv: 2511.12485
- Are Graph Transformers Necessary? Efficient Long-Range Message Passing with Fractal Nodes in MPNNs | arXiv: 2511.13010
- are we done yet a vision-based judge for autonomous task completion of computer | arXiv: 2511.20067
- Area-Optimal Control Strategies for Heterogeneous Multi-Agent Pursuit | arXiv: 2511.15036v2
- Argumentative Debates for Transparent Bias Detection (ABIDE) | arXiv: 2508.04511v2
- as eastern powers i will veto an investigation of nation-level bias of large lan | arXiv: 2511.10695
- Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation | arXiv: 2507.18224
- assessing llms for serendipity discovery in knowledge graphs a case for drug rep | arXiv: 2511.12472
- assist-3d adapted scene synthesis for class-agnostic 3d instance segmentation | arXiv: 2512.09364
- AStar: Boosting Multimodal Reasoning with Automated Structured Thinking | arXiv: 2502.02339
- asymmetric cross-modal knowledge distillation bridging modalities with weak sema | arXiv: 2511.08901
- Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning | arXiv: 2512.14709
- attention gathers mlps compose a causal analysis of an action-outcome circuit in | arXiv: 2603.11142
- attention retention for continual learning with vision transformers | arXiv: 2602.05454
- authority backdoor a certifiable backdoor mechanism for authoring dnns | arXiv: 2512.10600
- auto-pre an automatic and cost-efficient peer-review framework for language gene | arXiv: 2410.12265
- automaldesc large-scale script analysis for cyber threat research | arXiv: 2511.13333
- automated reproducibility has a problem statement problem | arXiv: 2601.04226
- automating complex document workflows via stepwise and rollback-enabled operatio | arXiv: 2512.04445
- autonomous concept drift threshold determination | arXiv: 2511.09953
- autopp towards automated product poster generation and optimization | arXiv: 2512.21921
- AutoTool: Efficient Tool Selection for Large Language Model Agents | arXiv: 2511.14650v1
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models | arXiv: 2511.11299
- axis-aligned document dewarping | arXiv: 2507.15000
- A²LC: Active and Automated Label Correction for Semantic Segmentation | arXiv: 2506.11599
- backdoor attacks on open vocabulary object detectors via multi-modal prompt tuni | arXiv: 2511.12735
- backdoors in conditional diffusion threats to responsible synthetic data pipelin | arXiv: 2507.04726
- badthink triggered overthinking attacks on chain-of-thought reasoning in large l | arXiv: 2511.10714
- baid a benchmark for bias assessment of ai detectors | arXiv: 2512.11505
- balancing multimodal domain generalization via gradient modulation and projectio | arXiv: 2603.14175
- BAMAS: Structuring Budget-Aware Multi-Agent Systems | arXiv: 2511.21572
- bandit learning in housing markets | arXiv: 2511.12629
- bat learning event-based optical flow with bidirectional adaptive temporal corre | arXiv: 2503.03256
- BayesAgent: Bayesian Agentic Reasoning Under Uncertainty via Verbalized Probabilistic Graphical Modeling | arXiv: 2406.05516
- bayesian meta-analyses could be more a case study in trial of labor after a cesa | arXiv: 2601.10089
- bayesian network structural consensus via greedy min-cut analysis | arXiv: 2504.00467
- bce3s binary cross-entropy based tripartite synergistic learning for long-tailed | arXiv: 2511.14097
- bcwildfire a long-term multi-factor dataset and deep learning benchmark for bore | arXiv: 2511.17597
- bd-net has depth-wise convolution ever been applied in binary neural networks | arXiv: 2511.17633
- beautiful images toxic words understanding and addressing offensive text in gene | arXiv: 2502.05066
- beerna tertiary structure-based rna inverse folding using artificial bee colony | arXiv: 2511.21781
- behavior tokens speak louder disentangled explainable recommendation with behavi | arXiv: 2512.15614
- behaviour policy optimization provably lower variance return estimates for off-p | arXiv: 2511.10843
- benchmarking llms for political science a united nations perspective | arXiv: 2502.14122
- Beta Distribution Learning for Reliable Roadway Crash Risk Assessment | arXiv: 2511.04886
- Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents | arXiv: 2601.20412
- Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection | arXiv: 2511.07301
- beyond cosine similarity magnitude-aware clip for no-reference image quality ass | arXiv: 2511.09948
- beyond detection exploring evidence-based multi-agent debate for misinformation | arXiv: 2511.07267
- beyond fact retrieval episodic memory for rag with generative semantic workspace | arXiv: 2511.07587
- beyond fixed depth adaptive graph neural networks for node classification under | arXiv: 2511.06608
- beyond hallucinations a composite score for measuring reliability in open-source | arXiv: 2512.24058
- beyond monotonicity revisiting factorization principles in multi-agent q-learnin | arXiv: 2511.09792
- beyond observations reconstruction error-guided irregularly sampled time series | arXiv: 2511.06854
- beyond perplexity let the reader select retrieval summaries via spectrum project | arXiv: 2508.05909
- Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning | arXiv: 2511.10037
- beyond semantic features pixel-level mapping for generalized ai-generated image | arXiv: 2512.17350
- beyond sharpness a flatness decomposition framework for efficient continual lear | arXiv: 2601.07636
- beyond superficial forgetting thorough unlearning through knowledge density esti | arXiv: 2511.11667
- beyond the lower bound bridging regret minimization and best arm identification | arXiv: 2511.05802
- beyond the mean fisher-orthogonal projection for natural gradient descent in lar | arXiv: 2508.13898
- beyond world models rethinking understanding in ai models | arXiv: 2511.12239
- bi-level contextual bandits for individualized resource allocation under delayed | arXiv: 2511.10572
- bias association discovery framework for open-ended llm generations | arXiv: 2508.01412
- biasjailbreak analyzing ethical biases and jailbreak vulnerabilities in large lan | arXiv: 2410.13334
- bica effective biomedical dense retrieval with citation-aware hard negatives | arXiv: 2511.08029
- bid farewell to seesaw towards accurate long-tail session-based recommendation v | arXiv: 2511.08378
- bidirectional channel-selective semantic interaction for semi-supervised medical | arXiv: 2601.05855
- Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning | arXiv: 2508.08385
- bipartite mode matching for vision training set search from a hierarchical data | arXiv: 2601.09531
- biprompt bilateral prompt optimization for visual and textual debiasing in visio | arXiv: 2601.02147
- BLM-Guard: Explainable Multimodal Ad Moderation with Chain-of-Thought and Policy-Aligned Rewards | arXiv: 2602.18193
- blue teaming function-calling agents | arXiv: 2601.09292
- blur-robust detection via feature restoration an end-to-end framework for prior- | arXiv: 2511.14371
- BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning | arXiv: 2511.11421
- boosting adversarial transferability via ensemble non-attention | arXiv: 2511.08937
- Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models | arXiv: 2506.12409
- break the tie learning cluster-customized category relationships for categorical | arXiv: 2511.09049
- breaking the adversarial robustness-performance trade-off in text classification | arXiv: 2511.07888
- breaking the dyadic barrier rethinking fairness in link prediction beyond demogr | arXiv: 2511.06568
- breaking the modality barrier generative modeling for accurate molecule retrieva | arXiv: 2511.06259
- breaking the stealth-potency trade-off in clean-image backdoors with generative | arXiv: 2511.07210
- Bridging Day and Night: Target-Class Hallucination Suppression in Unpaired Image Translation | arXiv: 2602.15383
- bridging granularity gaps hierarchical semantic learning for cross-domain few-sh | arXiv: 2511.12200
- Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation (BriMPR) | arXiv: 2511.22862
- bridging synthetic and real routing problems via llm-guided instance generation | arXiv: 2511.10233
- Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content? | arXiv: 2512.21871
- bridging the multilingual safety divide efficient culturally-aware alignment for | arXiv: 2602.13867
- bridging the skills gap a course model for modern generative ai education | arXiv: 2511.11757
- bridging vision and language for robust context-aware surgical point tracking th | arXiv: 2511.12026
- bugsweeper function-level detection of smart contract vulnerabilities using grap | arXiv: 2512.09385
- c3rl rethinking the combination of channel-independence and channel-mixing from | arXiv: 2507.17454
- c3tg conflict-aware composite and collaborative controlled text generation | arXiv: 2511.09292
- cad-vae leveraging correlation-aware latents for comprehensive fair disentanglem | arXiv: 2503.07938
- CAE: Hierarchical Semantic Alignment for Image Clustering | arXiv: 2512.00904
- CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis | arXiv: 2508.02322
- can editing llms inject harm | arXiv: 2407.20224
- can llms truly embody human personality analyzing ai and human behavior alignmen | arXiv: 2602.07414
- can protective watermarking safeguard the copyright of 3d gaussian splatting | arXiv: 2511.22262
- can you tell the difference contrastive explanations for abox entailments | arXiv: 2511.11281
- cash flow underwriting with bank transaction data advancing msme financial inclu | arXiv: 2510.16066
- casl curvature-augmented self-supervised learning for 3d anomaly detection | arXiv: 2511.12909
- cat-net a cross-attention tone network for cross-subject eeg-emg fusion tone dec | arXiv: 2511.10935
- catastrophic forgetting in kolmogorov-arnold networks | arXiv: 2511.12828
- catformer causal temporal transformer with dynamic contextual fusion for driving | arXiv: 2507.13425
- catformer when continual learning meets spiking transformers with dynamic thresh | arXiv: 2603.15184
- Causal Inference Under Threshold Manipulation: Bayesian Mixture Modeling and Heterogeneous Treatment Effects | arXiv: 2509.19814
- causal structure learning for dynamical systems with theoretical score analysis | arXiv: 2512.14361
- Causal-Tune: Mining Causal Factors from Vision Foundation Models for Domain Generalized Semantic Segmentation | arXiv: 2512.16567
- causalclip causally-informed feature disentanglement and filtering for generaliz | arXiv: 2512.13285
- causality matters how temporal information emerges in video language models | arXiv: 2508.11576
- Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs | arXiv: 2511.09018
- causaltrace a neurosymbolic causal analysis agent for smart manufacturing | arXiv: 2510.12033
- ccfqa a benchmark for cross-lingual and cross-modal speech and text factuality e | arXiv: 2508.07295
- cd-dpe dual-prompt expert network based on convolutional dictionary feature deco | arXiv: 2511.14014
- cellstream dynamical optimal transport informed embeddings for reconstructing ce | arXiv: 2511.13786
- center-outward q-dominance a sample-computable proxy for strong stochastic domin | arXiv: 2511.12545
- certified branch-and-bound maxsat solving extended version | arXiv: 2511.10273
- certified but fooled breaking certified defences with ghost certificates | arXiv: 2511.14003
- chain-of-thought driven adversarial scenario extrapolation for robust language m | arXiv: 2505.17089
- characterizing ai manipulation risks in brazilian youtube climate discourse | arXiv: 2511.06091
- charteditor a reinforcement learning framework for robust chart editing | arXiv: 2511.15266
- chatsparent an interactive system for detecting and mitigating cognitive fatigue | arXiv: 2601.11526
- chdp cooperative hybrid diffusion policies for reinforcement learning in paramet | arXiv: 2601.05675
- cheating stereo matching in full-scale physical adversarial attack against binoc | arXiv: 2511.14386
- class-partitioned vq-vae and latent flow matching for point cloud scene generati | arXiv: 2601.12391
- Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration | arXiv: 2505.16479
- clearair a human-visual-perception-inspired all-in-one image restoration | arXiv: 2601.02763
- clicare grounding large language models in clinical guidelines for decision supp | arXiv: 2507.22533
- clinician-in-the-loop smart home system to detect urinary tract infection flare- | arXiv: 2511.18334
- clip-fti fine-grained face template inversion via clip-driven attribute conditio | arXiv: 2512.15433
- clippan adapting clip as a supervisor for unsupervised pansharpening | arXiv: 2511.10896
- CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation | arXiv: 2503.05255
- Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents | arXiv: 2511.10705
- co-layout llm-driven co-optimization for interior layout | arXiv: 2511.12474
- coach collaborative agents for contextual highlighting -- a multi-agent framewor | arXiv: 2512.01853
- coarse-to-fine open-set graph node classification with large language models | arXiv: 2512.16244
- cocolit controlnet-conditioned latent image translation for mri to amyloid pet s | arXiv: 2508.01292
- coevo continual evolution of symbolic solutions using large language models | arXiv: 2412.18890
- cog-rag cognitive-inspired dual-hypergraph with theme alignment retrieval-augmen | arXiv: 2511.13201
- coherent multi-agent trajectory forecasting in team sports with causaltraj | arXiv: 2511.18248
- collaborative llm numerical reasoning with local data protection | arXiv: 2504.00299
- cometnet contextual motif-guided long-term time series forecasting | arXiv: 2511.08049
- comlq benchmarking complex logical queries in information retrieval | arXiv: 2511.12004
- commonality in few few-shot multimodal anomaly detection via hypergraph-enhanced | arXiv: 2511.05966
- comorag a cognitive-inspired memory-organized rag for stateful long narrative re | arXiv: 2508.10419
- compensating distribution drifts in class-incremental learning of pre-trained vi | arXiv: 2511.09926
- CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking (Oral) | arXiv: 2511.15580v3
- Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models | arXiv: 2511.11751
- concepts from representations post-hoc concept bottleneck models via sparse deco | arXiv: 2601.12303
- condensed data expansion using model inversion for knowledge distillation | arXiv: 2408.13850
- Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition | arXiv: 2511.13137
- conditional information bottleneck for multimodal fusion overcoming shortcut lea | arXiv: 2508.10644
- coninstruct evaluating large language models on conflict detection and resolutio | arXiv: 2511.14342
- Connecting the Dots: Training-Free Visual Grounding via Agentic Reasoning | arXiv: 2511.19516
- connectivity-guided sparsification of 2-fwl gnns preserving full expressivity wi | arXiv: 2511.12838
- consensus-aligned neuron efficient fine-tuning large language models for multi-d | arXiv: 2602.05694
- Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments | arXiv: 2505.19361
- constrained and robust policy synthesis with satisfiability-modulo-probabilistic | arXiv: 2511.08078
- constrained best arm identification with tests for feasibility | arXiv: 2511.09808
- constrained particle seeking solving diffusion inverse problems with just forwar | arXiv: 2603.01837
- consurv multimodal continual learning for survival analysis | arXiv: 2511.09853
- continuous degradation modeling via latent flow matching for real-world super-re | arXiv: 2602.04193
- Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning | arXiv: 2511.14396
- control illusion the failure of instruction hierarchies in large language models | arXiv: 2502.15851
- controllable financial market generation with diffusion guided meta agent | arXiv: 2408.12991
- conversational learning diagnosis via reasoning multi-turn interactive learning | arXiv: 2603.03236
- convex clustering redefined robust learning with the median of means estimator | arXiv: 2511.14784
- convmix a mixed-criteria data augmentation framework for conversational dense re | arXiv: 2508.04001
- Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution | arXiv: 2511.19430v1
- coordar one-reference 6d pose estimation of novel objects via autoregressive coo | arXiv: 2511.12919
- coordinated humanoid robot locomotion with symmetry equivariant reinforcement le | arXiv: 2508.01247
- copyright infringement detection in text-to-image diffusion models via different | arXiv: 2509.23022
- core-fed bridging collaborative and representation fairness via federated embedd | arXiv: 2602.00647
- correcting false alarms from unseen adapting graph anomaly detectors at test tim | arXiv: 2511.07023
- cost-free neutrality for the river method | arXiv: 2512.14409
- cost-minimized label-flipping poisoning attack to llm alignment | arXiv: 2511.09105
- counterfactual explainable ai xai method for deep learning-based multivariate ti | arXiv: 2511.13237
- countsteer steering attention for object counting in diffusion models | arXiv: 2511.11253
- COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control | arXiv: 2601.06122
- creating blank canvas against ai-enabled image forgery | arXiv: 2511.22237
- crebench human-aligned creativity evaluation from idea to process to product | arXiv: 2511.13626
- credal ensemble distillation for uncertainty quantification | arXiv: 2511.13766
- crops improving dense retrieval with cross-perspective positive samples in short | arXiv: 2511.15443
- cross modal fine-grained alignment via granularity-aware and region-uncertain mo | arXiv: 2511.07710
- cross-modal prompting for balanced incomplete multi-modal emotion recognition | arXiv: 2512.11239
- Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models | arXiv: 2601.08476
- Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models | arXiv: 2511.06793
- cross-sample augmented test-time adaptation for personalized intraoperative hypo | arXiv: 2512.15762
- cross-space synergy a unified framework for multimodal emotion recognition in co | arXiv: 2512.03521
- CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution | arXiv: 2511.21717
- CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models | arXiv: 2511.12263
- ctpd cross tokenizer preference distillation | arXiv: 2601.11865
- ctrlfuse mask-prompt guided controllable infrared and visible image fusion | arXiv: 2601.08619
- D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies | arXiv: 2511.16590
- dance density-agnostic and class-aware network for point cloud completion | arXiv: 2511.07978
- dapointmamba domain adaptive point mamba for point cloud completion | arXiv: 2511.20278
- data complexity of querying description logic knowledge bases under cost-based s | arXiv: 2511.07095
- data heterogeneity and forgotten labels in split federated learning | arXiv: 2511.09736
- data verification is the future of quantum computing copilots | arXiv: 2602.04072
- data whitening improves sparse autoencoder learning | arXiv: 2511.13981
- dcmatch unsupervised multi-shape matching with dual-level consistency | arXiv: 2509.01204
- deadline-aware energy-efficient control of domestic immersion hot water heater | arXiv: 2601.18123
- debiased dual-invariant defense for adversarially robust person re-identificatio | arXiv: 2511.09933
- debiasing diffusion priors via 3d attention for consistent gaussian splatting | arXiv: 2512.07345
- Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data | arXiv: 2508.01341
- decoding with structured awareness integrating directional frequency-spatial and | arXiv: 2512.05494
- decomposition and preprocessing of ternary constraint networks | arXiv: 2511.11872
- decor deep embedding clustering with orientation robustness | arXiv: 2510.03328
- DeCoRL: Decoupling Reasoning Chains via Parallel Sub-Step Generation and Cascaded Reinforcement for Interpretable and Scalable RLHF | arXiv: 2511.19097
- decoupling scene perception and ego status a multi-context fusion approach for e | arXiv: 2511.13079
- Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning | arXiv: 2507.10007
- deep incomplete multi-view clustering via hierarchical imputation and alignment | arXiv: 2601.09051
- deep predictive discounted counterfactual regret minimization | arXiv: 2511.08174
- deepboots dual-stream residual boosting for drift-resilient time-series forecast | arXiv: 2511.06893
- deepgb-tb a risk-balanced cross-attention gradient-boosted convolutional network | arXiv: 2508.02741
- deepprooflog efficient proving in deep stochastic logic programs | arXiv: 2511.08581
- deepraht learning predictive raht for point cloud attribute compression | arXiv: 2601.12255
- deeprwcap neural-guided random-walk capacitance solver for ic design | arXiv: 2511.06831
- deeptracer tracing stolen model via deep coupled watermarks | arXiv: 2511.08985
- deformtrace a deformable state space model with relay tokens for temporal forger | arXiv: 2603.04882
- deig detail-enhanced instance generation with fine-grained semantic control | arXiv: 2602.18282
- democratizing llm efficiency from hyperscale optimizations to universal deployab | arXiv: 2511.20662
- denas-vit data efficient nas-optimized vision transformer for ultrasound image s | arXiv: 2407.04203
- DEPO: Dual-Efficiency Preference Optimization for LLM Agents | arXiv: 2511.15392
- depth-synergized mamba meets memory experts for all-day image reflection separat | arXiv: 2601.00322
- description logics with two types of definite descriptions complexity expressive | arXiv: 2512.06604
- designing incident reporting systems for harms from general-purpose ai | arXiv: 2511.05914
- designing truthful mechanisms for asymptotic fair division | arXiv: 2512.10892
- detect all-type deepfake audio wavelet prompt tuning for enhanced auditory perce | arXiv: 2504.06753
- detecting the future all-at-once event sequence forecasting with horizon matchin | arXiv: 2408.13131
- detonation decoupled torch network-aware training on interlinked online nodes | arXiv: 2502.06728
- deviation dynamics in cardinal hedonic games | arXiv: 2511.11531
- dexterous manipulation transfer via progressive kinematic-dynamic alignment | arXiv: 2511.10987
- dfdt dynamic fast decision tree for iot data stream mining on edge devices | arXiv: 2502.14011
- dia-gnostic vlvae disentangled alignment-constrained vision language variational | arXiv: 2511.05968
- dicap distribution-calibrated pseudo-labeling for semi-supervised multi-label le | arXiv: 2511.20225
- DICE: Distilling Classifier-Free Guidance into Text Embeddings | arXiv: 2502.03726
- diff-v2m a hierarchical conditional diffusion model with explicit rhythmic model | arXiv: 2511.09090
- diffa large language diffusion models can listen and understand | arXiv: 2507.18452
- DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation | arXiv: 2601.03178
- Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models | arXiv: 2511.09973
- differentiable semantic meta-learning framework for long-tail motion forecasting | arXiv: 2511.06649
- differentiated directional intervention a framework for evading llm safety align | arXiv: 2511.06852
- Difficulty Controlled Diffusion Model for Synthesizing Effective Training Data | arXiv: 2411.18109
- difficulty-aware label-guided denoising for monocular 3d object detection | arXiv: 2511.13195
- diffmm efficient method for accurate noisy and sparse trajectory map matching vi | arXiv: 2601.08482
- diffop reinforcement learning of optimization-based control policies via implici | arXiv: 2411.07484
- diffrefiner coarse to fine trajectory planning via diffusion refinement with sem | arXiv: 2511.17150
- diffusion reconstruction-based data likelihood estimation for core-set selection | arXiv: 2511.19274
- discode distribution-aware score decoder for robust automatic evaluation of imag | arXiv: 2512.14420
- discounted cuts a stackelberg approach to network disruption | arXiv: 2511.10804
- Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers | arXiv: 2511.06848
- Distilling Cross-Modal Knowledge via Feature Disentanglement | arXiv: 2511.19887
- distilling deep reinforcement learning into interpretable fuzzy rules an explain | arXiv: 2603.13257
- distilling future temporal knowledge with masked feature reconstruction for 3d o | arXiv: 2512.08247
- distribution-based feature attribution for explaining the predictions of any cla | arXiv: 2511.09332
- distributional priors guided diffusion for generating 3d molecules in low data r | arXiv: 2404.00962
- distributionally robust online markov game with linear function approximation | arXiv: 2511.07831
- diversifying counterattacks orthogonal exploration for robust clip inference | arXiv: 2511.09064
- divide conquer and unite hierarchical style-recalibrated prototype alignment for | arXiv: 2511.10945
- do it for her first-order temporal logic reward specification in reinforcement l | arXiv: 2602.06227
- do large language models think like the brain sentence-level evidences from laye | arXiv: 2505.22563
- do llms feel teaching emotion recognition with prompts retrieval and curriculum | arXiv: 2511.07061
- do llms really struggle at nl-fol translation revealing their strengths via a no | arXiv: 2511.11816
- do not merge my model safeguarding open-source llms against unauthorized model m | arXiv: 2511.10712
- do retrieval augmented language models know when they dont know | arXiv: 2509.01476
- do we need perfect data leveraging noise for domain generalized segmentation | arXiv: 2511.22948
- does less hallucination mean less creativity an empirical investigation in llms | arXiv: 2512.11509
- Does Self-Evaluation Enable Wireheading in Language Models? | arXiv: 2511.23092
- dogfit domain-guided fine-tuning for efficient transfer learning of diffusion mo | arXiv: 2508.05685
- domain generalized stereo matching with uncertainty-guided data augmentation | arXiv: 2508.01303
- dont start over a cost-effective framework for migrating personalized prompts be | arXiv: 2601.12034
- dos distilling observable softmaps of zipfian prototypes for self-supervised poi | arXiv: 2512.11465
- DOS: Directional Object Separation in Text Embeddings for Multi-Object Image Generation | arXiv: 2510.14376
- dp-geng differentially private dataset distillation guided by dp-generated data | arXiv: 2511.09876
- DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation | arXiv: 2411.16657
- drexperts differential refinement of distortion-aware experts for blind image qu | arXiv: 2602.09531
- drive as you like strategy-level motion planning based on a multi-head diffusion | arXiv: 2508.16947
- driveflow rectified flow adaptation for robust 3d object detection in autonomous | arXiv: 2511.18713
- drivesuprim towards precise trajectory selection for end-to-end planning | arXiv: 2506.06659
- drmd deep reinforcement learning for malware detection under concept drift | arXiv: 2508.18839
- dropouts in confidence moral uncertainty in human-llm alignment | arXiv: 2511.13290
- ds-atgo dual-stage synergistic learning via forward adaptive threshold and backw | arXiv: 2511.13050
- dual-branch spatial-temporal self-supervised representation for enhanced road ne | arXiv: 2511.06633
- dual-path knowledge-augmented contrastive alignment network for spatially resolv | arXiv: 2511.17685
- dualfete revisiting teacher-student interactions from a feedback perspective for | arXiv: 2511.09319
- dualspeechlm towards unified speech understanding and generation via dual speech | arXiv: 2508.08961
- dw-dgat dynamically weighted dual graph attention network for neurodegenerative | arXiv: 2601.10001
- dynamic gaussian scene reconstruction from unsynchronized videos | arXiv: 2511.11175
- DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression | arXiv: 2511.07903
- eagle episodic appearance- and geometry-aware memory for unified 2d-3d visual qu | arXiv: 2511.08007
- earth-adapter bridge the geospatial domain gaps with mixture of frequency adapta | arXiv: 2504.06220
- ease practical and efficient safety alignment for small language models | arXiv: 2511.06512
- easy to learn yet hard to forget towards robust unlearning under bias | arXiv: 2602.21773
- echogen cycle-consistent learning for unified layout-image generation and unders | arXiv: 2603.18001
- echoless label-based pre-computation for memory-efficient heterogeneous graph le | arXiv: 2511.11081
- EcoAgent: An Efficient Device-Cloud Collaborative Multi-Agent Framework for Mobile Automation | arXiv: 2505.05440
- ecpv2 fast efficient and scalable global optimization of lipschitz functions | arXiv: 2511.16575
- eeg-dlite dataset distillation for efficient large eeg model training | arXiv: 2512.12210
- efficient and reliable hitting-set computations for the implicit hitting set app | arXiv: 2508.07015
- efficient chromosome parallelization for precision medicine genomic workflows | arXiv: 2511.15977
- efficient multiagent planning via shared action suggestions | arXiv: 2412.11430
- efficient reasoning for large reasoning language models via certainty-guided ref | arXiv: 2508.05337
- efficient thought space exploration through strategic intervention | arXiv: 2511.10038
- efficientflow efficient equivariant flow policy learning for embodied ai | arXiv: 2512.02020
- efficientfsl enhancing few-shot classification via query-only tuning in vision t | arXiv: 2601.08499
- efx and po allocation exists for two types of goods | arXiv: 2601.03438
- egoems a high-fidelity multimodal egocentric dataset for cognitive assistance in | arXiv: 2511.09894
- elementarynet a non-strategic neural network for predicting human behavior in no | arXiv: 2503.05925
- elspr evaluator llm training data self-purification on non-transitive preference | arXiv: 2505.17691
- EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens | arXiv: 2511.21106
- emergent persuasion will llms persuade without being prompted | arXiv: 2512.22201
- emovid a multimodal emotion video dataset for emotion-centric video understandin | arXiv: 2511.11002
- empowering dino representations for underwater instance segmentation via aligner | arXiv: 2511.08334
- empowering semantic-sensitive underwater image enhancement with vlm | arXiv: 2603.12773
- end-to-end contrastive language-speech pretraining model for long-form spoken qu | arXiv: 2511.09282
- enhancing binary encoded crime linkage analysis using siamese network | arXiv: 2511.07651
- enhancing control policy smoothness by aligning actions with predictions from pr | arXiv: 2601.18479
- enhancing dpsgd via per-sample momentum and low-pass filtering | arXiv: 2511.08841
- enhancing generalization of depth estimation foundation model via weakly-supervi | arXiv: 2511.14238
- enhancing logical expressiveness in graph neural networks via path-neighbor aggr | arXiv: 2511.07994
- enhancing multimodal misinformation detection by replaying the whole story from | arXiv: 2511.06284
- enhancing noise resilience in face clustering via sparse differential transforme | arXiv: 2512.22612
- enhancing robustness of offline reinforcement learning under data corruption via | arXiv: 2511.17568
- enhancing rotation-invariant 3d learning with global pose awareness and attentio | arXiv: 2511.08833
- enhancing uncertainty estimation in llms with expectation of aggregated internal | arXiv: 2509.01564
- epo diverse and realistic protein ensemble generation via energy preference opti | arXiv: 2511.10165
- epsegfz efficient point cloud semantic segmentation for few- and zero-shot scena | arXiv: 2511.11700
- equacode a multi-strategy jailbreak approach for large language models via equat | arXiv: 2512.23173
- error correction in radiology reports a knowledge distillation-based multi-stage | arXiv: 2406.15045
- esg-bench benchmarking long-context esg reports for hallucination mitigation | arXiv: 2603.13154
- evaluating llms for police decision-making a framework based on police action sc | arXiv: 2601.03553
- evaluating synthesizing and enhancing for customer support conversation | arXiv: 2508.04423
- EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer | arXiv: 2509.12718
- Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding | arXiv: 2503.09143
- expandable and differentiable dual memories with orthogonal regularization for e | arXiv: 2511.09871
- experience with single domain generalization in real world medical imaging deplo | arXiv: 2601.16359
- expert-guided prompting and retrieval-augmented generation for emergency medical | arXiv: 2511.10900
- expertad enhancing autonomous driving systems with mixture of experts | arXiv: 2511.11740
- explainable melanoma diagnosis with contrastive learning and llm-based report ge | arXiv: 2512.06105
- explaining decentralized multi-agent reinforcement learning policies | arXiv: 2511.10409
- explanation-preserving augmentation for semi-supervised graph representation lea | arXiv: 2410.12657
- explicit temporal-semantic modeling for dense video captioning via context-aware | arXiv: 2511.10134
- explore and establish synergistic effects between weight pruning and coreset sel | arXiv: 2511.09901
- Explore How to Inject Beneficial Noise in MLLMs | arXiv: 2511.12917
- exploring llms for scientific information extraction using the sciex framework | arXiv: 2512.10004
- exploring surround-view fisheye camera 3d object detection | arXiv: 2511.18695
- Exploring the Effects of Alignment on Numerical Bias in Large Language Models | arXiv: 2601.16444
- exposing deepfakes via hyperspectral domain mapping | arXiv: 2511.11732
- exposing the cracks vulnerabilities of retrieval-augmented llm-based machine tra | arXiv: 2510.00829
- expressive temporal specifications for reward monitoring | arXiv: 2511.12808
- extendattack attacking servers of lrms via extending reasoning | arXiv: 2506.13737
- Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction | arXiv: 2511.13118
- Extreme Value Monte Carlo Tree Search for Classical Planning | arXiv: 2405.18248
- facial-r1 aligning reasoning and recognition for facial emotion analysis | arXiv: 2511.10254
- Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System | arXiv: 2508.06059
- factguard event-centric and commonsense-guided fake news detection | arXiv: 2511.10281
- factorut controlling untrusted ai by monitoring their plans | arXiv: 2512.14745
- failures to surface harmful contents in video large language models | arXiv: 2508.10974
- fair model-based clustering | arXiv: 2602.21509
- fairgse fairness-aware graph neural network without high false positive rates | arXiv: 2511.12132
- fane towards fine-grained cross-modal contrast with false-negative reduction and | arXiv: 2511.12215
- fantasystyle controllable stylized distillation for 3d gaussian splatting | arXiv: 2508.08136
- fast 3d surrogate modeling for data center thermal management | arXiv: 2511.11722
- FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning | arXiv: 2507.23318
- faster certified symmetry breaking using orders with auxiliary variables | arXiv: 2511.16637
- fdp a frequency-decomposition preprocessing pipeline for unsupervised anomaly de | arXiv: 2511.12899
- feature-centric unsupervised node representation learning without homophily assu | arXiv: 2512.15112
- fedalt federated fine-tuning through adaptive local training with rest-of-world | arXiv: 2503.11880
- federated clip for resource-efficient heterogeneous medical image classification | arXiv: 2511.07929
- fedgrpo privately optimizing foundation models with group-relative rewards from | arXiv: 2602.12014
- fedp2eft federated learning to personalize peft for multilingual llms | arXiv: 2502.04387
- fedpm federated learning using second-order optimization with preconditioned mix | arXiv: 2511.09100
- few-shot precise event spotting via unified multi-entity graph and distillation | arXiv: 2511.14186
- fgm-hd boosting generation diversity of fractal generative models through hausdo | arXiv: 2511.08945
- fia-edit frequency-interactive attention for efficient and high-fidelity inversi | arXiv: 2511.12151
- filmweaver weaving consistent multi-shot videos with cache-guided autoregressive | arXiv: 2512.11274
- Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration | arXiv: 2411.17686
- finding diverse solutions parameterized by cliquewidth | arXiv: 2405.20931
- finding the translation switch discovering and exploiting the task-initiation fe | arXiv: 2601.11019
- finding time series anomalies using granular-ball vector data description | arXiv: 2511.12147
- fine-grained dino tuning with dual supervision for face forgery detection | arXiv: 2511.12107
- fine-grained representation for lane topology reasoning | arXiv: 2511.12590
- fine-tuned llms know they dont know a parameter-efficient approach to recovering | arXiv: 2511.12991
- finetec fine-grained action recognition under temporal corruption via skeleton d | arXiv: 2512.25067
- finevau a novel human-aligned benchmark for fine-grained video anomaly understan | arXiv: 2601.17258
- finextrol controllable motion generation via fine-grained text | arXiv: 2511.18927
- finmmdocr benchmarking financial multimodal reasoning with scenario awareness do | arXiv: 2512.24903
- finrpt dataset evaluation system and llm-based multi-agent framework for equity | arXiv: 2511.07322
- first-order error matters accurate compensation for quantized large language mod | arXiv: 2507.11017
- first-order representation languages for goal-conditioned rl | arXiv: 2512.19355
- flashkat understanding and addressing performance bottlenecks in the kolmogorov- | arXiv: 2505.13813
- flexible concept bottleneck model | arXiv: 2511.06678
- flowing backwards improving normalizing flows via reverse representation alignme | arXiv: 2511.22345
- focusing on language revealing and exploiting language attention heads in multil | arXiv: 2511.07498
- forest vs tree the n k trade-off in reproducible ml evaluation | arXiv: 2508.03663
- forget less by learning from parents through hierarchical relationships | arXiv: 2601.01892
- formal abductive latent explanations for prototype-based networks | arXiv: 2511.16588
- formal verification of diffusion auctions | arXiv: 2511.08765
- format as a prior quantifying and analyzing bias in llms for heterogeneous data | arXiv: 2508.15793
- format matters the robustness of multimodal llms in reviewing evidence from tabl | arXiv: 2511.10075
- FoundationSLAM: Unleashing the Potential of Depth Foundation Models in End-to-End Dense Visual SLAM | arXiv: 2512.25008v2
- FourierPET: Deep Fourier-based Unrolled Network for Low-count PET Reconstruction | arXiv: 2601.11680
- FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection | arXiv: 2502.15488
- free-form scene editor enabling multi-round object manipulation like in a 3d eng | arXiv: 2511.13713
- freeinpaint tuning-free prompt alignment and visual rationality enhancement in i | arXiv: 2512.21104
- freqcycle a multi-scale time-frequency analysis method for time series forecasti | arXiv: 2603.09661
- FreqRec: Exploiting Inter-Session Information with Frequency-enhanced Dual-Path Networks for Sequential Recommendation | arXiv: 2511.06285
- from attribution to action jointly aligning predictions and explanations | arXiv: 2511.06944
- from biased chatbots to biased agents examining role assignment effects on llm a | arXiv: 2602.12285
- from classification to ranking enhancing llm reasoning capabilities for mbti per | arXiv: 2601.18582
- from decision trees to boolean logic a fast and unified shap algorithm | arXiv: 2511.09376
- from ids to semantics a generative framework for cross-domain recommendation wit | arXiv: 2511.08006
- from imitation to discrimination toward a generalized curriculum advantage mecha | arXiv: 2512.02580
- From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging | arXiv: 2511.10943
- from passive perception to active memory a weakly supervised image manipulation | arXiv: 2511.20359
- from policy to logic for efficient and interpretable coverage assessment | arXiv: 2601.01266
- from pretrain to pain adversarial vulnerability of video foundation models witho | arXiv: 2511.07049
- from sequential to recursive enhancing decision-focused learning with bidirectio | arXiv: 2511.08035
- from single to societal analyzing persona-induced bias in multi-agent interactio | arXiv: 2511.11789
- from theory of mind to theory of environment counterfactual simulation of latent | arXiv: 2601.01599
- from woofs to words towards intelligent robotic guide dogs with verbal communica | arXiv: 2603.12574
- ft-ncfm an influence-aware data distillation framework for efficient vla models | arXiv: 2511.16233
- funkan functional kolmogorov-arnold network for medical image enhancement and se | arXiv: 2509.13508
- g-ubs towards robust understanding of implicit feedback via group-aware user beh | arXiv: 2508.05709
- g2l from giga-scale to cancer-specific large-scale pathology foundation models vi | arXiv: 2510.11176
- gaico a deployed and extensible framework for evaluating diverse and multimodal | arXiv: 2508.16753
- gaming the answer matcher examining the impact of text manipulation on automated | arXiv: 2601.08849
- gatera token-aware modulation for parameter-efficient fine-tuning | arXiv: 2511.17582
- gaussian blending rethinking alpha blending in 3d gaussian splatting | arXiv: 2511.15102
- gaussianimage boosted image representation and compression with 2d gaussian spla | arXiv: 2512.19108
- gazeinterpreter parsing eye gaze to generate eye-body-coordinated narrations | arXiv: 2511.16245
- gcl-ot graph contrastive learning with optimal transport for heterophilic text-a | arXiv: 2511.16778
- gdba revisited unleashing the power of guided local search for distributed const | arXiv: 2508.06899
- gem generative entropy-guided preference modeling for few-shot alignment of llms | arXiv: 2511.13007
- gender bias in emotion recognition by large language models | arXiv: 2511.19785
- gene incremental learning for single-cell transcriptomics | arXiv: 2511.13762
- genepheno interpretable gene knockout-induced phenotype abnormality prediction f | arXiv: 2511.09512
- generalising traffic forecasting to regions without traffic observations | arXiv: 2508.08947
- generalizable slum detection from satellite imagery with mixture-of-experts | arXiv: 2511.10300
- generalization bounds for semi-supervised matrix completion with distributional | arXiv: 2511.13049
- generalized geometry encoding volume for real-time stereo matching | arXiv: 2512.06793
- generalizing analogical inference from boolean to continuous domains | arXiv: 2511.10416
- generalizing fair clustering to multiple groups algorithms and applications | arXiv: 2511.11539
- generating attribute-aware human motions from textual prompt | arXiv: 2506.21912
- genvidbench a 6-million benchmark for ai-generated video detection | arXiv: 2501.11340
- geometry meets light leveraging geometric priors for universal photometric stere | arXiv: 2511.13015
- gewdiff geometric enhanced wavelet-based diffusion model for hyperspectral image | arXiv: 2511.07103
- ghost in the transformer detecting model reuse with invariant spectral signature | arXiv: 2511.06390
- ghost solving the traveling salesman problem on graphs of convex sets | arXiv: 2511.06471
- giim graph-based learning of inter- and intra-view dependencies for multi-view m | arXiv: 2603.09446
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models | arXiv: 2501.05179
- global-lens transformers adaptive token mixing for dynamic link prediction | arXiv: 2511.12442
- gloctm cross-lingual topic modeling via a global context space | arXiv: 2601.11872
- GOAL: Geometrically Optimal Alignment for Continual Generalized Category Discovery | arXiv: 2602.19872
- gompsnr reflourish the signal-to-noise ratio metric for audio generation tasks | arXiv: 2601.13758
- good-for-mdp state reduction for stochastic ltl planning | arXiv: 2511.09073
- gp-molformer-sim test time molecular optimization through contextual similarity | arXiv: 2506.05628
- gram-r2 self-training generative foundation reward models for reward reasoning | arXiv: 2509.02492
- granalign granularity-aware alignment framework for zero-shot video moment retri | arXiv: 2601.00584
- graph of verification structured verification of llm reasoning with directed acy | arXiv: 2506.12509
- graph out-of-distribution detection via test-time calibration with dual dynamic | arXiv: 2511.13541
- graph smoothing for enhanced local geometry learning in point cloud analysis | arXiv: 2601.11102
- Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting | arXiv: 2603.06663
- graph-theoretic consistency for robust and topology-aware semi-supervised histop | arXiv: 2509.22689
- graphtextack a realistic black-box node injection attack on llm-enhanced gnns | arXiv: 2511.12423
- griffin aerial-ground cooperative detection and tracking dataset and benchmark | arXiv: 2503.06983
- grim task-oriented grasping with conditioning on generative examples | arXiv: 2506.15607
- ground what you see hallucination-resistant mllms via caption feedback diversity | arXiv: 2601.06224
- group orthogonal low-rank adaptation for rgb-t tracking | arXiv: 2512.05359
- grover graph-guided representation of omics and vision with expert regulation fo | arXiv: 2511.11730
- gsap-ere fine-grained scholarly entity and relation extraction focused on machin | arXiv: 2511.09411
- gt-snt a linear-time transformer for large-scale graphs via spiking node tokeniz | arXiv: 2504.11840
- gt2-gs geometry-aware texture transfer for gaussian splatting | arXiv: 2505.15208
- Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs | arXiv: 2508.02573
- guided perturbation sensitivity gps detecting adversarial text via embedding sta | arXiv: 2508.11667
- guidegen a text-guided framework for paired full-torso anatomy and ct volume gen | arXiv: 2403.07247
- guideline-consistent segmentation via multi-agent refinement | arXiv: 2509.04687
- h-gar a hierarchical interaction framework via goal-driven observation-action re | arXiv: 2511.17079
- HACK: Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling | arXiv: 2504.09261
- Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models | arXiv: 2511.17170
- hallucination stations on some basic limitations of transformer-based language m | arXiv: 2507.07505
- hard vs noise resolving hard-noisy sample confusion in recommender systems via l | arXiv: 2511.07295
- harmonic dataset distillation for time series forecasting | arXiv: 2603.03760
- harnessing textual semantic priors for knowledge transfer and refinement in clip | arXiv: 2508.01579
- Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models | arXiv: 2504.08202
- harnessing vision-language models for time series anomaly detection | arXiv: 2506.06836
- hashed watermark as a filter defeating forging and overwriting attacks in weight | arXiv: 2507.11137
- hcf hierarchical cascade framework for distributed multi-stage image compression | arXiv: 2508.02051
- hcpo hierarchical conductor-based policy optimization in multi-agent reinforceme | arXiv: 2511.12123
- hd2-ssc high-dimension high-density semantic scene completion for autonomous dri | arXiv: 2511.07925
- HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection | arXiv: 2512.17601v2
- healsplit towards self-healing through adversarial distillation in split federat | arXiv: 2511.11240
- hearing more with less multi-modal retrieval-and-selection augmented conversatio | arXiv: 2508.01166
- heterogeneous uncertainty-guided composed image retrieval with fine-grained prob | arXiv: 2601.11393
- hierarchical direction perception via atomic dot-product operators for rotation- | arXiv: 2511.08240
- hierarchical pedagogical oversight a multi-agent adversarial framework for relia | arXiv: 2512.22496
- hierarchical prompt learning for image- and text-based person re-identification | arXiv: 2511.13575
- hierarchical schedule optimization for fast and robust diffusion model sampling | arXiv: 2511.11688
- hierarchicalprune position-aware compression for large-scale diffusion models | arXiv: 2508.04663
- hifusion hierarchical intra-spot alignment and regional context fusion for spati | arXiv: 2511.12969
- higher-order responsibility | arXiv: 2506.01003
- hilomix robust high- and low-frequency graph learning framework for mixing addre | arXiv: 2511.07759
- HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment | arXiv: 2511.06653
- History-Aware Reasoning for GUI Agents | arXiv: 2511.09127
- hn-mvts hypernetwork-based multivariate time series forecasting | arXiv: 2511.08340
- how bias binds measuring hidden associations for bias control in text-to-image c | arXiv: 2511.07091
- How Does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective | arXiv: 2505.21505
- how hard is it to explain preferences using few boolean attributes | arXiv: 2511.13445
- how hard is it to rig a tournament when few players can beat or be beaten by the | arXiv: 2601.08530
- how many experts are enough towards optimal semantic specialization for mixture- | arXiv: 2512.19765
- how to marginalize in causal structure learning | arXiv: 2511.14001
- how wide and how deep mitigating over-squashing of gnns via channel capacity con | arXiv: 2511.06443
- hpsu a benchmark for human-level perception in real-world spoken speech understa | arXiv: 2511.23178
- hq-svc towards high-quality zero-shot singing voice conversion in low-resource s | arXiv: 2511.08496
- hskbenchmark modeling and benchmarking chinese second language acquisition in la | arXiv: 2511.15574
- human cognition inspired rag with knowledge graph for complex problem solving | arXiv: 2503.06567
- human cognitive biases in explanation-based interaction the case of within and b | arXiv: 2512.04764
- human-centric open-future task discovery formulation benchmark and scalable tree | arXiv: 2511.18929
- human-in-the-loop interactive report generation for chronic disease adherence | arXiv: 2601.06364
- hybrid-dmkg a hybrid reasoning framework over dynamic multimodal knowledge graph | arXiv: 2512.00881
- hybridla hybrid generation for document layout analysis | arXiv: 2511.19919
- hydrodcm hydrological domain-conditioned modulation for cross-reservoir inflow p | arXiv: 2512.03300
- hymoerec hybrid mixture-of-experts for sequential recommendation | arXiv: 2511.06388
- hyperbolic continuous structural entropy for hierarchical clustering | arXiv: 2512.00524
- hyperbolic hierarchical alignment reasoning network for text-3d retrieval | arXiv: 2511.11045
- hypershap shapley values and interactions for explaining hyperparameter optimiza | arXiv: 2502.01276
- hypothesis generation via llm-automated language bias for ilp | arXiv: 2505.21486
- i-cam-uv integrating causal graphs over non-identical variable sets using causal | arXiv: 2603.03207
- i-inr iterative implicit neural representations | arXiv: 2504.17364
- i2e real-time image-to-event conversion for high-performance spiking neural netw | arXiv: 2511.08065
- icl-router in-context learned model representations for llm routing | arXiv: 2510.09719
- ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement | arXiv: 2511.13607
- idealtsf can non-ideal data contribute to enhancing the performance of time seri | arXiv: 2512.05442
- identifying and analyzing performance-critical tokens in large language models | arXiv: 2401.11323
- ie-srgs an internal-external knowledge fusion framework for high-fidelity 3d gau | arXiv: 2511.22233
- iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference | arXiv: 2511.11306
- imagebinddc compressing multi-modal data with imagebind-based condensation | arXiv: 2511.08263
- importance-aware data selection for efficient llm instruction tuning | arXiv: 2511.07074
- improved differentially private algorithms for rank aggregation | arXiv: 2511.11319
- improved masked image generation with knowledge-augmented token representations | arXiv: 2511.12032
- improved runtime guarantees for the spea2 multi-objective optimizer | arXiv: 2511.07150
- improving multimodal sentiment analysis via modality optimization and dynamic pr | arXiv: 2511.06328
- improving region representation learning from urban imagery with noisy long-capt | arXiv: 2511.07062
- improving sparse imu-based motion capture with motion label smoothing | arXiv: 2511.22288
- improving sustainability of adversarial examples in class-incremental learning | arXiv: 2511.09088
- improving the convergence rate of ray search optimization for query-efficient ha | arXiv: 2512.21241
- Improving Value-based Process Verifier via Low-Cost Variance Reduction | arXiv: 2508.10539
- in-token rationality optimization towards accurate and concise llm reasoning via | arXiv: 2511.09865
- Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement | arXiv: 2511.16331
- incremental maintenance of datalogmtl materialisations | arXiv: 2511.12169
- induce align predict zero-shot stance detection via cognitive inductive reasonin | arXiv: 2506.13470
- inductive generative recommendation via retrieval-based speculation | arXiv: 2410.02939
- InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration | arXiv: 2512.02981
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models | arXiv: 2508.10030
- infigui-g1 advancing gui grounding with adaptive exploration policy optimization | arXiv: 2508.05731
- Infinite-Story: A Training-Free Consistent Text-to-Image Generation | arXiv: 2511.13002
- InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer | arXiv: 2511.15967
- infocom kilobyte-scale communication-efficient collaborative perception with inf | arXiv: 2512.10305
- infodecom decomposing information for defending against privacy leakage in split | arXiv: 2511.13365
- information theoretic optimal surveillance for epidemic prevalence in networks | arXiv: 2601.04267
- instance generation for meta-black-box optimization through latent space reverse | arXiv: 2509.15810
- intention chain-of-thought prompting with dynamic routing for code generation | arXiv: 2512.14048
- intention-guided cognitive reasoning for egocentric long-term action anticipatio | arXiv: 2508.01742
- intermediate n-gramming deterministic and fast n-grams for large n and large dat | arXiv: 2511.14955
- intermoe individual-specific 3d human interaction generation via dynamic tempora | arXiv: 2511.13488
- interpretable reward model via sparse autoencoder | arXiv: 2508.08746
- interpreting fedspeak with confidence a llm-based uncertainty-aware framework gu | arXiv: 2508.08001
- intervention efficiency and perturbation validation framework capacity-aware and | arXiv: 2511.14317
- intrinsic barriers and practical pathways for human-ai alignment an agreement-ba | arXiv: 2502.05934
- investigating data pruning for pretraining biological foundation models at scale | arXiv: 2512.12932
- invisible triggers visible threats road-style adversarial creation attack for vi | arXiv: 2511.08015
- irote human-like traits elicitation of large language model via in-context self- | arXiv: 2508.08719
- is the information bottleneck robust enough towards label-noise resistant inform | arXiv: 2512.10573
- ISEAL: Encrypted Fingerprinting for Reliable LLM Ownership Verification | arXiv: 2511.08905
- jodiffusion jointly diffusing image with pixel-level annotations for semantic se | arXiv: 2512.13014
- Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction | arXiv: 2509.10798
- judging by the rules compliance-aligned framework for modern slavery statement m | arXiv: 2511.07803
- Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search | arXiv: 2509.09245
- just few states are enough randomized sparse feedback for stability of dynamical | arXiv: 2511.13870
- kernelized edge attention addressing semantic attention blurring in temporal gra | arXiv: 2602.00596
- kinest a kinematics-guided spatiotemporal state space model for human motion tra | arXiv: 2512.16791
- know your trajectory -- trustworthy reinforcement learning deployment through im | arXiv: 2512.06917
- knowledge completes the vision a multimodal entity-aware retrieval-augmented gen | arXiv: 2511.21002
- knowledge-guided masked autoencoder with linear spectral mixing and spectral-ang | arXiv: 2512.12445
- ktcf actionable recourse in knowledge tracing via counterfactual explanations fo | arXiv: 2601.09156
- KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache | arXiv: 2506.08018
- L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention | arXiv: 2511.17910
- laf-grpo in-situ navigation instruction generation for the visually impaired via | arXiv: 2506.04070
- lamp learning universal adversarial perturbations for multi-image tasks via pre- | arXiv: 2601.21220
- lampq towards accurate layer-wise mixed precision quantization for vision transf | arXiv: 2511.10004
- language model distillation a temporal difference imitation learning perspective | arXiv: 2505.20335
- Language Models and Logic Programs for Trustworthy Tax Reasoning | arXiv: 2508.21051
- large language models meet extreme multi-label classification scaling and multi- | arXiv: 2511.13189
- Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers | arXiv: 2511.07934
- leanrag knowledge-graph-based generation with semantic aggregation and hierarchi | arXiv: 2508.10391
- Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling | arXiv: 2511.21120v1
- learning compact latent space for representing neural signed distance functions | arXiv: 2511.14539
- learning conjugate direction fields for planar quadrilateral mesh generation | arXiv: 2511.11865
- learning fair representations with kolmogorov-arnold networks | arXiv: 2511.11767
- learning from the undesirable robust adaptation of language models without forge | arXiv: 2511.13052
- learning network dismantling without handcrafted inputs | arXiv: 2508.00706
- learning procedural-aware video representations through state-grounded hierarchy | arXiv: 2511.20073
- learning spatial decay for vision transformers | arXiv: 2508.09525
- learning subgroups with maximum treatment effects without causal heuristics | arXiv: 2511.20189
- learning time in static classifiers | arXiv: 2511.12321
- learning to collaborate an orchestrated-decentralized framework for peer-to-peer | arXiv: 2601.17133
- learning to generate and extract a multi-agent collaboration framework for zero- | arXiv: 2603.02909
- learning to tell apart weakly supervised video anomaly detection via disentangle | arXiv: 2511.10334
- learning topology-driven multi-subspace fusion for grassmannian deep network | arXiv: 2511.08628
- learning with preserving for continual multitask learning | arXiv: 2511.11676
- length-adaptive interest network for balancing long and short sequence modeling | arXiv: 2601.19142
- Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition | arXiv: 2512.17946
- let the void be void robust open-set semi-supervised learning via selective non- | arXiv: 2504.12569
- leveraging textual compositional reasoning for robust change captioning | arXiv: 2511.22903
- lexchronos an agentic framework for structured event timeline extraction in indi | arXiv: 2603.01651
- lidar-gs improving lidar gaussian reconstruction via diffusion priors | arXiv: 2511.12304
- lidarcrafter dynamic 4d world modeling from lidar sequences | arXiv: 2508.03692
- liecraft a multi-agent framework for evaluating deceptive capabilities in langua | arXiv: 2603.06874
- life machine learning and the search for habitability predicting biosignature fl | arXiv: 2601.12557
- lifelong domain adaptive 3d human pose estimation | arXiv: 2512.23860
- lightweight optimal-transport harmonization on edge devices | arXiv: 2511.12785
- lilad learning in-context lyapunov-stable adaptive dynamics models | arXiv: 2511.21846
- linext revisiting lidar completion with efficient non-diffusion architectures | arXiv: 2511.10209
- listen like a teacher mitigating whisper hallucinations using adaptive layer att | arXiv: 2511.14219
- listening between the frames bridging temporal gaps in large audio-language mode | arXiv: 2511.11039
- livibench an omnimodal benchmark for interactive livestream video understanding | arXiv: 2601.15016
- llandmark a multi-agent framework for landmark-aware multimodal interactive vide | arXiv: 2603.02888
- llm targeted underperformance disproportionately impacts vulnerable users | arXiv: 2406.17737
- llm-as-a-judge for scalable test coverage evaluation accuracy operational reliab | arXiv: 2512.01232
- LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction | arXiv: 2512.18623
- llmc benchmarking vision-language model compression with a plug-and-play toolkit | arXiv: 2508.09981
- llms for game theory entropy-guided in-context learning and adaptive cot reasoni | arXiv: 2601.10775
- llmtm benchmarking and optimizing llms for temporal motif analysis in dynamic gr | arXiv: 2512.22266
- local guidance for configuration-based multi-agent pathfinding | arXiv: 2510.19072
- logical characterizations of gnns with mean aggregation | arXiv: 2507.18145
- loki low-damage knowledge implanting of large language models | arXiv: 2505.22120
- longllada unlocking long context capabilities in diffusion llms | arXiv: 2506.14429
- longt2ibench a benchmark for evaluating long text-to-image generation with graph | arXiv: 2512.09271
- loom personalized learning informed by daily llm conversations toward long-term | arXiv: 2511.21037
- loopllm transferable energy-latency attacks in llms via repetitive generation | arXiv: 2511.07876
- loretta a low resource framework to poison continuous time dynamic graphs | arXiv: 2511.07379
- loss-guided auxiliary agents for overcoming mode collapse in gflownets | arXiv: 2505.15251
- lost in benchmarks rethinking large language model benchmarking with item respon | arXiv: 2505.15055
- lost in time a meta-learning framework for time-shift-tolerant physiological sig | arXiv: 2511.21500
- lost in translation a comparative study on the cross-lingual transfer of composi | arXiv: 2602.07963
- low-rank curvature for zeroth-order optimization in llm fine-tuning | arXiv: 2511.07971
- lucid learning-enabled uncertainty-aware certification of stochastic dynamical s | arXiv: 2512.11750
- lungnoduleagent a collaborative multi-agent system for precision diagnosis of lu | arXiv: 2511.21042
- lwganet addressing spatial and channel redundancy in remote sensing visual tasks | arXiv: 2501.10040
- m2fmoe multi-resolution multi-view frequency mixture-of-experts for extreme-adap | arXiv: 2601.08631
- m3sr multi-scale multi-perceptual mamba for efficient spectral reconstruction | arXiv: 2601.08293
- Machine Learning for Sustainable Rice Production: Region-Scale Monitoring of Water-Saving Practices in Punjab, India | arXiv: 2507.08605
- macprompt maraconic-guided jailbreak against text-to-image models | arXiv: 2601.07141
- macs multi-source audio-to-image generation with contextual significance and sem | arXiv: 2503.10287
- macvqa adaptive memory allocation and global noise filtering for continual visua | arXiv: 2601.01926
- Magnitude Matters: A Superior Class of Similarity Metrics for Holistic Semantic Understanding | arXiv: 2509.19323
- magnitude-modulated equivariant adapter for parameter-efficient fine-tuning of e | arXiv: 2511.06696
- maisi-v2 accelerated 3d high-resolution medical image synthesis with rectified f | arXiv: 2508.05772
- mama-memeia multi-aspect multi-agent collaboration for depressive symptoms ident | arXiv: 2512.25015
- MambaMia: State-Space Hierarchical Compression for Hour-Long Video Understanding in Large Multimodal Models | arXiv: 2506.13564
- MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation | arXiv: 2512.24243
- manilong-shot interaction-aware one-shot imitation learning for long-horizon man | arXiv: 2512.16302
- mapi-gnn multi-activation plane interaction graph neural network for multimodal | arXiv: 2512.20026
- MAPS: Multi-Agent Personality Shaping for Collaborative Reasoning | arXiv: 2503.16905
- margin-aware preference optimization for aligning diffusion models without refer | arXiv: 2406.06424
- mars a meta-adaptive reinforcement learning framework for risk-aware multi-agent | arXiv: 2508.01173
- MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization | arXiv: 2503.16874
- mask the redundancy evolving masking representation learning for multivariate ti | arXiv: 2511.17008
- mask2iv interaction-centric video generation via mask trajectories | arXiv: 2510.03135
- mass concept erasure in diffusion models with concept hierarchy | arXiv: 2601.03305
- mathsmith towards extremely hard mathematical reasoning by forging synthetic pro | arXiv: 2508.05592
- matrix-free two-to-infinity and one-to-two norms estimation | arXiv: 2508.04444
- mavis a benchmark for multimodal source attribution in long-form visual question | arXiv: 2511.12142
- mcmoe completing missing modalities with mixture of experts for incomplete multi | arXiv: 2511.17397
- mcts-sql light-weight llms can master the text-to-sql through monte carlo tree s | arXiv: 2501.16607
- mctsr-zero self-reflective psychological counseling dialogues generation via pri | arXiv: 2505.23229
- mdaif robust one-stop multi-degradation-aware image fusion with language-driven | arXiv: 2511.12525
- mdiff4str mask diffusion model for scene text recognition | arXiv: 2512.01422
- measuring model performance in the presence of an intervention | arXiv: 2511.05805
- measuring stability beyond accuracy in small open-source medical large language | arXiv: 2601.11567
- medeyes learning dynamic visual focus for medical progressive diagnosis | arXiv: 2511.22018
- MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models | arXiv: 2509.23725
- melodia training-free music editing guided by attention probing in diffusion mod | arXiv: 2511.08252
- mem-pal towards memory-based personalized dialogue assistants for long-term user | arXiv: 2511.13410
- mergedna context-aware genome modeling with dynamic tokenization through token m | arXiv: 2511.14806
- MeSHA: Efficient Path Planning with Motion Primitives | arXiv: 2412.10320
- meshsplat generalizable sparse-view surface reconstruction via gaussian splattin | arXiv: 2508.17811
- meta dynamic graph for traffic flow prediction | arXiv: 2601.10328
- metagdpo alleviating catastrophic forgetting with metacognitive knowledge throug | arXiv: 2511.12113
- mf-speech achieving fine-grained and compositional control in speech generation | arXiv: 2511.12074
- mfmamba a multi-function network for panchromatic image resolution restoration b | arXiv: 2511.18888
- microevoeval a systematic evaluation framework for image-based microstructure ev | arXiv: 2511.08955
- midb multilingual instruction data booster for enhancing cultural equality in mu | arXiv: 2505.17671
- mindcross fast new subject adaptation with limited data for cross-subject video | arXiv: 2511.14196
- mindvote when ai meets the wild west of social media opinion | arXiv: 2505.14422
- minimizing inequity in facility location games | arXiv: 2602.01048
- minimum-cost network flow with dual predictions | arXiv: 2601.20203
- mirage scaling test-time inference with parallel graph-retrieval-augmented reaso | arXiv: 2508.18260
- mirnet integrating constrained graph-based reasoning with pre-training for diagn | arXiv: 2511.10013
- mitigating content effects on reasoning in language models through fine-grained | arXiv: 2505.12189
- mitigating error accumulation in co-speech motion generation via global rotation | arXiv: 2511.10076
- mixture of ranks with degradation-aware routing for one-step real-world image su | arXiv: 2511.16024
- MMhops-R1: Multimodal Multi-hop Reasoning | arXiv: 2512.13573
- mmpred radar-based human motion prediction in the dark | arXiv: 2512.00345
- moba a material-oriented backdoor attack against lidar-based 3d object detection | arXiv: 2511.09999
- mobgs motion deblurring dynamic 3d gaussian splatting for blurry monocular video | arXiv: 2504.15122
- modality-aware bias mitigation and invariance learning for unsupervised visible- | arXiv: 2512.07760
- model change for description logic concepts | arXiv: 2603.05562
- model counting for dependency quantified boolean formulas | arXiv: 2511.07337
- model editing as a double-edged sword steering agent ethical behavior toward ben | arXiv: 2506.20606
- modelling the effects of hearing loss on neural coding in the auditory midbrain | arXiv: 2506.03088
- moetta test-time adaptation under mixed distribution shifts with moe-layernorm | arXiv: 2511.13760
- mofu scale-aware modulation and fourier fusion for multi-subject video generatio | arXiv: 2512.22310
- monoclue object-aware clustering enhances monocular 3d object detection | arXiv: 2511.07862
- moral change or noise on problems of aligning ai with temporally unstable human | arXiv: 2511.10032
- moralreason generalizable moral decision alignment for llm agents using reasonin | arXiv: 2511.12271
- more than irrational modeling belief-biased agents | arXiv: 2511.12359
- mose hierarchical self-distillation enhances early layer embeddings | arXiv: 2503.03008
- motif multi-strategy optimization via turn-based interactive framework | arXiv: 2508.03929
- motioncharacter fine-grained motion controllable human video generation | arXiv: 2411.18281
- motorec sparse-regularized multimodal tokenization for cold-start recommendation | arXiv: 2602.11062
- movsemcl movement-semantics contrastive learning for trajectory similarity exten | arXiv: 2511.12061
- mp1 meanflow tames policy learning in 1-step for robotic manipulation | arXiv: 2507.10543
- mpa multimodal prototype augmentation for few-shot learning | arXiv: 2602.10143
- mpd-sgr robust spiking neural networks with membrane potential distribution-driv | arXiv: 2511.12199
- mr-cosmo visual-text memory recall and direct cross-modal alignment method for q | arXiv: 2506.20991
- mug meta-path-aware universal heterogeneous graph pre-training | arXiv: 2602.22645
- MUG: Multi-agent Undercover Gaming — Hallucination Removal via Counterfactual Test for Multimodal Reasoning | arXiv: 2511.11182
- multi-agent vlms guided self-training with pnu loss for low-resource offensive c | arXiv: 2511.13759
- multi-aspect cross-modal quantization for generative recommendation | arXiv: 2511.15122
- multi-faceted attack exposing cross-model vulnerabilities in defense-equipped vi | arXiv: 2511.16110
- multi-granularity interactive attention framework for residual hierarchical pron | arXiv: 2601.01745
- multi-metric preference alignment for generative speech restoration | arXiv: 2508.17229
- multi-modal assistance for unsupervised domain adaptation on point cloud 3d obje | arXiv: 2511.07966
- multi-modal dynamic proxy learning for personalized multiple clustering | arXiv: 2511.07274
- multigranular evaluation for brain visual decoding | arXiv: 2507.07993
- multimodal data fusion to capture dynamic interactions between built environment | arXiv: 2601.11545
- Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework | arXiv: 2506.02454
- multiplicative orthogonal sequential editing for language models | arXiv: 2601.07873
- multitab a scalable foundation for multitask learning on tabular data | arXiv: 2511.09970
- multivariate gaussian representation learning for medical action evaluation | arXiv: 2511.10060
- mvgd-net a novel motion-aware video glass surface detection network | arXiv: 2601.13715
- mygram modality-aware graph transformer with global distribution for multi-modal | arXiv: 2601.11885
- n2n-gqa noise-to-narrative for graph-based table-text question answering using l | arXiv: 2601.06603
- nadir differential attention flow for non-autoregressive transliteration in indi | arXiv: 2601.12389
- neighbor-aware instance refining with noisy labels for cross-modal retrieval | arXiv: 2512.24064
- nestr a neuro-symbolic abductive framework for temporal reasoning in large langu | arXiv: 2512.07218
- neural bandit based optimal llm selection for a pipeline of tasks | arXiv: 2508.09958
- Neural Graph Navigation for Intelligent Subgraph Matching | arXiv: 2511.17939
- neurobridge bio-inspired self-supervised eeg-to-image decoding via cognitive pri | arXiv: 2511.06836
- new synthetic goldmine hand joint angle-driven emg data generation framework for | arXiv: 2509.23359
- no-regret strategy solving in imperfect-information games via pre-trained embedd | arXiv: 2511.12083
- notam-evolve a knowledge-guided self-evolving optimization framework with llms f | arXiv: 2511.07982
- note2chat improving llms for multi-turn clinical history taking using medical no | arXiv: 2601.21551
- ntsformer a self-teaching graph transformer for multimodal isolated cold-start n | arXiv: 2507.04870
- nurbgen high-fidelity text-to-cad generation through llm-driven nurbs modeling | arXiv: 2511.06194
- nutriscreener retrieval-augmented multi-pose graph attention network for malnour | arXiv: 2511.16566
- o3slm open weight open data and open vocabulary sketch-language model | arXiv: 2511.14368
- oad-promoter enhancing zero-shot vqa using large language models with object att | arXiv: 2511.12131
- object-centric latent action learning | arXiv: 2502.09680
- object-centric world models for causality-aware reinforcement learning | arXiv: 2511.14262
- oceansplat object-aware gaussian splatting with trinocular view consistency for | arXiv: 2601.04984
- oida-qa a multimodal benchmark for analyzing the opioid industry documents archi | arXiv: 2511.09914
- omnipt unleashing the potential of large vision language models for pedestrian t | arXiv: 2511.17053
- omnivdiff omni controllable video diffusion for generation and understanding | arXiv: 2504.10825
- on stealing graph neural network models | arXiv: 2511.07170
- on the edge of core non-emptiness an automated reasoning approach to approval-ba | arXiv: 2512.16895
- on the exponential convergence for offline rlhf with pairwise comparisons | arXiv: 2406.12205
- on the information processing of one-dimensional wasserstein distances with fini | arXiv: 2511.12881
- On the Learning Dynamics of Two-Layer Linear Networks with Label Noise SGD | arXiv: 2603.10397
- on the variability of concept activation vectors | arXiv: 2509.24058
- one-step generative policies with q-learning a reformulation of meanflow | arXiv: 2511.13035
- online linear regression with paid stochastic features | arXiv: 2511.08073
- open-world 3d scene graph generation for retrieval-augmented reasoning | arXiv: 2511.05894
- open-world object counting in videos | arXiv: 2506.15368
- openscan a benchmark for generalized open-vocabulary 3d scene understanding | arXiv: 2408.11030
- opera a reinforcement learning--enhanced orchestrated planner-executor architect | arXiv: 2508.16438
- opt3dgs optimizing 3d gaussian splatting with adaptive exploration and curvature | arXiv: 2511.13571
- optimal look-back horizon for time series forecasting in federated learning | arXiv: 2511.12791
- optimal welfare in noncooperative network formation under attack | arXiv: 2511.10845
- optimized algorithms for text clustering with llm-generated constraints | arXiv: 2601.11118
- optscale probabilistic optimality for inference-time scaling | arXiv: 2506.22376
- or-r1 automating modeling and solving of operations research optimization proble | arXiv: 2511.09092
- orvit near-optimal online distributionally robust reinforcement learning | arXiv: 2508.03768
- otter mitigating background distractions of wide-angle few-shot action recogniti | arXiv: 2511.06741
- pa-fas towards interpretable and generalizable multimodal face anti-spoofing via | arXiv: 2511.17927
- padiff predictive and adaptive diffusion policies for ad hoc teamwork | arXiv: 2511.07260
- pairing-free group-level knowledge distillation for robust gastrointestinal lesi | arXiv: 2601.09209
- panda -- patch and distribution-aware augmentation for long-tailed exemplar-free | arXiv: 2511.09791
- panda test-time adaptation with negative data augmentation | arXiv: 2511.10481
- panfoma a lightweight foundation model and benchmark for pan-cancer | arXiv: 2512.03111
- panonav mapless zero-shot object navigation with panoramic scene parsing and dyn | arXiv: 2511.06840
- parallelism meets adaptiveness scalable documents understanding in multi-agent l | arXiv: 2507.17061
- parameta towards learning disentangled paralinguistic speaking styles representa | arXiv: 2601.12289
- parameter-free fine-tuning via redundancy elimination for vision foundation mode | arXiv: 2504.08915
- parameterized approximation algorithms for tsp on non-metric graphs | arXiv: 2503.03642
- parametric pareto set learning for expensive multi-objective optimization | arXiv: 2511.05815
- parametrized multi-agent routing via deep attention models | arXiv: 2507.22338
- pararevsnn a parallel reversible spiking neural network for efficient training a | arXiv: 2508.01223
- pareto-grid-guided large language models for fast and high-quality heuristics de | arXiv: 2507.20923
- paretohqd fast offline multiobjective alignment of large language models using p | arXiv: 2504.16628
- partial action replacement tackling distribution shift in offline marl | arXiv: 2511.07629
- partially shared concept bottleneck models | arXiv: 2511.22170
- pase leveraging the phonological prior of wavlm for low-hallucination generative | arXiv: 2511.13300
- pase prototype-aligned calibration and shapley-based equilibrium for multimodal | arXiv: 2511.17585
- pathmind a retrieve-prioritize-reason framework for knowledge graph reasoning wi | arXiv: 2511.14256
- patientvlm meets docvlm pre-consultation dialogue between vision-language models | arXiv: 2601.10945
- pb4u-gnet resolution-adaptive garment simulation via propagation-before-update g | arXiv: 2601.15110
- pcokg personality-aware commonsense reasoning with debate | arXiv: 2601.06234
- peoat personalization-guided evolutionary question assembly for one-shot adaptiv | arXiv: 2512.00439
- perceive act and correct confidence is not enough for hyperspectral classificati | arXiv: 2511.10068
- persistent instability in llms personality measurements effects of scale reasoni | arXiv: 2508.04826
- personality-guided public-private domain disentangled hypergraph-former network | arXiv: 2511.12460
- personalization of large foundation models for health interventions | arXiv: 2601.03482
- personalized federated learning with bidirectional communication compression via | arXiv: 2511.13144
- perspective from a broader context can room style knowledge help visual floorpla | arXiv: 2508.01216
- pertouch vlm-driven agent for personalized and semantic image retouching | arXiv: 2511.12998
- perturb your data paraphrase-guided training data watermarking | arXiv: 2512.17075
- perturbing best responses in zero-sum games | arXiv: 2511.12523
- pet2rep towards vision-language model-drived automated radiology report generati | arXiv: 2508.04062
- pfavatar pose-fusion 3d personalized avatar reconstruction from real-world outfi | arXiv: 2511.12935
- phantom menace exploring and enhancing the robustness of vla models against phys | arXiv: 2511.10008
- pharos-esg a framework for multimodal parsing contextual narration and hierarchi | arXiv: 2511.16417
- phased one-step adversarial equilibrium for video diffusion models | arXiv: 2508.21019
- phys-liquid a physics-informed dataset for estimating 3d geometry and volume of | arXiv: 2511.11077
- physics-informed autonomous llm agents for explainable power electronics modulat | arXiv: 2411.14214
- physics-informed deformable gaussian splatting towards unified constitutive laws | arXiv: 2511.06299
- physicscorrect a training-free approach for stable neural pde simulations | arXiv: 2507.02227
- pimrl physics-informed multi-scale recurrent learning for burst-sampled spatiote | arXiv: 2503.10253
- pings-x physics-informed normalized gaussian splatting with axes alignment for e | arXiv: 2511.11048
- piphen physical interaction prediction with hamiltonian energy networks | arXiv: 2511.16200
- planttraitnet an uncertainty-aware multimodal framework for global-scale plant t | arXiv: 2511.06943
- playmate2 training-free multi-character audio-driven animation via diffusion tra | arXiv: 2510.12089
- plug-and-play clarifier a zero-shot multimodal framework for egocentric intent d | arXiv: 2511.08971
- plug-and-play parameter-efficient tuning of embeddings for federated recommendat | arXiv: 2512.13734
- plugtrack multi-perceptive motion analysis for adaptive fusion in multi-object t | arXiv: 2511.13105
- pocketllm ultimate compression of large language models via meta networks | arXiv: 2511.17637
- point cloud quantization through multimodal prompting for 3d understanding | arXiv: 2511.12079
- point-sra self-representation alignment for 3d representation learning | arXiv: 2601.01746
- position on llm-assisted peer review addressing reviewer gap through mentoring a | arXiv: 2601.09182
- positional bias in multimodal embedding models do they favor the beginning the m | arXiv: 2511.11216
- post training quantization for efficient dataset condensation | arXiv: 2603.13346
- posterior label smoothing for node classification | arXiv: 2406.00410
- pragworld a benchmark evaluating llms local world model under minimal linguistic | arXiv: 2511.13021
- precise reducing the bias of llm evaluations using prediction-powered ranking es | arXiv: 2601.18777
- predict and resist long-term accident anticipation under sensor noise | arXiv: 2511.08640
- predicting the future by retrieving the past | arXiv: 2511.05859
- predicting video slot attention queries from random slot-feature pairs | arXiv: 2508.01345
- preference is more than comparisons rethinking dueling bandits with augmented hu | arXiv: 2511.09047
- prefixgpt prefix adder optimization by a generative pre-trained transformer | arXiv: 2511.19472
- presstrack-hmr pressure-based top-down multi-person global human mesh recovery | arXiv: 2511.09147
- prime planning and retrieval-integrated memory for enhanced reasoning | arXiv: 2509.22315
- principles2plan llm-guided system for operationalising ethical principles into p | arXiv: 2512.08536
- PriorDrive: Enhancing Online HD Mapping with Unified Vector Priors | arXiv: 2409.05352
- priorrg prior-guided contrastive pre-training and coarse-to-fine decoding for ch | arXiv: 2508.05353
- prism privacy-aware routing for adaptive cloud-edge llm inference via semantic s | arXiv: 2511.22788
- privacy auditing of multi-domain graph pre-trained model under membership infere | arXiv: 2511.17989
- privacy on the fly a predictive adversarial transformation network for mobile se | arXiv: 2511.07242
- privacy-protected retrieval-augmented generation for knowledge graph question an | arXiv: 2508.08785
- private frequency estimation via residue number systems | arXiv: 2511.11569
- probabilistic hash embeddings for online learning of categorical features | arXiv: 2511.20893
- ProBench: Benchmarking GUI Agents with Accurate Process Information | arXiv: 2511.09157
- probfm probabilistic time series foundation model with uncertainty decomposition | arXiv: 2601.10591
- Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models | arXiv: 2511.12464
- problog4fairness a neurosymbolic approach to modeling and mitigating bias | arXiv: 2511.09768
- procache constraint-aware feature caching with selective computation for diffusi | arXiv: 2512.17298
- profuser progressive fusion of large language models | arXiv: 2408.04998
- promoting sustainable web agents benchmarking and estimating energy consumption | arXiv: 2511.04481
- promptmoe generalizable zero-shot anomaly detection via visually-guided prompt m | arXiv: 2511.18116
- propl universal semi-supervised ultrasound image segmentation via prompt-guided | arXiv: 2511.15057
- prototype-based semantic consistency alignment for domain adaptive retrieval | arXiv: 2512.04524
- ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Aligned Sparse Autoencoders | arXiv: 2509.05309
- provably data-driven projection method for quadratic programming | arXiv: 2509.04524
- provably efficient multi-objective bandit algorithms under preference-centric cu | arXiv: 2502.13457
- provably minimum-length conformal prediction sets for ordinal classification | arXiv: 2511.16845
- Prune4Web: DOM Tree Pruning Programming for Web Agent | arXiv: 2511.21398
- psa-mf personality-sentiment aligned multi-level fusion for multimodal sentiment | arXiv: 2512.01442
- psm prompt sensitivity minimization via llm-guided black-box optimization | arXiv: 2511.16209
- pulsemind a multi-modal medical model for real-world clinical diagnosis | arXiv: 2601.07344
- put the space of lora initialization to the extreme to preserve pre-trained know | arXiv: 2503.02659
- q-fsru quantum-augmented frequency-spectral fusion for medical visual question a | arXiv: 2508.12036
- qa-flora data-free query-adaptive fusion of loras for llms | arXiv: 2512.11366
- qgshap quantum acceleration for faithful gnn explanations | arXiv: 2512.03099
- qimeng-kernel macro-thinking micro-coding paradigm for llm-based high-performanc | arXiv: 2511.20100
- quantifying conversational reliability of large language models under multi-turn | arXiv: 2603.01423
- quantvsr low-bit post-training quantization for real-world video super-resolutio | arXiv: 2508.04485
- quept quantized elastic precision transformers with one-shot calibration for mul | arXiv: 2602.12609
- quiet feature learning in algorithmic tasks | arXiv: 2505.03997
- r-avst empowering video-llms with fine-grained spatio-temporal reasoning in comp | arXiv: 2511.16901
- racketvision a multiple racket sports benchmark for unified ball and racket anal | arXiv: 2511.17045
- radar-aplanc unsupervised radar-based heartbeat sensing via augmented pseudo-lab | arXiv: 2511.08071
- radarllm empowering large language models to understand human motion from millim | arXiv: 2504.09862
- radarmp motion perception for 4d mmwave radar in autonomous driving | arXiv: 2511.12117
- radiation-preserving selective imaging for pediatric hip dysplasia a cross-modal | arXiv: 2511.18457
- ragfort dual-path defense against proprietary knowledge base extraction in retri | arXiv: 2511.10128
- rast a retrieval augmented spatio-temporal framework for traffic prediction | arXiv: 2508.16623
- rcae recursive reconstruction framework for unsupervised industrial anomaly dete | arXiv: 2512.11284
- real-time 3d object detection with inference-aligned learning | arXiv: 2511.16140
- real-time trust verification for safe agentic actions using trustbench | arXiv: 2603.09157
- realign text-to-motion generation via step-aware reward-guided alignment | arXiv: 2511.19217
- realism control one-step diffusion for real-world image super-resolution | arXiv: 2509.10122
- realistic curriculum reinforcement learning for autonomous and sustainable marin | arXiv: 2601.10911
- realistic face reconstruction from facial embeddings via diffusion models | arXiv: 2602.13168
- realistic synthetic household data generation at scale | arXiv: 2602.07243
- reap enhancing rag with recursive evaluation and adaptive planning for multi-hop | arXiv: 2511.09966
- reason reinforced causal search with information bottleneck for video understand | arXiv: 2511.12530
- reasoning about the unsaid misinformation detection with omission-aware graph in | arXiv: 2512.01728
- reasoning or memorization unreliable results of reinforcement learning due to da | arXiv: 2507.10532
- reasoning with exploration an entropy perspective | arXiv: 2506.14758
- recad reinforcement learning enhanced parametric cad model generation with visio | arXiv: 2512.06328
- recast reliability-aware codebook assisted lightweight time series forecasting | arXiv: 2511.11991
- recode updating code api knowledge with reinforcement learning | arXiv: 2506.20495
- recon-ipsundrum an inspectable recurrent persistence loop agent with affect-coup | arXiv: 2602.23232
- Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts | arXiv: 2512.18718
- rectified noise a generative model using positive-incentive noise | arXiv: 2511.07911
- rectom a benchmark for evaluating machine theory of mind in llm-based conversati | arXiv: 2511.22275
- recursive visual imagination and adaptive linguistic grounding for vision langua | arXiv: 2507.21450
- reducing the scope of language models | arXiv: 2410.21597
- redundant queries in detr-based 3d detection methods unnecessary and prunable | arXiv: 2412.02054
- ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting | arXiv: 2603.01417
- reference recommendation based membership inference attack against hybrid-based | arXiv: 2512.09442
- refidiff progressive refinement diffusion for efficient missing data imputation | arXiv: 2505.14451
- refine and align confidence calibration through multi-agent interaction in vqa | arXiv: 2511.11169
- refinevad semantic-guided feature recalibration for weakly supervised video anom | arXiv: 2511.13204
- reflection-driven control for trustworthy code agents | arXiv: 2512.21354
- ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-Lateral-Acceleration Autonomous Driving | arXiv: 2601.09377
- regal a first look at ppo-based legal ai for judgment prediction and summarizati | arXiv: 2512.18014
- regionmarker a region-triggered semantic watermarking framework for embedding-as | arXiv: 2511.13329
- regular games -- an automata-based general game playing language | arXiv: 2511.10593
- reimagining anomalies what if anomalies were normal | arXiv: 2402.14469
- reina regularized entropy information-based loss for efficient simultaneous spee | arXiv: 2508.04946
- reinforced rate control for neural video compression via inter-frame rate-distor | arXiv: 2601.19293
- relactrl relevance-guided efficient control for diffusion transformers | arXiv: 2502.14377
- relation-r1 progressively cognitive chain-of-thought guided reinforcement learni | arXiv: 2504.14642
- Relink: Constructing Query-Driven Evidence Graph On-the-Fly for GraphRAG | arXiv: 2601.07192
- remember me bridging the long-range gap in lvlms with three-step inference-only | arXiv: 2511.09868
- RENEW: Risk- and Energy-Aware Navigation in Dynamic Waterways | arXiv: 2601.16424
- renormalization group guided tensor network structure search | arXiv: 2512.24663
- resource efficient sleep staging via multi-level masking and prompt learning | arXiv: 2511.06785
- rethinking bias in generative data augmentation for medical ai a frequency recal | arXiv: 2511.12301
- rethinking direct preference optimization in diffusion models | arXiv: 2505.18736
- rethinking flow and diffusion bridge models for speech enhancement | arXiv: 2602.18355
- rethinking long-tailed dataset distillation a uni-level framework with unbiased | arXiv: 2511.18858
- rethinking multimodal point cloud completion a completion-by-correction perspect | arXiv: 2511.12170
- rethinking progression of memory state in robotic manipulation an object-centric | arXiv: 2511.11478
- rethinking rainy 3d scene reconstruction via perspective transforming and bright | arXiv: 2511.06734
- rethinking surgical smoke a smoke-type-aware laparoscopic video desmoking method | arXiv: 2512.02780
- rethinking target label conditioning in adversarial attacks a 2d tensor-guided g | arXiv: 2504.14137
- rethinking the spatio-temporal alignment of end-to-end 3d perception | arXiv: 2512.23635
- rethinking visual token reduction in lvlms under cross-modal misalignment | arXiv: 2506.22283
- retrieving objects from 3d scenes with box-guided open-vocabulary instance segme | arXiv: 2512.19088
- retrysql text-to-sql training with retry data for self-correcting query generati | arXiv: 2507.02529
- revealing pomdps qualitative and quantitative analysis for parity objectives | arXiv: 2511.13134
- revisiting the data sampling in multimodal post-training from a difficulty-disti | arXiv: 2511.06722
- revisiting unfairness in recourse by minimizing worst-case social burden | arXiv: 2509.04128
- revitalizing canonical pre-alignment for irregular multivariate time series fore | arXiv: 2508.01971
- reward redistribution via gaussian process likelihood estimation | arXiv: 2503.17409
- rexo indoor multi-view radar object detection via 3d bounding box diffusion | arXiv: 2511.17806
- RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA | arXiv: 2512.15219
- right looks wrong reasons compositional fidelity in text-to-image generation | arXiv: 2511.10136
- risk-sensitive exponential actor critic | arXiv: 2602.07202
- rlslm a hybrid reinforcement learning framework aligning rule-based social locom | arXiv: 2511.11323
- RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models (Oral) | arXiv: 2512.06811
- roadscenevqa benchmarking visual question answering in roadside perception syste | arXiv: 2511.18286
- robust long-term test-time adaptation for 3d human pose estimation through motio | arXiv: 2511.18851
- Robust Out-of-Order Retrieval for Grid-Based Storage at Maximum Capacity | arXiv: 2601.19144
- robust tabular foundation models | arXiv: 2512.03307
- robust watermarking on gradient boosting decision trees | arXiv: 2511.09822
- rpm-mcts knowledge-retrieval as process reward model with monte carlo tree searc | arXiv: 2511.19895
- rrra resampling and reranking through a retriever adapter | arXiv: 2508.11670
- rs2-sam2 customized sam2 for referring remote sensing image segmentation | arXiv: 2503.07266
- rsvg-zeroov exploring a training-free framework for zero-shot open-vocabulary vi | arXiv: 2509.18711
- rtgaze real-time 3d-aware gaze redirection from a single image | arXiv: 2511.11289
- S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning | arXiv: 2511.06727
- s2drug bridging protein sequence and 3d structure in contrastive representation | arXiv: 2511.07006
- s5 scalable semi-supervised semantic segmentation in remote sensing | arXiv: 2508.12409
- safemil learning offline safe imitation policy from non-preferred trajectories | arXiv: 2511.08136
- safenlidb a privacy-preserving safety alignment framework for llm-based natural | arXiv: 2511.06778
- safer-clip mitigating nsfw content in vision-language models while preserving pr | arXiv: 2511.16743
- safesieve from heuristics to experience in progressive pruning for llm-based mul | arXiv: 2508.11733
- saga learning signal-aligned distributions for improved text-to-image generation | arXiv: 2508.13866
- sage spuriousness-aware guided prompt exploration for mitigating multimodal bias | arXiv: 2511.13005
- sam-daq segment anything model with depth-guided adaptive queries for rgb-d vide | arXiv: 2511.09870
- sampling control for imbalanced calibration in semi-supervised learning | arXiv: 2511.18773
- saot an enhanced locality-aware spectral transformer for solving pdes | arXiv: 2511.18777
- SAPO: Self-Adaptive Process Optimization Makes Small Reasoners Stronger | arXiv: 2601.20312
- saq-sam semantically-aligned quantization for segment anything model | arXiv: 2503.06515
- satiredecoder visual cascaded decoupling for enhancing satirical image comprehen | arXiv: 2512.00582
- satisficing and optimal generalised planning via goal regression extended versio | arXiv: 2511.11095
- say more with less variable-frame-rate speech tokenization via adaptive clusteri | arXiv: 2509.04685
- scalable and accurate graph reasoning with llm-based multi-agents | arXiv: 2410.05130
- scalable multi-objective and meta reinforcement learning via gradient estimation | arXiv: 2511.12779
- scalable vision-guided crop yield estimation | arXiv: 2511.12999
- SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling | arXiv: 2512.00466
- scaling and transferability of annealing strategies in large language model trai | arXiv: 2512.13705
- scaling equitable reflection assessment in education via large language models a | arXiv: 2511.11772
- scaling llm speculative decoding non-autoregressive forecasting in large-batch s | arXiv: 2511.20340
- SceneJailEval: A Scenario-Adaptive Multi-Dimensional Framework for Jailbreak Evaluation | arXiv: 2508.06194
- Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction | arXiv: 2602.18403
- SCoPe: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs | arXiv: 2511.07001
- SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining | arXiv: 2511.17993
- sdeval safety dynamic evaluation for multimodal large language models | arXiv: 2508.06142
- secmoe communication-efficient secure moe inference via select-then-compute | arXiv: 2601.06790
- see symbolize act grounding vlms with spatial representations for better gamepla | arXiv: 2603.11601
- seeing justice clearly handwritten legal document translation with ocr and visio | arXiv: 2512.18004
- Seeing the Unseen: Zooming in the Dark with Event Cameras | arXiv: 2601.02206
- segment and matte anything in a unified model | arXiv: 2601.12147
- segment anything across shots a method and benchmark | arXiv: 2511.13715
- SELDON: Supernova Explosions Learned by Deep ODE Networks | arXiv: 2603.04392
- self-adaptive graph mixture of models | arXiv: 2511.13062
- self-correction distillation for structured data question answering | arXiv: 2511.07998
- self-npo data-free diffusion model enhancement via truncated diffusion fine-tuni | arXiv: 2505.11777
- self-supervised inductive logic programming | arXiv: 2507.16405
- self-supervised multiplex consensus mamba for general image fusion | arXiv: 2512.20921
- semanticvla semantic-aligned sparsification and enhancement for efficient roboti | arXiv: 2511.10518
- semc structure-enhanced mixture-of-experts contrastive learning for ultrasound s | arXiv: 2511.12559
- semi-supervised high dynamic range image reconstructing via bi-level uncertain a | arXiv: 2511.12939
- semi-supervised synthetic data generation with fine-grained relevance control fo | arXiv: 2509.16717
- sentient detecting apts via capturing indirect dependencies and behavioral logic | arXiv: 2502.06521
- serl self-examining reinforcement learning on open-domain | arXiv: 2511.07922
- Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems | arXiv: 2511.18467
- shapbpt image feature attributions using data-aware binary partition trees | arXiv: 2602.07047
- share your attention transformer weight sharing via matrix-based dictionary lear | arXiv: 2508.04581
- sharp eyes and memory for videollms information-aware visual token pruning for e | arXiv: 2511.08003
- sheaf graph neural networks via pac-bayes spectral optimization | arXiv: 2508.00357
- shortagesim simulating drug shortages under information asymmetry | arXiv: 2509.01813
- shrinking the teacher an adaptive teaching paradigm for asymmetric eeg-vision al | arXiv: 2511.11422
- sign schema-induced games for naming | arXiv: 2510.21855
- sim-to-real an unsupervised noise layer for screen-camera watermarking robustnes | arXiv: 2504.18906
- sim4seg boosting multimodal multi-disease medical diagnosis segmentation with re | arXiv: 2511.06665
- simba towards high-fidelity and geometrically-consistent point cloud completion | arXiv: 2511.16161
- simdiff simpler yet better diffusion model for time series point forecasting | arXiv: 2511.19256
- simrod a simple baseline for raw object detection with global and local enhancem | arXiv: 2503.07101
- Sketch-HARP: Hierarchical Autoregressive Sketch Generation for Flexible Stroke-Level Drawing Manipulation | arXiv: 2511.07889
- skill path unveiling language skills from circuit graphs | arXiv: 2410.01334
- skipcat rank-maximized low-rank compression of large language models via shared | arXiv: 2512.13494
- slidetailor personalized presentation slide generation for scientific papers | arXiv: 2512.20292
- sm3det a unified model for multi-modal remote sensing object detection | arXiv: 2412.20665
- small but mighty dynamic wavelet expert-guided fine-tuning of large-scale models | arXiv: 2601.09108
- small language models for efficient agentic tool calling outperforming large mod | arXiv: 2512.15943
- smart a surrogate model for predicting application runtime in dragonfly systems | arXiv: 2511.11111
- smartsplat feature-smart gaussians for scalable compression of ultra-high-resolu | arXiv: 2512.20377
- smofi step-wise momentum fusion for split federated learning on heterogeneous da | arXiv: 2511.09828
- soft filtering guiding zero-shot composed image retrieval with prescriptive and | arXiv: 2512.20781
- som directions are better than one multi-directional refusal suppression in lang | arXiv: 2511.08379
- SoMe: A Realistic Benchmark for LLM-based Social Media Agents | arXiv: 2512.14720
- sonnet spectral operator neural network for multivariable time series forecastin | arXiv: 2505.15312
- soscontrol enhancing human motion generation through saliency-aware symbolic ori | arXiv: 2601.14258
- spa achieving consensus in llm alignment via self-priority optimization | arXiv: 2511.06222
- spacrd multimodal deep fusion of histology and spatial transcriptomics for cance | arXiv: 2603.06186
- span benchmarking and improving cross-calendar temporal reasoning of large langu | arXiv: 2511.09993
- SPARC: OOD Generalization for Driving 100 Unseen Vehicles with a Single Policy | arXiv: 2511.09737
- spare single-pass annotation with reference-guided evaluation for automatic proc | arXiv: 2506.15498
- spark query-aware unstructured sparsity with recoverable kv cache channel prunin | arXiv: 2508.15212
- sparse additive model pruning for order-based causal structure learning | arXiv: 2602.15306
- sparse4dgs 4d gaussian splatting for sparse-frame dynamic scene reconstruction | arXiv: 2511.07122
- sparsecoop cooperative perception with kinematic-grounded queries | arXiv: 2512.06838
- sparserm a lightweight preference modeling with sparse autoencoder | arXiv: 2511.07896
- sparsesurf sparse-view 3d gaussian splatting for surface reconstruction | arXiv: 2511.14633
- spatialactor exploring disentangled spatial representations for robust robotic m | arXiv: 2511.09555
- spatio-temporal context learning with temporal difference convolution for moving | arXiv: 2511.09352
- spatiotemporal difference network for video depth super-resolution | arXiv: 2508.01259
- spatiotemporal-untrammelled mixture of experts for multi-person motion predictio | arXiv: 2512.21707
- speakerlm end-to-end versatile speaker diarization and recognition with multimod | arXiv: 2508.06372
- specdiff accelerating diffusion model inference with self-speculation | arXiv: 2509.13848
- specquant spectral decomposition and adaptive truncation for ultra-low-bit llms | arXiv: 2511.11663
- speculative sampling with reinforcement learning | arXiv: 2601.12212
- spherediff tuning-free 360 static and dynamic panorama generation via spherical | arXiv: 2504.14396
- spikcommander a high-performance spiking transformer with multi-view learning fo | arXiv: 2511.07883
- spike imaging velocimetry dense motion estimation of fluids using spike cameras | arXiv: 2504.18864
- spiking heterogeneous graph attention networks | arXiv: 2601.02401
- spikingformer a key foundation model for spiking neural networks | arXiv: 2304.11954
- splat-sap feed-forward gaussian splatting for human-centered scene with scale-aw | arXiv: 2511.22704
- splats in splats robust and effective 3d steganography towards gaussian splattin | arXiv: 2412.03121
- splatssc decoupled depth-guided gaussian splatting for semantic scene completion | arXiv: 2508.02261
- split-layer enhancing implicit neural representation by maximizing the dimension | arXiv: 2511.10142
- sproutbench a benchmark for safe and ethical large language models for youth | arXiv: 2508.11009
- sr-ki scalable and real-time knowledge integration into llms via supervised atte | arXiv: 2511.06446
- ssr semantic and spatial rectification for clip-based weakly supervised segmenta | arXiv: 2512.01701
- stabilizing self-consuming diffusion models with latent space filtering | arXiv: 2511.12742
- Stable Voting and the Splitting of Cycles | arXiv: 2512.00616
- start small think big curriculum-based relative policy optimization for visual g | arXiv: 2511.13924
- steering one-step diffusion model with fidelity-rich decoder for fast image comp | arXiv: 2508.04979
- steering pretrained drafters during speculative decoding | arXiv: 2511.09844
- stegavar privacy-preserving video action recognition via steganographic domain a | arXiv: 2512.12586
- stelar-vision self-topology-aware efficient learning for aligned reasoning in vi | arXiv: 2508.08688
- stellar scene text editor for low-resource languages and real-world data | arXiv: 2511.09977
- stem efficient relative capability evaluation of llms through structured transit | arXiv: 2508.12096
- stem faculty perspectives on generative ai in higher education | arXiv: 2603.04001
- stepfun-formalizer unlocking the autoformalization potential of llms through kno | arXiv: 2508.04440
- stmi segmentation-guided token modulation with cross-modal hypergraph interactio | arXiv: 2603.00695
- stola self-adaptive touch-language framework with tactile commonsense reasoning | arXiv: 2505.04201
- stratified knowledge-density super-network for scalable vision transformers | arXiv: 2511.11683
- streaming generated gaussian process experts for online learning and control ext | arXiv: 2508.03679
- streaming generation of co-speech gestures via accelerated rolling diffusion | arXiv: 2503.10488
- streamstgs streaming spatial and temporal gaussian grids for real-time free-view | arXiv: 2511.06046
- stride-qa visual question answering dataset for spatiotemporal reasoning in urba | arXiv: 2508.10427
- structural approach to guiding a present-biased agent | arXiv: 2601.07763
- structure-aware encodings of argumentation properties for clique-width | arXiv: 2511.10767
- structure-based rna design by step-wise optimization of latent diffusion model | arXiv: 2601.19232
- structured language generation model loss calibration and formatted decoding for | arXiv: 2402.08971
- structured personalization modeling constraints as matroids for data-minimal llm | arXiv: 2512.11907
- studying classifier-free guidance from a classifier-centric perspective | arXiv: 2503.10638
- stylebreak revealing alignment vulnerabilities in large audio-language models vi | arXiv: 2511.10692
- sugar learning skeleton representation with visual-motion knowledge for action r | arXiv: 2511.10091
- surface-based visibility-guided uncertainty for continuous active 3d neural reco | arXiv: 2405.02568
- svd-no learning pde solution operators with svd integral kernels | arXiv: 2511.10025
- Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments | arXiv: 2509.01022
- symmetrical flow matching unified image generation segmentation and classificati | arXiv: 2506.10634
- synweather weather observation data synthesis across multiple regions and variab | arXiv: 2511.08291
- t-lora single image diffusion model customization without overfitting | arXiv: 2507.05964
- t-rex-omni integrating negative visual prompt in generic object detection | arXiv: 2511.08997
- t2agent a tool-augmented multimodal misinformation detection agent with monte ca | arXiv: 2505.19768
- t2i-riskyprompt a benchmark for safety evaluation attack and defense on text-to- | arXiv: 2510.22300
- tab-pet graph-based positional encodings for tabular transformers | arXiv: 2511.13338
- tabflash efficient table understanding with progressive question conditioning an | arXiv: 2511.13283
- tackling resource-constrained and data-heterogeneity in federated learning with | arXiv: 2601.01840
- tadarag task adaptive retrieval-augmented generation via on-the-fly knowledge gr | arXiv: 2511.12520
- taligndiff automatic tooth alignment assisted by diffusion-based transformation | arXiv: 2508.04565
- talk snap complain validation-aware multimodal expert framework for fine-grained | arXiv: 2511.14693
- talksketch multimodal generative ai for real-time sketch ideation with speech | arXiv: 2511.05817
- TAPA: Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments | arXiv: 2508.11425
- target refocusing via attention redistribution for open-vocabulary semantic segm | arXiv: 2511.16170
- targeted data protection for diffusion model by matching training trajectory | arXiv: 2512.10433
- task aware modulation using representation learning for upscaling of terrestrial | arXiv: 2603.09974
- Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data | arXiv: 2601.07474
- task-aware retrieval augmentation for dynamic recommendation | arXiv: 2511.12495
- task-specific distance correlation matching for few-shot action recognition | arXiv: 2512.11340
- tawpipe topology-aware weight pipeline parallelism for accelerating long-context | arXiv: 2511.09741
- TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models | arXiv: 2507.10643
- tdsnns competitive topographic deep spiking neural networks for visual cortex mo | arXiv: 2508.04270
- teaching large language models to maintain contextual faithfulness via synthetic | arXiv: 2505.16483
- temple incentivizing temporal understanding of video large language models via p | arXiv: 2503.16929
- temporal inconsistency guidance for super-resolution video quality assessment | arXiv: 2412.18933
- temporal object-aware vision transformer for few-shot video object detection | arXiv: 2511.13784
- Test-driven Reinforcement Learning in Continuous Control | arXiv: 2511.07904
- test-time diverse reasoning by riemannian activation steering | arXiv: 2511.08305
- text-guided channel perturbation and pretrained knowledge integration for unifie | arXiv: 2511.12432
- text-guided controllable diffusion for realistic camouflage images generation | arXiv: 2511.20218
- text-routed sparse mixture-of-experts model with explanation and temporal alignm | arXiv: 2512.22741
- text-to-scene with large reasoning models | arXiv: 2509.26091
- textshield-r1 reinforced reasoning for tampered text detection | arXiv: 2602.19828
- tg-field geometry-aware radiative gaussian fields for tomographic reconstruction | arXiv: 2602.11705
- tgdd trajectory guided dataset distillation with balanced distribution | arXiv: 2512.02469
- the confidence trap gender bias and predictive certainty in llms | arXiv: 2601.07806
- the curious case of analogies investigating analogical reasoning in large langua | arXiv: 2511.20344
- the limitations and power of np-oracle-based functional synthesis techniques | arXiv: 2512.20572
- the publication choice problem | arXiv: 2511.13678
- the triangle of similarity a multi-faceted framework for comparing neural networ | arXiv: 2601.17093
- theoretical and empirical analysis of lehmer codes to search permutation spaces | arXiv: 2511.19089
- theory of mind for explainable human-robot interaction | arXiv: 2512.23482
- think how your teammates think active inference can benefit decentralized execut | arXiv: 2511.18761
- think speak decide language-augmented multi-agent reinforcement learning for eco | arXiv: 2511.12876
- thinker training llms in hierarchical thinking for deep search via multi-turn in | arXiv: 2511.07943
- thucy an llm-based multi-agent system for claim verification across relational d | arXiv: 2512.03278
- time identity and consciousness in language model agents | arXiv: 2603.09043
- timebill time-budgeted inference for large language models | arXiv: 2512.21859
- tinychemvl advancing chemical vision-language models via efficient visual token | arXiv: 2511.06283
- tmdc a two-stage modality denoising and complementation framework for multimodal | arXiv: 2511.10325
- to align or not to align strategic multimodal representation alignment for optim | arXiv: 2511.12121
- ToC: Tree-of-Claims Search with Multi-Agent Language Models | arXiv: 2511.16972
- tofa training-free one-shot federated adaptation for vision-language models | arXiv: 2511.16423
- tokenize once recommend anywhere unified item tokenization for multi-domain llm- | arXiv: 2511.12922
- TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents | arXiv: 2504.12679
- tool4poi a tool-augmented llm framework for next poi recommendation | arXiv: 2511.06405
- toporeformer mitigating adversarial attacks using topological purification in oc | arXiv: 2511.15807
- tosc task-oriented shape completion for open-world dexterous grasp generation fr | arXiv: 2601.05499
- touchformer a robust transformer-based framework for multimodal material percept | arXiv: 2511.19509
- Toward Gaze Target Detection in Young Autistic Children | arXiv: 2511.11244
- toward the frontiers of reliable diffusion sampling via adversarial sinkhorn att | arXiv: 2511.07499
- towards 3d object-centric feature learning for semantic scene completion | arXiv: 2511.13031
- towards a common framework for autoformalization | arXiv: 2509.09810
- towards a foundation model for partial differential equations across physics dom | arXiv: 2511.21861
- towards a rigorous understanding of the population dynamics of the nsga-iii tigh | arXiv: 2511.07125
- towards affordance-aware robotic dexterous grasping with human-like priors | arXiv: 2508.08896
- towards authentic movie dubbing with retrieve-augmented director-actor interacti | arXiv: 2511.14249
- towards better code understanding in decoder-only models with contrastive learni | arXiv: 2406.12326
- towards effective and efficient context-aware nucleus detection in histopatholog | arXiv: 2503.05678
- towards effective stealthy and persistent backdoor attacks targeting graph found | arXiv: 2511.17982
- towards human-ai accessibility mapping in india vlm-guided annotations and poi-c | arXiv: 2602.09216
- Towards Inference-Time Scaling for Continuous Space Reasoning | arXiv: 2510.12167
- towards llm-empowered knowledge tracing via llm-student hierarchical behavior al | arXiv: 2602.22879
- towards long-window anchoring in vision-language model distillation | arXiv: 2512.21576
- towards multiple missing values-resistant unsupervised graph anomaly detection | arXiv: 2511.09917
- towards non-stationary time series forecasting with temporal stabilization and f | arXiv: 2511.08229
- Towards Reinforcement Learning from Neural Feedback: Mapping fNIRS Signals to Agent Performance | arXiv: 2511.12844
- towards scalable web accessibility audit with mllms as copilots | arXiv: 2511.03471
- towards temporal fusion beyond the field of view for camera-based semantic scene | arXiv: 2511.12498
- towards test-time efficient visual place recognition via asymmetric query proces | arXiv: 2512.13055
- towards trustworthy multi-turn llm agents via behavioral guidance | arXiv: 2512.11421
- towermind a tower defence game learning environment and benchmark for llm as age | arXiv: 2601.05899
- trace a generalizable drift detector for streaming data-driven optimization | arXiv: 2512.07082
- trace textual relevance augmentation and contextual encoding for multimodal hate | arXiv: 2504.17902
- tracking and segmenting anything in any modality | arXiv: 2511.19475
- tractable weighted first-order model counting with bounded treewidth binary evid | arXiv: 2511.09174
- trade-offs in large reasoning models an empirical analysis of deliberative and a | arXiv: 2503.17979
- training-free policy violation detection via activation-space whitening in llms | arXiv: 2512.03994
- transferable backdoor attacks for code models via sharpness-aware adversarial pe | arXiv: 2602.11213
- transferable hypergraph attack via injecting nodes into pivotal hyperedges | arXiv: 2511.10698
- transmamba a sequence-level hybrid transformer-mamba language model | arXiv: 2503.24067
- transparent networks for multivariate time series | arXiv: 2410.10535
- travellama a multimodal travel assistant with large-scale dataset and structured | arXiv: 2504.16505
- tri-bench stress-testing vlm reliability on spatial reasoning under camera tilt | arXiv: 2512.08860
- trinitydna a bio-inspired foundational model for efficient long-sequence dna mod | arXiv: 2507.19229
- truth justice and secrecy cake cutting under privacy constraints | arXiv: 2511.09882
- truthfulrag resolving factual-level conflicts in retrieval-augmented generation | arXiv: 2511.10375
- tsbow traffic surveillance benchmark for occluded vehicles under various weather | arXiv: 2602.05414
- tsgdiff rethinking synthetic time series generation from a pure graph perspectiv | arXiv: 2511.12174
- tspo temporal sampling policy optimization for long-form video language understa | arXiv: 2508.04369
- TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models | arXiv: 2508.19257
- tubermc tube-conditioned reconstruction with mutual constraints for weakly-super | arXiv: 2511.10241
- uncertainty under the curve a sequence-level entropy area metric for reasoning l | arXiv: 2508.20384
- uncovering bias paths with llm-guided causal discovery an active learning and dy | arXiv: 2506.12227
- uncovering pretraining code in llms a syntax-aware attribution approach | arXiv: 2511.07033
- uncovering zero-shot generalization gaps in time-series foundation models using | arXiv: 2509.26347
- understanding dynamic scenes in ego centric 4d point clouds | arXiv: 2508.07251
- understanding syllogistic reasoning in llms from formal and natural language per | arXiv: 2512.12620
- uniabg unified adversarial view bridging and graph correspondence for unsupervis | arXiv: 2511.12054
- unic-lift unified 3d instance segmentation via contrastive learning | arXiv: 2512.24763
- unifit towards universal virtual try-on with mllm-guided semantic alignment | arXiv: 2511.15831
- unihr hierarchical representation learning for unified knowledge graph link pred | arXiv: 2411.07019
- Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation | arXiv: 2508.14031
- universal safety controllers with learned prophecies | arXiv: 2511.11390
- unleashing semantic and geometric priors for 3d scene completion | arXiv: 2508.13601
- unleashing the potential of large language models for text-to-image generation t | arXiv: 2503.07334
- unlocking efficient vehicle dynamics modeling via analytic world models | arXiv: 2502.10012
- unseen enhancing dataset pruning from a generalization perspective | arXiv: 2511.12988
- unsupervised feature selection through group discovery | arXiv: 2511.09166
- unsupervised motion-compensated decomposition for cardiac mri reconstruction via | arXiv: 2511.11436
- unsupervised multi-parameter inverse solving for reducing ring artifacts in 3d x | arXiv: 2412.05853
- URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | arXiv: 2511.10552
- urban incident prediction with graph neural networks integrating government rati | arXiv: 2506.08740
- urbannav learning language-guided urban navigation from web-scale human trajecto | arXiv: 2512.09607
- use a unified model for universal sound separation and extraction | arXiv: 2512.21215
- using certifying constraint solvers for generating step-wise explanations | arXiv: 2511.10428
- uvlm benchmarking video language model for underwater world understanding | arXiv: 2507.02373
- variance computation for weighted model counting with knowledge compilation appr | arXiv: 2601.03523
- vascular anatomy-aware self-supervised pre-training for x-ray angiogram analysis | arXiv: 2602.11536
- verb mirage unveiling and assessing verb concept hallucinations in multimodal la | arXiv: 2412.04939
- verification-guided context optimization for tool calling via hierarchical llms- | arXiv: 2512.13860
- vggt-dp generalizable robot control via vision foundation models | arXiv: 2509.18778
- vidia2std a parallel corpus and methods for low-resource vietnamese dialect-to-s | arXiv: 2603.10211
- VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness | arXiv: 2601.12672
- VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use | arXiv: 2410.16400
- vir-bench evaluating geospatial and temporal understanding of mllms via travel v | arXiv: 2509.19002
- virtual multiplex staining for histological images using a marker-wise condition | arXiv: 2508.14681
- vision transformers are circulant attention learners | arXiv: 2512.21542
- vision-language reasoning for geolocalization a reinforcement learning approach | arXiv: 2601.00388
- Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction (Oral) | arXiv: 2508.10936v2
- vista scene-aware optimization for streaming video question answering under post | arXiv: 2602.08448
- vitaldiagnosis ai-driven ecosystem for 24/7 vital monitoring and chronic disease | arXiv: 2601.15798
- vk-det visual knowledge guided prototype learning for open-vocabulary aerial obj | arXiv: 2511.18075
- vmfcoop towards equilibrium on a unified hyperspherical manifold for prompting b | arXiv: 2511.09540
- voicecloak a multi-dimensional defense framework against unauthorized diffusion- | arXiv: 2505.12332
- voices faces and feelings multi-modal emotion-cognition captioning for mental he | arXiv: 2603.01816
- VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models | arXiv: 2511.11438
- vpho joint visual-physical cue learning and aggregation for hand-object pose est | arXiv: 2511.12030
- vpn visual prompt navigation | arXiv: 2508.01766
- vspo validating semantic pitfalls in ontology via llm-based cq generation | arXiv: 2511.07991
- vtinker guided flow upsampling and texture mapping for high-resolution video fra | arXiv: 2511.16124
- w2s-aligntree weak-to-strong inference-time alignment for large language models | arXiv: 2511.11518
- walking further semantic-aware multimodal gait recognition under long-range cond | arXiv: 2603.14189
- watermod modular token-rank partitioning for probability-balanced llm watermarki | arXiv: 2511.07863
- Wavelet Enhanced Adaptive Frequency Filter for Sequential Recommendation | arXiv: 2511.07028
- wdt-md wavelet diffusion transformers for microaneurysm detection in fundus imag | arXiv: 2511.08987
- well begun half done reinforcement learning with prefix optimization for llm rea | arXiv: 2512.15274
- when eyes and ears disagree can mllms discern audio-visual confusion | arXiv: 2511.10059
- when hallucination costs millions benchmarking ai agents in high-stakes adversar | arXiv: 2510.00332
- when human preferences flip an instance-dependent robust loss for rlhf | arXiv: 2512.00709
- when person re-identification meets event camera a benchmark dataset and an attr | arXiv: 2507.13659
- when refusals fail unstable safety mechanisms in long-context llm agents | arXiv: 2512.02445
- when small models are right for wrong reasons process verification for trustwort | arXiv: 2601.00513
- when top-ranked recommendations fail modeling multi-granular negative feedback f | arXiv: 2511.18700
- when trackers date fish a benchmark and framework for underwater multiple fish t | arXiv: 2507.06400
- where and what matters sensitivity-aware task vectors for many-shot multimodal i | arXiv: 2511.08246
- where norms and references collide evaluating llms on normative reasoning | arXiv: 2602.02975
- where to start alignment diffusion large language model may demand a distinct po | arXiv: 2508.12398
- whispering agents an event-driven covert communication protocol for the internet | arXiv: 2508.02188
- why do open-source llms struggle with data analysis a systematic empirical study | arXiv: 2506.19794
- why isnt relational learning taking over the world | arXiv: 2507.13558
- with great capabilities come great responsibilities introducing the agentic risk | arXiv: 2512.22211
- worldrft latent world model planning with reinforcement fine-tuning for autonomo | arXiv: 2512.19133
- x-mutest a multilingual benchmark for explainable hate speech detection and a no | arXiv: 2601.03194
- x2edit revisiting arbitrary-instruction image editing through self-constructed d | arXiv: 2508.07607
- xlinear a lightweight and accurate mlp-based model for long-term time series for | arXiv: 2601.09237
- yes florence i will do better next time agentic feedback reasoning for humorous | arXiv: 2601.07232
- yolo-iod towards real time incremental object detection | arXiv: 2512.22973
- your ai-generated image detector can secretly achieve sota accuracy if calibrate | arXiv: 2602.01973
- yours or mine overwriting attacks against neural audio watermarking | arXiv: 2509.05835
- zero-reference joint low-light enhancement and deblurring via visual autoregress | arXiv: 2511.18591
- cheating stereo matching in full-scale physical adversarial attack against binoc | arXiv: 2511.14386
- monoclue object-aware clustering enhances monocular 3d object detection | arXiv: 2511.07862
- an information theoretic evaluation metric for strong unlearning | arXiv: 2405.17878
- comptrack information bottleneck-guided low-rank dynamic token compres | arXiv: 2511.15580
- priordrive enhancing online hd mapping with unified vector p | arXiv: 2409.05352
- reflexdiffusion reflection-enhanced trajectory planning for | arXiv: 2601.09377
- diffbench meets diffagent end-to-end llm-driven diffusion ac | arXiv: 2601.03178
- equacode a multi-strategy jailbreak approach for large language models via equat | arXiv: 2512.23173
- extracting events like code a multi-agent programming framework for zero-shot ev | arXiv: 2511.13118
- mose hierarchical self-distillation enhances early layer embeddings | arXiv: 2503.03008
- recode updating code api knowledge with reinforcement learning | arXiv: 2506.20495
- span benchmarking and improving cross-calendar temporal reasoning of large langu | arXiv: 2511.09993
- tapas are free training-free adaptation of programmatic agen | arXiv: 2508.11425
- towards better code understanding in decoder-only large language models via hie | arXiv: 2406.12326
- towards better code understanding in decoder-only models with contrastive learni | arXiv: 2406.12326
- emergent persuasion will llms persuade without being prompted | arXiv: 2512.22201
- mctsr-zero self-reflective psychological counseling dialogues generation via pri | arXiv: 2505.23229
- teaching large language models to maintain contextual faithfulness via synthetic | arXiv: 2505.16483
- as eastern powers i will veto an investigation of nation-level bias of large lan | arXiv: 2511.10695
- beyond perplexity let the reader select retrieval summaries via spectrum project | arXiv: 2508.05909
- cog-rag cognitive-inspired dual-hypergraph with theme alignment retrieval-augmen | arXiv: 2511.13201
- comlq benchmarking complex logical queries in information retrieval | arXiv: 2511.12004
- comorag a cognitive-inspired memory-organized rag for stateful long narrative re | arXiv: 2508.10419
- convmix a mixed-criteria data augmentation framework for conversational dense re | arXiv: 2508.04001
- do retrieval augmented language models know when they dont know | arXiv: 2509.01476
- does less hallucination mean less creativity an empirical investigation in llms | arXiv: 2512.11509
- exposing the cracks vulnerabilities of retrieval-augmented llm-based machine tra | arXiv: 2510.00829
- himo-clip modeling semantic hierarchy and monotonicity in vi | arXiv: 2511.06653
- knowledge completes the vision a multimodal entity-aware retrieval-augmented gen | arXiv: 2511.21002
- llms for game theory entropy-guided in-context learning and adaptive cot reasoni | arXiv: 2601.10775
- magnitude matters a superior class of similarity metrics for holistic semantic u | arXiv: 2509.19323
- mavis a benchmark for multimodal source attribution in long-form visual question | arXiv: 2511.12142
- mem-pal towards memory-based personalized dialogue assistants for long-term user | arXiv: 2511.13410
- multimodal deepresearcher generating text-chart interleaved | arXiv: 2506.02454
- n2n-gqa noise-to-narrative for graph-based table-text question answering using l | arXiv: 2601.06603
- neighbor-aware instance refining with noisy labels for cross-modal retrieval | arXiv: 2512.24064
- oad-promoter enhancing zero-shot vqa using large language models with object att | arXiv: 2511.12131
- positional bias in multimodal embedding models do they favor the beginning the m | arXiv: 2511.11216
- precise reducing the bias of llm evaluations using prediction-powered ranking es | arXiv: 2601.18777
- prime planning and retrieval-integrated memory for enhanced reasoning | arXiv: 2509.22315
- reap enhancing rag with recursive evaluation and adaptive planning for multi-hop | arXiv: 2511.09966
- refeed retrieval feedback-guided dataset construction for style-aware query rewr | arXiv: 2603.01417
- rrra resampling and reranking through a retriever adapter | arXiv: 2508.11670
- sr-ki scalable and real-time knowledge integration into llms via supervised atte | arXiv: 2511.06446
- towards inference-time scaling for continuous space reasoning | arXiv: 2510.12167
- when small models are right for wrong reasons process verification for trustwort | arXiv: 2601.00513
- a closer look at knowledge distillation in spiking neural ne | arXiv: 2511.06902
- a coherence-based measure of agi | arXiv: 2510.20784
- adaptive evidential learning for temporal-semantic robustnes | arXiv: 2512.00953v1
- attention gathers mlps compose a causal analysis of an action-outcome circuit in | arXiv: 2603.11142
- beyond hallucinations a composite score for measuring reliability in open-source | arXiv: 2512.24058
- concepts from representations post-hoc concept bottleneck models via sparse deco | arXiv: 2601.12303
- crosscheck-bench diagnosing compositional failures in multim | arXiv: 2511.21717
- data whitening improves sparse autoencoder learning | arXiv: 2511.13981
- distribution-based feature attribution for explaining the predictions of any cla | arXiv: 2511.09332
- drexperts differential refinement of distortion-aware experts for blind image qu | arXiv: 2602.09531
- elementarynet a non-strategic neural network for predicting human behavior in no | arXiv: 2503.05925
- enhancing binary encoded crime linkage analysis using siamese network | arXiv: 2511.07651
- explainable melanoma diagnosis with contrastive learning and llm-based report ge | arXiv: 2512.06105
- finding the translation switch discovering and exploiting the task-initiation fe | arXiv: 2601.11019
- finevau a novel human-aligned benchmark for fine-grained video anomaly understan | arXiv: 2601.17258
- flashkat understanding and addressing performance bottlenecks in the kolmogorov- | arXiv: 2505.13813
- flexible concept bottleneck model | arXiv: 2511.06678
- fourierpet deep fourier-based unrolled network for low-count pet reconstruction | arXiv: 2601.11680
- gatera token-aware modulation for parameter-efficient fine-tuning | arXiv: 2511.17582
- genepheno interpretable gene knockout-induced phenotype abnormality prediction f | arXiv: 2511.09512
- hskbenchmark modeling and benchmarking chinese second language acquisition in la | arXiv: 2511.15574
- hypothesis generation via llm-automated language bias for ilp | arXiv: 2505.21486
- imad intelligent multi-agent debate for efficient and accura | arXiv: 2511.11306
- induce align predict zero-shot stance detection via cognitive inductive reasonin | arXiv: 2506.13470
- llm circuit analyses consistent across training and scale | arXiv: 2407.10827
- probing preference representations a multi-dimensional evaluation and analysis m | arXiv: 2511.12464
- quiet feature learning in algorithmic tasks | arXiv: 2505.03997
- scope intrinsic semantic space control for mitigating copyright infringement in | arXiv: 2511.07001
- shapbpt image feature attributions using data-aware binary partition trees | arXiv: 2602.07047
- som directions are better than one multi-directional refusal suppression in lang | arXiv: 2511.08379
- spark query-aware unstructured sparsity with recoverable kv cache channel prunin | arXiv: 2508.15212
- toc tree-of-claims search with multi-agent language models | arXiv: 2511.16972
- universal safety controllers with learned prophecies | arXiv: 2511.11390
- unsupervised feature selection through group discovery | arXiv: 2511.09166
- using certifying constraint solvers for generating step-wise explanations | arXiv: 2511.10428
- voices faces and feelings multi-modal emotion-cognition captioning for mental he | arXiv: 2603.01816
- catastrophic forgetting in kolmogorov-arnold networks | arXiv: 2511.12828
- hybrid-dmkg a hybrid reasoning framework over dynamic multimodal knowledge graph | arXiv: 2512.00881
- is the information bottleneck robust enough towards label-noise resistant inform | arXiv: 2512.10573
- model editing as a double-edged sword steering agent ethical behavior toward ben | arXiv: 2506.20606
- multiplicative orthogonal sequential editing for language models | arXiv: 2601.07873
- a multi-agent conversational bandit approach to online evaluation and selection | arXiv: 2501.01849
- a multi-agent llm framework for multi-domain low-resource in-context ner via kno | arXiv: 2511.19083
- autoglm autonomous foundation agents for guis | arXiv: 2411.00820
- autotool efficient tool selection for large language model agents | arXiv: 2511.14650
- gram-r2 self-training generative foundation reward models for reward reasoning | arXiv: 2509.02492
- connectivity-guided sparsification of 2-fwl gnns preserving full expressivity wi | arXiv: 2511.12838
- axis-aligned document dewarping | arXiv: 2507.15000
- bcwildfire a long-term multi-factor dataset and deep learning benchmark for bore | arXiv: 2511.17597
- benchmarking llms for political science a united nations perspective | arXiv: 2502.14122
- beyond accuracy a cognitive load framework for mapping the c | arXiv: 2601.20412
- beyond cosine similarity magnitude-aware clip for no-reference image quality ass | arXiv: 2511.09948
- coninstruct evaluating large language models on conflict detection and resolutio | arXiv: 2511.14342
- dcmatch unsupervised multi-shape matching with dual-level consistency | arXiv: 2509.01204
- dicap distribution-calibrated pseudo-labeling for semi-supervised multi-label le | arXiv: 2511.20225
- gdba revisited unleashing the power of guided local search for distributed const | arXiv: 2508.06899
- gene incremental learning for single-cell transcriptomics | arXiv: 2511.13762
- goal geometrically optimal alignment for continual generalized category discover | arXiv: 2602.19872
- granalign granularity-aware alignment framework for zero-shot video moment retri | arXiv: 2601.00584
- graph out-of-distribution detection via test-time calibration with dual dynamic | arXiv: 2511.13541
- hybridla hybrid generation for document layout analysis | arXiv: 2511.19919
- improved runtime guarantees for the spea2 multi-objective optimizer | arXiv: 2511.07150
- llm-as-a-judge for scalable test coverage evaluation accuracy operational reliab | arXiv: 2512.01232
- lost in benchmarks rethinking large language model benchmarking with item respon | arXiv: 2505.15055
- low-rank curvature for zeroth-order optimization in llm fine-tuning | arXiv: 2511.07971
- maps multi-agent personality shaping for collaborative reaso | arXiv: 2503.16905
- mcts-sql light-weight llms can master the text-to-sql through monte carlo tree s | arXiv: 2501.16607
- mindvote when ai meets the wild west of social media opinion | arXiv: 2505.14422
- moetta test-time adaptation under mixed distribution shifts with moe-layernorm | arXiv: 2511.13760
- nestr a neuro-symbolic abductive framework for temporal reasoning in large langu | arXiv: 2512.07218
- optscale probabilistic optimality for inference-time scaling | arXiv: 2506.22376
- perspective from a broader context can room style knowledge help visual floorpla | arXiv: 2508.01216
- refinevad semantic-guided feature recalibration for weakly supervised video anom | arXiv: 2511.13204
- regular games -- an automata-based general game playing language | arXiv: 2511.10593
- sampling control for imbalanced calibration in semi-supervised learning | arXiv: 2511.18773
- scalable vision-guided crop yield estimation | arXiv: 2511.12999
- spikcommander a high-performance spiking transformer with multi-view learning fo | arXiv: 2511.07883
- streaming generated gaussian process experts for online learning and control ext | arXiv: 2508.03679
- structured language generation model loss calibration and formatted decoding for | arXiv: 2402.08971
- where norms and references collide evaluating llms on normative reasoning | arXiv: 2602.02975
- an invariant latent space perspective on language model inve | arXiv: 2511.19569v1
- from classification to ranking enhancing llm reasoning capabilities for mbti per | arXiv: 2601.18582
- persistent instability in llms personality measurements effects of scale reasoni | arXiv: 2508.04826
- promptmoe generalizable zero-shot anomaly detection via visually-guided prompt m | arXiv: 2511.18116
- smart a surrogate model for predicting application runtime in dragonfly systems | arXiv: 2511.11111
- soft filtering guiding zero-shot composed image retrieval with prescriptive and | arXiv: 2512.20781
- stem efficient relative capability evaluation of llms through structured transit | arXiv: 2508.12096
- temple incentivizing temporal understanding of video large language models via p | arXiv: 2503.16929
- elspr evaluator llm training data self-purification on non-transitive preference | arXiv: 2505.17691
- learning time in static classifiers | arXiv: 2511.12321
- no-regret strategy solving in imperfect-information games via pre-trained embedd | arXiv: 2511.12083
- scaling and transferability of annealing strategies in large language model trai | arXiv: 2512.13705
- uncovering pretraining code in llms a syntax-aware attribution approach | arXiv: 2511.07033
- catformer when continual learning meets spiking transformers with dynamic thresh | arXiv: 2603.15184
- designing truthful mechanisms for asymptotic fair division | arXiv: 2512.10892
- hallucination stations on some basic limitations of transformer-based language m | arXiv: 2507.07505
- llm targeted underperformance disproportionately impacts vulnerable users | arXiv: 2406.17737
- panda -- patch and distribution-aware augmentation for long-tailed exemplar-free | arXiv: 2511.09791
- a principle-driven adaptive policy for group cognitive stimu | arXiv: 2603.10034
- dp-geng differentially private dataset distillation guided by dp-generated data | arXiv: 2511.09876
- earth-adapter bridge the geospatial domain gaps with mixture of frequency adapta | arXiv: 2504.06220
- infocom kilobyte-scale communication-efficient collaborative perception with inf | arXiv: 2512.10305
- bridging the multilingual safety divide efficient culturally-aware alignment for | arXiv: 2602.13867
- consensus-aligned neuron efficient fine-tuning large language models for multi-d | arXiv: 2602.05694
- focusing on language revealing and exploiting language attention heads in multil | arXiv: 2511.07498
- gloctm cross-lingual topic modeling via a global context space | arXiv: 2601.11872
- how does alignment enhance llms multilingual capabilities a language neurons per | arXiv: 2505.21505
- mitigating content effects on reasoning in language models through fine-grained | arXiv: 2505.12189
- nadir differential attention flow for non-autoregressive transliteration in indi | arXiv: 2601.12389
- stellar scene text editor for low-resource languages and real-world data | arXiv: 2511.09977
- vidia2std a parallel corpus and methods for low-resource vietnamese dialect-to-s | arXiv: 2603.10211
- x-mutest a multilingual benchmark for explainable hate speech detection and a no | arXiv: 2601.03194
- patientvlm meets docvlm pre-consultation dialogue between vision language models | arXiv: 2601.10945
- c3tg conflict-aware composite and collaborative controlled text generation | arXiv: 2511.09292
- convex clustering redefined robust learning with higher order norms and beyond | arXiv: 2511.14784
- a fast heuristic search approach for energy-optimal profile | arXiv: 2512.01331
- a graph-theoretical perspective on law design for multiagent | arXiv: 2511.06361
- a graph-theoretical perspective on law design for multiagent systems | arXiv: 2511.06361
- a phase transition for opinion dynamics with competing biase | arXiv: 2511.09434
- a topological rewriting of tarskis mereogeometry | arXiv: 2511.12727
- a learning framework for cooperative collision avoidance of uav swarms leveragin | arXiv: 2507.10913
- fedgrpo privately optimizing foundation models with group-relative rewards from | arXiv: 2602.12014
- from pretrain to pain adversarial vulnerability of video foundation models witho | arXiv: 2511.07049
- argumentative debates for transparent bias detection technic | arXiv: 2508.04511
- beyond detection exploring evidence-based multi-agent debate for misinformation | arXiv: 2511.07267
- cross-modal prompting for balanced incomplete multi-modal emotion recognition | arXiv: 2512.11239
- fact2fiction targeted poisoning attack to agentic fact-check | arXiv: 2508.06059
- factguard event-centric and commonsense-guided fake news detection | arXiv: 2511.10281
- factorut controlling untrusted ai by monitoring their plans | arXiv: 2512.14745
- multi-modal dynamic proxy learning for personalized multiple clustering | arXiv: 2511.07274
- reasoning about the unsaid misinformation detection with omission-aware graph in | arXiv: 2512.01728
- scenejaileval a scenario-adaptive multi-dimensional framework for jailbreak eval | arXiv: 2508.06194
- t2agent a tool-augmented multimodal misinformation detection agent with monte ca | arXiv: 2505.19768
- a unified shape-aware foundation model for time series class | arXiv: 2601.06429v1
- 3d4d an interactive editable 4d world model via 3d video generation | arXiv: 2511.08536
- dreamrunner fine-grained compositional story-to-video genera | arXiv: 2411.16657
- filmweaver weaving consistent multi-shot videos with cache-guided autoregressive | arXiv: 2512.11274
- genvidbench a 6-million benchmark for ai-generated video detection | arXiv: 2501.11340
- mask2iv interaction-centric video generation via mask trajectories | arXiv: 2510.03135
- mofu scale-aware modulation and fourier fusion for multi-subject video generatio | arXiv: 2512.22310
- motioncharacter fine-grained motion controllable human video generation | arXiv: 2411.18281
- omnivdiff omni controllable video diffusion for generation and understanding | arXiv: 2504.10825
- phased one-step adversarial equilibrium for video diffusion models | arXiv: 2508.21019
- seeing the unseen zooming in the dark with event cameras | arXiv: 2601.02206
- spherediff tuning-free 360 static and dynamic panorama generation via spherical | arXiv: 2504.14396