ICML 2026 Paper Notes TODO
Total: 410 papers | Completed: 410 | Pending update: 0
- \(f\)-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses | arXiv: 2605.06977
- (Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models | arXiv: 2604.16429
- A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots | arXiv: 2605.08550
- A Formal Comparison Between Chain of Thought and Latent Thought | arXiv: 2509.25239
- A Geometric Relation of the Error Introduced by Sampling a Language Model's Output Distribution to its Internal State | arXiv: 2605.04899
- A Minimal Agent for Automated Theorem Proving | arXiv: 2602.24273
- A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints | arXiv: 2605.04595
- ACTG-ARL: Differentially private conditional text generation with RL-enhanced control | arXiv: 2510.18232
- Adaptive Querying with AI Persona Priors | arXiv: 2605.00696
- Adversarial Flow Models | arXiv: 2511.22475
- Agent-Omit: Adaptive Context Omission for Efficient LLM Agents | arXiv: 2602.04284
- AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction | arXiv: 2602.05353
- AI Cap-and-Trade: Efficiency Incentives for Accessibility and Sustainability | arXiv: 2601.19886
- Alethia: A Foundational Encoder for Voice Deepfakes | arXiv: 2605.00251
- All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs | arXiv: 2605.12671
- Anchor-guided Hypergraph Condensation with Dual-level Discrimination | arXiv: 2605.10001
- ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models | arXiv: 2605.10328
- Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning | arXiv: 2605.14587
- Annotations Mitigate Post-Training Mode Collapse | arXiv: 2605.09995
- Anomaly-Preference Image Generation (APO) | arXiv: 2605.02439
- ANTIC: Adaptive Neural Temporal In-situ Compressor | arXiv: 2604.09543
- Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds | arXiv: 2602.00807
- ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin | arXiv: 2605.13517
- AReaL-SEA: Self-evolving synthetic data + verifiable-reward RL to post-train multi-turn tool-calling agents to SOTA | arXiv: 2601.22607
- Beyond Distribution Estimation: Simplex Anchored Structural Inference Towards Universal Semi-Supervised Learning | arXiv: 2605.07557
- BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics | arXiv: 2601.21800
- Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence | arXiv: 2510.07500
- BLOCK-EM: Preventing Emergent Misalignment via Latent Blocking | arXiv: 2602.00767
- BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models | arXiv: 2605.09134
- Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning | arXiv: 2605.02263
- Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners | arXiv: 2605.14709
- Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression | arXiv: 2510.02345
- Budget-Feasible Mechanisms for Submodular Welfare Maximization in Procurement Auctions | arXiv: 2605.00411
- Calibrated Multimodal Representation Learning with Missing Modalities | arXiv: 2511.12034
- CAMEL: Confidence-Gated Reflection for Reward Modeling | arXiv: 2602.20670
- Can Recommender Systems Teach Themselves? A Recursive Self-Improving Framework with Fidelity Control | arXiv: 2602.15659
- Caracal: Causal Architecture via Spectral Mixing | arXiv: 2605.00292
- CARD: Coarse-to-fine Autoregressive Modeling with Radix-based Decomposition for Transferable Free Energy Estimation | arXiv: 2605.02657
- Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features | arXiv: 2601.22816
- Causal Fine-Tuning under Latent Confounded Shift | arXiv: 2410.14375
- Certified Robustness under Heterogeneous Perturbations via Hybrid Randomized Smoothing | arXiv: 2605.12876
- Circuit Fingerprints: How Answer Tokens Encode Their Geometrical Path | arXiv: 2602.09784
- CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal | arXiv: 2603.21901
- CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning | arXiv: 2602.14068
- CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers | arXiv: 2605.07905
- Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner | arXiv: 2510.03206
- CoFrGeNet: Continued Fraction Architectures for Language Generation | arXiv: 2601.21766
- CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models | arXiv: 2605.01231
- Complexity as Advantage: A Regret-Based Perspective on Emergent Structure | arXiv: 2511.04590
- Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision | arXiv: 2509.14234
- Conditional Diffusion Sampling | arXiv: 2605.04013
- Conformal Thinking: Risk Control for Reasoning on a Compute Budget | arXiv: 2602.03814
- Consistent Diffusion Language Models | arXiv: 2605.00161
- Controllable Generative Sandbox for Causal Inference | arXiv: 2603.03587
- CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features | arXiv: 2508.12535
- CPMöbius: Iterative Coach–Player Reasoning for Data-Free Reinforcement Learning | arXiv: 2602.02979
- CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing | arXiv: 2602.15823
- DAG: A Dual Correlation Network for Time Series Forecasting with Exogenous Variables | arXiv: 2509.14933
- Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise | arXiv: 2408.09929
- Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning | arXiv: 2605.12906
- DCER: Robust Multimodal Fusion via Dual-Stage Compression and Energy-Based Reconstruction | arXiv: 2602.04904
- Decision Tree Learning on Product Spaces | arXiv: 2605.12983
- Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation | arXiv: 2605.01448
- Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning | arXiv: 2605.05676
- DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving | arXiv: 2605.10564
- Demystifying When Pruning Works via Representation Hierarchies | arXiv: 2603.24652
- DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection | arXiv: 2511.13108
- Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers | arXiv: 2605.14270
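- DiscoverLLM: From executing user intents to helping users discover them | arXiv: 2602.03429
- DoLQ: Discovering ordinary differential equations with LLM-based qualitative + quantitative evaluation | arXiv: 2605.07323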
- Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation | arXiv: 2602.20816
- Doubly Outlier-Robust Online Infinite Hidden Markov Model | arXiv: 2604.14322
- DP-KFC: Data-Free Preconditioning for Privacy-Preserving Deep Learning | arXiv: 2605.13418
- DR.Q: Debiased Model-based Representations for Sample-efficient Continuous Control | arXiv: 2605.11711
- Drift is a Sampling Error: SNR-Aware Power Distributions for Long-Horizon Robotic Planning | arXiv: 2605.09537
- Dual-branch Robust Unlearnable Examples | arXiv: 2605.01718
- DynaDiff: Generative Adaptation of Dynamics to Environmental Shifts via Weight-space Diffusion | arXiv: 2505.13919
- E-mem: Multi-Agent Based Episodic Context Reconstruction for LLM Agent Memory | arXiv: 2601.21714
- EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding | arXiv: 2605.14742
- EDEN: Adaptive decoding that dynamically adjusts the branching factor using entropy | arXiv: 2605.09745
- Edit-Based Refinement for Parallel Masked Diffusion Language Models | arXiv: 2605.09603
- EEG-Based Multimodal Learning via Hyperbolic Mixture-of-Curvature Experts | arXiv: 2604.12579
- Efficient Reasoning with Hidden Thinking | arXiv: 2501.19201
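- Embodied Interpretability: Diagnosing spurious-correlation dependence in VLA policies via causal intervention | arXiv: 2605.00321
- Encoding Mismatch: Why feature distillation from a wide ViT to a narrow student always fails | arXiv: 2511.15572
- EngiAgent: Steering LLM multi-agent systems to engineering-feasible solutions with a fully connected coordinator | arXiv: 2605.02289
- EOSTok: End-to-end joint training of a 1D semantic tokenizer and autoregressive image generation | arXiv: 2605.00503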
- Escaping Mode Collapse in LLM Generation via Geometric Regulation | arXiv: 2605.00435
- Estimating Correlation Clustering Cost in Node-Arrival Stream | arXiv: 2605.07091
- ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment | arXiv: 2601.21484
- Evidential Reasoning Advances Interpretable Real-World Disease Screening | arXiv: 2605.15171
- EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle | arXiv: 2510.16079
- ExCyTIn-Bench: Evaluating LLM Agents on Cyber Threat Investigation | arXiv: 2507.14201
- Exploring and Exploiting Stability in Latent Flow Matching | arXiv: 2605.08398
- Exploring Data-Free LoRA Transferability for Video Diffusion Models | arXiv: 2605.01929
- Express Your Doubts: Probabilistic World Modeling Should Not Be Based on Token logprobs | arXiv: 2505.02072
- FAB: A First-Order AB-based Gradient Algorithm for Distributed Bilevel Optimization over Time-Varying Directed Graphs | arXiv: 2605.06328
- Factored Classifier-Free Guidance | arXiv: 2506.14399
- Fair Dataset Distillation via Cross-Group Barycenter Alignment | arXiv: 2605.00185
- Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators | arXiv: 2605.03969
- Federated Distillation for Whole Slide Image via Gaussian-Mixture Feature Alignment and Curriculum Integration | arXiv: 2605.00578
- FedHPro: Federated Hyper-Prototype Learning via Gradient Matching | arXiv: 2605.13475
- FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA | arXiv: 2602.23638
- Fern: Ellipsoidal time series forecasting, directly parameterizing the Jacobian spectral structure via Brenier's theorem | arXiv: 2505.17370
- Find, Fix, Reason: Context Repair for Video Reasoning | arXiv: 2604.16243
- FlattenGPT: Depth Compression for Transformer with Layer Flattening | arXiv: 2602.08858
- Flow Sampling: Learning to Sample from Unnormalized Densities via Denoising Conditional Processes | arXiv: 2605.03984
- Focus and Dilution: The Multi-stage Learning Process of Attention | arXiv: 2605.01199
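- FRACTAL: Rewriting long-sequence memory in state space models with fractional-order HiPPO | arXiv: 2605.08833
- FreeRet: Turning MLLMs into multimodal retrievers without training | arXiv: 2509.24621
- FRISM: Fine-grained injection of reasoning ability into VLMs via SVD subspace-level merging | arXiv: 2601.21187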
- From Backward Spreading to Forward Replay: Revisiting Target Construction in LLM Parameter Editing | arXiv: 2605.00358
- From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity | arXiv: 2605.00939
- From Generalist to Specialist Representation | arXiv: 2605.12733
- From Holo Pockets to Electron Density: GPT-style Drug Design with Density | arXiv: 2605.08767
- From Human-Level AI Tales to AI Leveling Human Scales | arXiv: 2602.18911
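- FS-I2P: A hierarchical, depth-adaptive Focus-Sweep image-to-point-cloud registration network inspired by human observation | arXiv: 2605.07607
- Full-Spectrum GNN: Lifting spectral filtering from nodes to node pairs, going beyond 1-WL with bivariate spectral kernels | arXiv: 2605.05759
- Game of Thought: Computing game-theoretic Nash equilibria to make LLMs robust to the worst case in the 20 Questions game | arXiv: 2602.01708
- GEM-FI: Energy-gated, Fisher-modulated evidential mixtures that express multimodal epistemic uncertainty in a single forward pass | arXiv: 2605.03750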
- GenExam: A Multidisciplinary Text-to-Image Exam | arXiv: 2509.14232
- GRACE: Unifying QAT and distillation under the Information Bottleneck framework so that INT4 VLMs surpass BF16 | arXiv: 2601.22709
- Grokking: From Abstraction to Intelligence | arXiv: 2603.29262
- Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration | arXiv: 2605.00370
- GRPO is Secretly a Process Reward Model | arXiv: 2509.21154
- Hallucinations Undermine Trust; Metacognition is a Way Forward | arXiv: 2605.01428
- Harnessing Reasoning Trajectories for Hallucination Detection via Answer-agreement Representation Shaping | arXiv: 2601.17467
- HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks | arXiv: 2605.04525
- HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench | arXiv: 2601.20255
- HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning | arXiv: 2605.05776
- HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation | arXiv: 2605.02278
- Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal | arXiv: 2605.09502
- Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation | arXiv: 2605.00529
- Hierarchical Image Tokenization for Multi-Scale Image Super Resolution | arXiv: 2605.14891
- Holmes: Revisiting uncertainty in partially relevant video retrieval with hierarchical evidential learning | arXiv: 2605.06083
- How 'Neural' is a Neural Foundation Model? | arXiv: 2601.21508
- How Reasoning Evolves from Post-Training Data: An Empirical Study Using Chess | arXiv: 2604.05134
- Image Restoration via Diffusion Models with Dynamic Resolution | arXiv: 2605.14267
- Implicit Preference Alignment for Human Image Animation | arXiv: 2605.07545
- Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models | arXiv: 2605.13338
- InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition | arXiv: 2605.02364
- Information-Geometric Adaptive Sampling for Graph Diffusion | arXiv: 2605.00250
- Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression | arXiv: 2605.01402
- Instruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language Models | arXiv: 2605.12258
- Internalizing Agency from Reflective Experience | arXiv: 2603.16843
- Internalizing Safety Understanding in Large Reasoning Models via Verification | arXiv: 2605.08930
- Interpretability Can Be Actionable | arXiv: 2605.11161
- Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction | arXiv: 2508.19035
- Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models | arXiv: 2605.06510
- iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework | arXiv: 2605.03941
- Jailbreaking Vision-Language Models Through the Visual Modality | arXiv: 2605.00583
- KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Controls | arXiv: 2510.19316
- Krause Synchronization Transformers | arXiv: 2602.11534
- LabBuilder: Protocol-Grounded 3D Layout Generation for Interactable and Safe Laboratory | arXiv: 2605.02288
- LAPRAS: Learning-Augmented PRivate Answering for Linear Query Streams | arXiv: 2605.01960
- Large Vision-Language Models Get Lost in Attention | arXiv: 2605.05668
- Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models | arXiv: 2602.01166
- LatentTSF: Moving time series forecasting from "observation-space regression" to "latent-state-space prediction" | arXiv: 2602.00297
- Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language Models | arXiv: 2602.12498
- Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training | arXiv: 2605.11931
- Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective | arXiv: 2605.03373
- Learning Graph Foundation Models on Riemannian Graph-of-Graphs | arXiv: 2605.09993
- Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach | arXiv: 2605.09964
- Learning to Approximate Uniform Facility Location via Graph Neural Networks | arXiv: 2602.13155
- Learning-Augmented Scalable Linear Assignment Problem Optimization via Neural Dual Warm-Starts | arXiv: 2605.09382
- Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models | arXiv: 2510.08592
- Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy | arXiv: 2509.21173
- Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | arXiv: 2605.01293
- LightAVSeg: Lightweight Audio-Visual Segmentation | arXiv: 2605.08805
- Lightning Unified Video Editing via In-Context Sparse Attention | arXiv: 2605.04569
- Limits of Convergence-Rate Control for Open-Weight Safety | arXiv: 2602.18868
- LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations | arXiv: 2605.00434
- Linearizing Vision Transformer with Test-Time Training | arXiv: 2605.02772
- Local and Mixing-Based Algorithms for Gaussian Graphical Model Selection from Glauber Dynamics | arXiv: 2412.18594
- Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation | arXiv: 2605.01221
- LoDA: LoRA continual learning based on task-driven subspace decomposition | arXiv: 2603.00191
- Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism | arXiv: 2512.04341
- Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution | arXiv: 2605.02167
- Many-Shot CoT-ICL: Making In-Context Learning Truly Learn | arXiv: 2605.13511
- Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning | arXiv: 2605.09771
- MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems | arXiv: 2605.06623
- Matroid Algorithms Under Size-Sensitive Independence Oracles | arXiv: 2605.00201
- MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks | arXiv: 2507.23511
- MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio | arXiv: 2605.00969
- MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training | arXiv: 2602.02494
- Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping | arXiv: 2605.04308
- Mesh Field Theory: Port–Hamiltonian Formulation of Mesh-Based Physics | arXiv: 2605.00394
- Meta-learning Structure-Preserving Dynamics | arXiv: 2508.11205
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification | arXiv: 2605.14289
- Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization | arXiv: 2605.10067
- Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering | arXiv: 2602.11183
- Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment | arXiv: 2605.04363
- MiVE: Multiscale Vision-language features for reference-guided video Editing | arXiv: 2605.14664
- Mixture Prototype Flow Matching for Open-Set Supervised Anomaly Detection | arXiv: 2605.02438
- ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World | arXiv: 2605.15081
- MoCA: Rewarding perception with a "blindfolded reasoning proxy" to disentangle whether a VLM "saw wrong or reasoned wrong" | arXiv: 2605.14054
- Model Merging Scaling Laws in Large Language Models | arXiv: 2509.24244
- Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models | arXiv: 2602.04509
- MoLA: A mixture of latent actions that translates "imagined future videos" into executable actions | arXiv: 2605.12167
- MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier | arXiv: 2603.03756
- MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models | arXiv: 2604.12928
- MotionGRPO: Overcoming Low Intra-Group Diversity in GRPO-Based Egocentric Motion Recovery | arXiv: 2605.05680
- Multi-Agent Decision-Focused Learning via Value-Aware Sequential Communication | arXiv: 2604.08944
- MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety | arXiv: 2605.01687
- Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives | arXiv: 2511.18507
- Multimodal Fact-Level Attribution for Verifiable Reasoning | arXiv: 2602.11509
- MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality | arXiv: 2605.05646
- NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating | arXiv: 2605.13651
- Networked Information Aggregation for Binary Classification | arXiv: 2605.01082
- Neural QAOA\(^2\): Differentiable Joint Graph Partitioning and Parameter Initialization for Quantum Combinatorial Optimization | arXiv: 2605.13072
- New Bounds for Kernel Sums via Fast Spherical Embeddings | arXiv: 2605.01263
- NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search | arXiv: 2605.00751
- Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving | arXiv: 2603.13358
- Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs | arXiv: 2605.09433
- OMAC: A Holistic Optimization Framework for LLM-Based Multi-Agent Collaboration | arXiv: 2505.11765
- OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models | arXiv: 2602.04804
- On the Convergence Rate of LoRA Gradient Descent | arXiv: 2512.18248
- On the Expressive Power of GNNs to Solve Linear SDPs | arXiv: 2604.27786
- On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length | arXiv: 2605.02572
- Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges | arXiv: 2605.10917
- Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions | arXiv: 2511.01292
- Optimizing Language Models for Crosslingual Knowledge Consistency | arXiv: 2603.04678
- OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization | arXiv: 2605.04738
- OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration | arXiv: 2602.12151
- OT-Bridge Editor: Geometrically Constrained Stenosis Editing in Coronary Angiography via Entropic Optimal Transport | arXiv: 2605.08851
- OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents | arXiv: 2605.08876
- Pair2Scene: Learning Local Object Relations for Procedural Scene Generation | arXiv: 2604.11808
- Pareto-Guided Optimal Transport for Multi-Reward Alignment | arXiv: 2605.13155
- paSAT: Polarity-aware clause-literal hypergraph representation learning for unsatisfiable core prediction | arXiv: 2605.04819
- Path-Coupled Bellman Flows for Distributional Reinforcement Learning | arXiv: 2605.08253
- PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering | arXiv: 2602.23161
- Perceptual Flow Network for Visually Grounded Reasoning | arXiv: 2605.02730
- Phy-CoSF: Physics-Guided Continuous Spectral Fields Reconstruction and Super-Resolution for Snapshot Compressive Imaging | arXiv: 2605.13583
- PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World | arXiv: 2605.05163
- PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions | arXiv: 2605.09538
- PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding | arXiv: 2605.13319
- Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models | arXiv: 2506.19037
- Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation | arXiv: 2605.10118
- Plug-and-Play Label Map Diffusion for Universal Goal-Oriented Navigation | arXiv: 2605.05960
- PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution | arXiv: 2605.03399
- Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning | arXiv: 2605.00265
- Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves | arXiv: 2512.00242
- Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration | arXiv: 2605.10203
- Position: Agentic AI Orchestration Should Be Bayes-Consistent | arXiv: 2605.00742
- Position: Assistive Agents Need Accessibility Alignment | arXiv: 2605.13579
- Position: Embodied AI Requires a Privacy-Utility Trade-off | arXiv: 2605.05017
- Position: Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective | arXiv: 2605.02010
- Possibilistic Predictive Uncertainty for Deep Learning | arXiv: 2605.00600
- PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts | arXiv: 2605.05974
- Predicting Large Model Test Losses with a Noisy Quadratic System | arXiv: 2605.09154
- Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs | arXiv: 2602.02001
- Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models | arXiv: 2602.01842
- Privacy Amplification in Differentially Private Zeroth-Order Optimization with Hidden States | arXiv: 2506.00158
- Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection | arXiv: 2605.08651
- Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks | arXiv: 2509.06701
- Probing Cross-modal Information Hubs in Audio-Visual LLMs | arXiv: 2605.10815
- Probing Neural TSP Representations for Prescriptive Decision Support | arXiv: 2602.07216
- Probing RLVR Training Instability through the Lens of Objective-Level Hacking | arXiv: 2602.01103
- Protein Circuit Tracing via Cross-layer Transcoders | arXiv: 2602.12026
- Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch | arXiv: 2605.03346
- Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training | arXiv: 2511.07372
- Provably Data-driven Multiple Hyper-parameter Tuning with Structured Loss Function | arXiv: 2602.02406
- Provably Learning Attention with Queries | arXiv: 2601.16873
- Proxy Compression for Language Modeling | arXiv: 2602.04289
- PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection | arXiv: 2509.26272
- QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL | arXiv: 2605.01862
- Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization | arXiv: 2602.02958
- Quantile-Free Uncertainty Quantification in Graph Neural Networks | arXiv: 2605.04847
- R\(^3\)L: Reasoning 3D Layouts from Relative Spatial Relations | arXiv: 2605.06758
- R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning | arXiv: 2605.14026
- RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation | arXiv: 2605.09907
- REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations | arXiv: 2605.12813
- Realizable Bayes-Consistency for General Metric Losses | arXiv: 2605.03823
- Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge | arXiv: 2605.10805
- Recovering Hidden Reward in Diffusion-Based Policies | arXiv: 2605.00623
- Referring Multiple Regions with Large Multimodal Models via Contextual Latent Steering | arXiv: 2605.01827
- RELO: Reinforcement Learning to Localize for Visual Object Tracking | arXiv: 2605.07379
- ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards | arXiv: 2510.00568
- ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning | arXiv: 2605.00380
- Resting Neurons, Active Insights: Robustify Activation Sparsity for Large Language Models | arXiv: 2512.12744
- Rethink the Role of Neural Decoders in Quantum Error Correction | arXiv: 2605.12046
- Rethinking LLM Ensembling from the Perspective of Mixture Models | arXiv: 2605.00419
- Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models | arXiv: 2602.11824
- Revisiting Photometric Ambiguity for Accurate Gaussian-Splatting Surface Reconstruction | arXiv: 2605.12494
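- ReVSI: Rebuilding a high-quality benchmark for evaluating VLM 3D spatial intelligence | arXiv: 2604.24300
- RHB: A benchmark for measuring reward hacking in tool-using LLM agents | arXiv: 2605.02964
- Riemannian Generative Decoder: Dropping the encoder for representation learning on arbitrary Riemannian manifolds | arXiv: 2506.19133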
- RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs | arXiv: 2411.19517
- RM-NLHF: Natural-language human feedback as the process reward for generative reward models | arXiv: 2601.07349
- RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization | arXiv: 2603.20527
- RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression | arXiv: 2605.14359
- S(H)NAP: Auditing the Sybil lung cancer risk prediction model via generative intervention attribution | arXiv: 2602.02560
- SafeHarbor: Defining Precise Decision Boundaries via Hierarchical Memory-Augmented Guardrail for LLM Agent Safety | arXiv: 2605.05704
- Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks | arXiv: 2605.05995
- Saving Foundation Flow-Matching Priors for Inverse Problems | arXiv: 2511.16520
- Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts | arXiv: 2602.03473
- Scaling Unsupervised Multi-Source Federated Domain Adaptation through Group-Wise Discrepancy Minimization | arXiv: 2510.08150
- Scaling Vision Transformers for Functional MRI with Flat Maps | arXiv: 2510.13768
- ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning | arXiv: 2510.23818
- Scout: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States | arXiv: 2605.04496
- ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision | arXiv: 2602.14276
- Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation | arXiv: 2605.02757
- Seeing to Generalize: How Visual Data Corrects Binding Shortcuts | arXiv: 2602.15183
- Segment Anything with Robust Uncertainty-Accuracy Correlation | arXiv: 2605.10603
- Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models | arXiv: 2605.08145
- Self-Debias: Self-correcting for Debiasing Large Language Models | arXiv: 2604.08243
- Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression | arXiv: 2502.01941
- SemGrad: Gradients w.r.t. Semantics-Preserving Embeddings Tell LLM Uncertainty | arXiv: 2605.04638
- Semi-Supervised Neural Super-Resolution for Mesh-Based Simulations | arXiv: 2605.09284
- SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation | arXiv: 2605.12389
- SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning | arXiv: 2603.25062
- Singular Bayesian Neural Networks | arXiv: 2602.00387
- Skipping the Zeros in Diffusion Models for Sparse Data Generation | arXiv: 2605.01817
- SLAY: Geometry-Aware Spherical Linearized Attention with Yat-Kernel | arXiv: 2602.04915
- SLQ: Bridging Modalities via Shared Latent Queries for Retrieval with Frozen MLLMs | arXiv: 2604.13710
- Smoothing Slot Attention Iterations and Recurrences | arXiv: 2508.05417
- Smoothness Errors in Dynamics Models and How to Avoid Them | arXiv: 2602.05352
- Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models | arXiv: 2501.13428
- SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning | arXiv: 2602.07458
- Speculative Coupled Decoding for Training-Free Lossless Acceleration of Autoregressive Visual Generation | arXiv: 2510.24211
- SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning | arXiv: 2605.04712
- SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion | arXiv: 2605.01466
- SQSD: Scoring fine-tuning samples for "safety degradation risk" by tracking parameter dynamics | arXiv: 2605.04572
- Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance | arXiv: 2605.00553
- STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack | arXiv: 2605.00699
- Statistical Consistency and Generalization of Contrastive Representation Learning | arXiv: 2605.02116
- Steer Like the LLM: Activation Steering that Mimics Prompting | arXiv: 2605.03907
- STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction | arXiv: 2602.08245
- Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning | arXiv: 2605.11975
- Stochastic Sparse Attention for Memory-Bound Inference | arXiv: 2605.01910
- Stop Automating Peer Review Without Rigorous Evaluation | arXiv: 2605.03202
- STORM: Segment, Track, and Object Re-Localization from a Single Image | arXiv: 2511.09771
- Streaming Sliced Optimal Transport | arXiv: 2505.06835
- Structure-Centric Graph Foundation Model via Geometric Bases | arXiv: 2605.08689
- Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges | arXiv: 2605.02973
- Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization | arXiv: 2605.11246
- SURGE: Surrogate Gradient Adaptation in Binary Neural Networks | arXiv: 2605.10989
- SVOO: Training-Free Sparse Attention for Video DiTs via Offline Layer-Wise Sparsity Profiling and Online Bidirectional QK Co-Clustering | arXiv: 2603.18636
- SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment | arXiv: 2605.08724
- Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs | arXiv: 2505.11556
- T\(^2\)PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning | arXiv: 2605.02178
- TD3B: Direction-Driven Discrete Diffusion for Allosteric Binder Design | arXiv: 2605.09810
- Text-Conditional JEPA for Learning Semantically Rich Visual Representations | arXiv: 2605.03245
- The (Marginal) Value of a Search Ad: An Online Causal Framework for Repeated Second-price Auctions | arXiv: 2605.01756
- The Coupling Within: Flow Matching via Distilled Normalizing Flows | arXiv: 2603.09014
- The Cylindrical Representation Hypothesis for Language Model Steering | arXiv: 2605.01844
- The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-modal Divergence | arXiv: 2601.19597
- The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design | arXiv: 2605.01345
- The Realignment Problem: When Right becomes Wrong in LLMs | arXiv: 2511.02623
- The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity | arXiv: 2605.06611
- The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents | arXiv: 2603.00801
- Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving | arXiv: 2601.21351
- Threshold-Guided Optimization for Visual Generative Models | arXiv: 2605.04653
- Time-series Forecasting Through the Lens of Dynamics | arXiv: 2507.15774
- Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection | arXiv: 2602.03216
- Token-Efficient Change Detection in LLM APIs | arXiv: 2602.11083
- ToolMATH: A Math Tool Benchmark for Realistic Long-Horizon Multi-Tool Reasoning | arXiv: 2602.21265
- Top-W: Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for LLMs | arXiv: 2602.10346
- Topology-Preserving Neural Operator Learning via Hodge Decomposition | arXiv: 2605.13834
- Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance | arXiv: 2605.11712
- Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts | arXiv: 2605.03348
- Towards A Generative Protein Evolution Machine with DPLM-Evo | arXiv: 2605.00182
- Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning | arXiv: 2605.01663
- Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines | arXiv: 2605.13981
- Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions | arXiv: 2605.05983
- Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm | arXiv: 2605.10640
- Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models | arXiv: 2605.08128
- Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection | arXiv: 2605.02958
- Training-Inference Consistent Segmented Execution for Long-Context LLMs | arXiv: 2605.11744
- Trajectory-Level Data Augmentation for Offline Reinforcement Learning | arXiv: 2605.13401
- Transformed Latent Variable Multi-Output Gaussian Processes | arXiv: 2605.05133
- TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models | arXiv: 2601.18744
- TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization | arXiv: 2605.00224
- Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Multi-Stream Environments | arXiv: 2510.04142
- Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization | arXiv: 2605.14373
- UGround: Towards Unified Visual Grounding with Unrolled Transformers | arXiv: 2510.03853
- Unbiased and Second-Order-Free Training for High-Dimensional PDEs | arXiv: 2605.14643
- Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference | arXiv: 2603.29002
- Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics | arXiv: 2402.15415
- Understanding LoRA as Knowledge Memory: An Empirical Analysis | arXiv: 2603.01097
- Understanding Self-Supervised Learning via Latent Distribution Matching | arXiv: 2605.03517
- Unified Multimodal Visual Tracking with Dual Mixture-of-Experts | arXiv: 2605.03716
- Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards | arXiv: 2510.00072
- VAnim: Rendering-Aware Sparse State Modeling for Vector Animation | arXiv: 2605.01517
- Video2GUI: Synthesizing Large-Scale Interaction Trajectories from Internet Videos for General-Purpose GUI Agent Pretraining | arXiv: 2605.14747
- VideoSEAL: Mitigating Evidence Misalignment in Agentic Long-Video Understanding by Decoupling Answering Authority | arXiv: 2605.12571
- Vision-aligned Latent Reasoning for Multi-modal Large Language Model | arXiv: 2602.04476
- Visual Implicit Autoregressive Modeling | arXiv: 2605.01220
- VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection | arXiv: 2605.10229
- Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning | arXiv: 2509.15103
- Watch Your Step: Information Injection in Diffusion Models via Shadow Timestep Embedding | arXiv: 2605.00935
- Watermarking LLM Agent Trajectories (ACTHOOK) | arXiv: 2602.18700
- WeatherSyn: An Instruction Tuning MLLM For Weather Forecasting Report Generation | arXiv: 2605.07522
- What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity (GLANCE) | arXiv: 2605.03782
- When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets (CAIA) | arXiv: 2510.00332
- Why Linear Interpretability Works: Invariant Subspaces as a Result of Architectural Constraints | arXiv: 2602.09783
- WZ-LLM: Automatically Proving Combinatorial Identities with Wilf–Zeilberger Symbolic Guidance + LLMs | arXiv: 2605.04472
- ZipRerank: Efficient Listwise Multimodal Reranking for Long Documents | arXiv: 2605.11864
- Adaptive Multi-Round Resource Allocation with Stochastic Arrivals | arXiv: 2605.12111
- Dispersion Loss Counters Embedding Collapse: Freeing Small Language Models from Directional Homogenization | arXiv: 2602.00217
- Adaptive Estimation and Inference for Semiparametric Heterogeneous Clustered Multi-Task Learning via Neyman Orthogonalization | arXiv: 2605.01907
- Active Tabular Augmentation via Policy-Guided Diffusion Imputation | arXiv: 2605.10315
- Distilling Molecular Dynamics Knowledge into a Non-Autoregressive Ion Transport Predictor | arXiv: 2605.09311
- Debunking TTT-KV Binding: It Is Essentially Linear Attention | arXiv: 2602.21204
- Refined Generalization Analysis of Extreme Multi-Class Supervised Contrastive Learning | arXiv: 2605.07596
- Test-Time Training Enables In-Context Learning to Learn Nonlinear Functions | arXiv: 2509.25741
- Do Activation Verbalization Methods Really Convey "Privileged Information"? | arXiv: 2509.13316
- Explaining the Advantage of Spiking Neural Networks in Millimeter-Wave Sensing via Frequency Matching | arXiv: 2605.09983
- Two Inherent Barriers to Counterfactual Credit Attribution (CCA) in Autoregressive Models | arXiv: 2605.01425
- Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation under L2-Matched Perturbations | arXiv: 2602.11169