ICML2025 Paper Notes TODO
Total: 1225 papers | Completed: 1225 | Pending updates: 0
- a bayesian model selection criterion for selecting pretraining checkpoints | arXiv: 2410.05612
- a certified unlearning approach without access to source data | arXiv: 2506.06486
- a cognac shot to forget bad memories corrective unlearning for graph neural netw | arXiv: 2412.00789
- a cross modal knowledge distillation data augmentation recipe for improving tran | arXiv: 2505.21317
- a general graph spectral wavelet convolution via chebyshev order decomposition | arXiv: 2405.13806
- a generalizable physics-enhanced state space model for long-term dynamics foreca | arXiv: 2507.10792
- a generalization result for convergence in learning-to-optimize | arXiv: 2410.07704
- a mathematical framework for ai-human integration in work | arXiv: 2505.23432
- a near-optimal single-loop stochastic algorithm for convex finite-sum coupled co | arXiv: 2312.02277
- a reasoning-based approach to cryptic crossword clue solving | arXiv: 2506.04824
- a recipe for causal graph regression confounding effects revisited | arXiv: 2507.00440
- a square peg in a square hole meta-expert for long-tailed semi-supervised learni | arXiv: 2505.16341
- a theoretical study of hyper self-attention through the lens of interactions rep | arXiv: 2506.06179
- a unified view on learning unnormalized distributions via noise-contrastive esti | arXiv: 2409.18209
- aaar-10 assessing ais potential to assist research | arXiv: 2410.22394
- ab initio nonparametric variable selection for scalable symbolic regression with | arXiv: 2410.13681
- abkd pursuing a proper allocation of the probability mass in knowledge distillat | arXiv: 2505.04560
- accelerating spectral clustering under fairness constraints | arXiv: 2506.08143
- access controls will solve the dual-use dilemma | arXiv: 2505.09341
- action-constrained imitation learning | arXiv: 2508.14379
- action-dependent optimality-preserving reward shaping | arXiv: 2505.12611
- action-minimization meets generative modeling efficient transition path sampling | arXiv: 2504.18506
- actionpiece contextually tokenizing action sequences for generative recommendati | arXiv: 2502.13581
- activation space interventions can be transferred between large language models | arXiv: 2503.04429
- actor-critics can achieve optimal sample efficiency | arXiv: 2505.03710
- ad-hoc human-ai coordination challenge | arXiv: 2506.21490
- adadecode accelerating llm decoding with adaptive layer parallelism | arXiv: 2506.03700
- adapter naturally serves as decoupler for cross-domain few-shot semantic segment | arXiv: 2506.07376
- adaptive elicitation of latent information using natural language | arXiv: 2504.04204
- adaptive estimation and learning under temporal distribution shift | arXiv: 2505.15803
- adaptive multi-prompt contrastive network for few-shot out-of-distribution detec | arXiv: 2506.17633
- adaptivestep automatically dividing reasoning step through model confidence | arXiv: 2502.13943
- adaworld learning adaptable world models with latent actions | arXiv: 2503.18938
- addressing imbalanced domain-incremental learning through dual-balance collabora | arXiv: 2507.07100
- adhmr aligning diffusion-based human mesh recovery via direct preference optimiz | arXiv: 2505.10250
- adios antibody development via opponent shaping | arXiv: 2409.10588
- adjustment for confounding using pre-trained representations | arXiv: 2506.14329
- advagent controllable blackbox red-teaming on web agents | arXiv: 2410.17401
- adversarial combinatorial semi-bandits with graph feedback | arXiv: 2502.18826
- adversarial cooperative rationalization the risk of spurious correlations in eve | arXiv: 2505.02118
- adversarial inception backdoor attacks against reinforcement learning | arXiv: 2410.13995
- adversarial manipulation of reasoning models using internal representations | arXiv: 2507.03167
- advprompter fast adaptive adversarial prompting for llms | arXiv: 2404.16873
- agacci affiliated grading agents for criteria-centric interface in educational c | arXiv: 2507.05321
- agent warpp workflow adherence via runtime parallel personalization | arXiv: 2507.19543
- aguvis unified pure vision agents for autonomous gui interaction | arXiv: 2412.04454
- alberta wells dataset pinpointing oil and gas wells from satellite imagery | arXiv: 2410.09032
- algebra unveils deep learning -- an invitation to neuroalgebraic geometry | arXiv: 2501.18915
- align-then-unlearn embedding alignment for llm unlearning | arXiv: 2506.13181
- aligning llms by predicting preferences from user writing samples | arXiv: 2505.23815
- aligning protein conformation ensemble generation with physical feedback | arXiv: 2505.24203
- aligning spoken dialogue models from user interactions | arXiv: 2506.21463
- all-atom diffusion transformers unified generative modelling of molecules and ma | arXiv: 2503.03965
- alpha-sql zero-shot text-to-sql using monte carlo tree search | arXiv: 2502.17248
- alphapo reward shape matters for llm alignment | arXiv: 2501.03884
- ampo active multi-preference optimization for self-play preference selection | arXiv: 2502.18293
- an attack to break permutation-based private third-party inference schemes for l | arXiv: 2505.18332
- an efficient matrix multiplication algorithm for accelerating inference in binar | arXiv: 2411.06360
- an efficient private gpt never autoregressively decodes | arXiv: 2505.15252
- angle domain guidance latent diffusion requires rotation rather than extrapolati | arXiv: 2506.11039
- annealing flow generative models towards sampling high-dimensional and multi-mod | arXiv: 2409.20547
- any4 learned 4-bit numeric representation for llms | arXiv: 2507.04610
- are llm belief updates consistent with bayes theorem | arXiv: 2507.17951
- are llms prescient a continuous evaluation using daily news as the oracle | arXiv: 2411.08324
- assistancezero scalably solving assistance games | arXiv: 2504.07091
- asymrnr video diffusion transformers acceleration with asymmetric reduction and | arXiv: 2412.11706
- autoal automated active learning with differentiable query strategy search | arXiv: 2410.13853
- autoencoder-based hybrid replay for class-incremental learning | arXiv: 2505.05926
- autoformulation of mathematical optimization models using llms | arXiv: 2411.01679
- automatic reward shaping from confounded offline data | arXiv: 2505.11478
- automl-agent a multi-agent llm framework for full-pipeline automl | arXiv: 2410.02958
- autonomy-of-experts models | arXiv: 2501.13074
- avoiding catastrophe in online learning by asking for help | arXiv: 2402.08062
- avoiding leakage poisoning concept interventions under distribution shifts | arXiv: 2504.17921
- b-score detecting biases in large language models using response history | arXiv: 2505.18545
- balanced learning for domain adaptive semantic segmentation | arXiv: 2512.06886
- balancing efficiency and expressiveness subgraph gnns with walk-based centrality | arXiv: 2501.03113
- banyan improved representation learning with explicit structure | arXiv: 2407.17771
- bayesian inference for correlated human experts and classifiers | arXiv: 2506.05636
- bayesian neural scaling law extrapolation with prior-data fitted networks | arXiv: 2505.23032
- beaver building environments with assessable variation for evaluating multi-obje | arXiv: 2507.07769
- became bayesian continual learning with adaptive model merging | arXiv: 2504.02666
- benchmarking quantum reinforcement learning | arXiv: 2501.15893
- benefits of early stopping in gradient descent for overparameterized logistic re | arXiv: 2502.13283
- benign overfitting in token selection of attention mechanism | arXiv: 2409.17625
- best subset selection optimal pursuit for feature selection and elimination | arXiv: 2501.16815
- best-route adaptive llm routing with test-time optimal compute | arXiv: 2506.22716
- beyond bradley-terry models a general preference model for language model alignm | arXiv: 2410.02197
- beyond communication overhead a multilevel monte carlo approach for mitigating c | arXiv: 2507.05508
- beyond cvar leveraging static spectral risk measures for enhanced decision-makin | arXiv: 2501.02087
- beyond entropy region confidence proxy for wild test-time adaptation | arXiv: 2505.20704
- beyond induction heads in-context meta learning induces multi-phase circuit emer | arXiv: 2505.16694
- beyond message passing neural graph pattern machine | arXiv: 2501.18739
- beyond one-hot labels semantic mixing for model calibration | arXiv: 2504.13548
- beyond self-repellent kernels history-driven target towards efficient nonlinear | arXiv: 2505.18300
- beyond sensor data foundation models of behavioral data from wearables improve h | arXiv: 2507.00191
- beyond the rainbow high performance deep reinforcement learning on a desktop pc | arXiv: 2411.03820
- beyond zero initialization investigating the impact of non-zero initialization o | arXiv: 2505.23194
- biassemble learning collaborative affordance for bimanual geometric assembly | arXiv: 2506.06221
- binary hypothesis testing for softmax models and leverage score models | arXiv: 2405.06003
- binauralflow a causal and streamable approach for high-quality binaural speech s | arXiv: 2505.22865
- bipartite ranking from multiple labels on loss versus label aggregation | arXiv: 2504.11284
- blockdialect block-wise fine-grained mixed format quantization for energy-effici | arXiv: 2501.01144
- blueglass a framework for composite ai safety | arXiv: 2507.10106
- boa attention-aware post-training quantization without backpropagation | arXiv: 2406.13474
- boosting masked ecg-text auto-encoders as discriminative learners | arXiv: 2410.02131
- bopo neural combinatorial optimization via best-anchored and objective-guided pr | arXiv: 2503.07580
- bounded rationality for llms satisficing alignment at inference-time | arXiv: 2505.23729
- breaking silos adaptive model fusion unlocks better time series forecasting | arXiv: 2505.18442
- breaking the n15 additive error barrier for private and efficient graph sparsifi | arXiv: 2507.01873
- bridge bootstrapping text to control time-series generation via multi-agent iter | arXiv: 2503.02445
- bridging the language gap synthetic voice diversity via latent mixup for equitab | arXiv: 2511.20534
- bring reason to vision understanding perception and reasoning through model merg | arXiv: 2505.05464
- brite bootstrapping reinforced thinking process to enhance language model reason | arXiv: 2501.18858
- broadband ground motion synthesis by diffusion model with minimal condition | arXiv: 2412.17333
- build agent advocates not platform agents | arXiv: 2505.04345
- ca2-vdm efficient autoregressive video diffusion model with causal generation an | arXiv: 2411.16375
- can one safety loop guard them all agentic guard rails for federated computing | arXiv: 2506.20000
- can rlhf be more efficient with imperfect reward models a policy coverage perspe | arXiv: 2502.19255
- can transformers learn full bayesian inference in context | arXiv: 2501.16825
- cape context-aware prompt perturbation mechanism with differential privacy | arXiv: 2505.05922
- cascade token-sharded private llm inference | arXiv: 2507.05228
- causal abstraction inference under lossy representations | arXiv: 2509.21607
- causal discovery from conditionally stationary time series | arXiv: 2110.06257
- causal discovery of latent variables in galactic archaeology | arXiv: 2507.00134
- causal effect identification in lvlingam from higher-order cumulants | arXiv: 2506.05202
- causal evidence for the primordiality of colors in trans-neptunian objects | arXiv: 2507.03760
- causal foundation models disentangling physics from instrument properties | arXiv: 2507.05333
- causal-pik causality-based physical reasoning with a physics-informed kernel | arXiv: 2505.22861
- causality-aware contrastive learning for robust multivariate time-series anomaly | arXiv: 2506.03964
- certification for differentially private prediction in gradient-based training | arXiv: 2406.13433
- cfp-gen combinatorial functional protein generation via diffusion language model | arXiv: 2505.22869
- challenges and future directions of data-centric ai alignment | arXiv: 2410.01957
- chameleon a flexible data-mixing framework for language model pretraining and fi | arXiv: 2505.24844
- channel normalization for time series channel identification | arXiv: 2506.00432
- clarify contrastive preference reinforcement learning for untangling ambiguous q | arXiv: 2506.00388
- classifier reconstruction through counterfactual-aware wasserstein prototypes | arXiv: 2512.10878
- clients collaborate flexible differentially private federated learning with guar | arXiv: 2402.07002
- clipping improves adam-norm and adagrad-norm when the noise is heavy-tailed | arXiv: 2406.04443
- closed-form solutions a new perspective on solving differential equations | arXiv: 2405.14620
- closed-loop long-horizon robotic planning via equilibrium sequence modeling | arXiv: 2410.01440
- clustering properties of self-supervised learning | arXiv: 2501.18452
- cocoa-mix confusion-and-confidence-aware mixture model for context optimization | arXiv: 2506.07484
- cody counterfactual explainers for dynamic graphs | arXiv: 2403.16846
- collaborative mean estimation among heterogeneous strategic agents individual ra | arXiv: 2407.15881
- collapse-proof non-contrastive self-supervised learning | arXiv: 2410.04959
- come together but not right now a progressive strategy to boost low-rank adaptat | arXiv: 2506.05713
- comemo lvlms need image context with image memory | arXiv: 2506.06279
- communicating activations between language model agents | arXiv: 2501.14082
- commvq commutative vector quantization for kv cache compression | arXiv: 2506.18879
- compact matrix quantum group equivariant neural networks | arXiv: 2311.06358
- compelling relu networks to exhibit exponentially many linear regions at initial | arXiv: 2311.18022
- compositional flows for 3d molecule and synthesis pathway co-design | arXiv: 2504.08051
- compositional scene understanding through inverse generative modeling | arXiv: 2505.21780
- comrecgc global graph counterfactual explainer through common recourse | arXiv: 2505.07081
- concept-based unsupervised domain adaptation | arXiv: 2505.05195
- conceptual belief-informed reinforcement learning | arXiv: 2410.01739
- configurable preference tuning with rubric-guided synthetic data | arXiv: 2506.11702
- conformal prediction as bayesian quadrature | arXiv: 2502.13228
- confpo exploiting policy model confidence for critical token selection in prefer | arXiv: 2506.08712
- connecting thompson sampling and ucb towards more efficient trade-offs between p | arXiv: 2505.02383
- consistency in language models current landscape challenges and future direction | arXiv: 2505.00268
- constant stepsize local gd for logistic regression acceleration by instability | arXiv: 2506.13974
- constrained hamiltonian systems on observation-induced fiber bundles theory of s | arXiv: 2505.22824
- context driving in-context learning for text removal and segmentation | arXiv: 2506.03799
- context is key a benchmark for forecasting with essential textual information | arXiv: 2410.18959
- context matters query-aware dynamic long sequence modeling of gigapixel images | arXiv: 2501.18984
- context tuning for in-context optimization | arXiv: 2507.04221
- contextures representations from contexts | arXiv: 2505.01557
- continual reinforcement learning by planning with online world models | arXiv: 2507.09177
- continualflow learning and unlearning with neural flow matching | arXiv: 2506.18747
- continuous semi-implicit models | arXiv: 2506.06778
- continuous visual autoregressive generation via score maximization | arXiv: 2505.07812
- continuous-time analysis of heavy ball momentum in min-max games | arXiv: 2505.19537
- controlling underestimation bias in constrained reinforcement learning for safe | arXiv: 2601.11953
- convex markov games a new frontier for multi-agent reinforcement learning | arXiv: 2410.16600
- cooperation of experts fusing heterogeneous information with large margin | arXiv: 2505.20853
- core context aware transformers for long context language modeling | arXiv: 2412.12465
- core knowledge deficits in multi-modal language models | arXiv: 2410.10855
- corematching a co-adaptive sparse inference framework with token and neuron prun | arXiv: 2505.19235
- correlated errors in large language models | arXiv: 2506.07962
- costfilter-ad enhancing anomaly detection through matching cost filtering | arXiv: 2505.01476
- counterfactual effect decomposition in multi-agent sequential decision making | arXiv: 2410.12539
- counting in small transformers the delicate interplay between attention and feed | arXiv: 2407.11542
- cover learning for large-scale topology representation | arXiv: 2503.09767
- craftium bridging flexibility and efficiency for rich 3d single- and multi-agent | arXiv: 2407.03969
- cross-environment cooperation enables zero-shot multi-agent coordination | arXiv: 2504.12714
- cross-regularization adaptive model complexity through validation gradients | arXiv: 2506.19755
- crow eliminating backdoors from large language models via internal consistency r | arXiv: 2411.12768
- curse of high dimensionality issue in transformer for long-context modeling | arXiv: 2505.22107
- curvature enhanced data augmentation for regression | arXiv: 2506.06853
- customizing the inductive biases of softmax attention using structured matrices | arXiv: 2509.07963
- cut out and replay a simple yet versatile strategy for multi-label online contin | arXiv: 2505.19680
- d-fusion direct preference optimization for aligning diffusion models with visua | arXiv: 2505.22002
- data-juicer sandbox a feedback-driven suite for multimodal data-model co-develop | arXiv: 2407.11784
- datadecide how to predict best pretraining data with small experiments | arXiv: 2504.11393
- dctdiff intriguing properties of image generative modeling in the dct space | arXiv: 2412.15032
- de-antifake rethinking the protective perturbations against voice cloning attack | arXiv: 2507.02606
- de-mark watermark removal in large language models | arXiv: 2410.13808
- decision making under the exponential family distributionally robust optimisatio | arXiv: 2411.16829
- decoding rewards in competitive games inverse game theory with entropy regulariz | arXiv: 2601.12707
- deep electromagnetic structure design under limited evaluation budgets | arXiv: 2506.19384
- deep learning is not so mysterious or different | arXiv: 2503.02113
- deepseq high-throughput single-cell rna sequencing data labeling via web search- | arXiv: 2506.13817
- defame dynamic evidence-based fact-checking with multimodal experts | arXiv: 2412.10510
- defending lvlms against vision attacks through partial-perception supervision | arXiv: 2412.12722
- deltashap explaining prediction evolutions in online patient monitoring with sha | arXiv: 2507.02342
- democratic ai is possible the democracy levels framework shows how it might work | arXiv: 2411.09222
- demystifying the paradox of importance sampling with an estimated history-depend | arXiv: 2505.22492
- density ratio estimation-based bayesian optimization with semi-supervised learni | arXiv: 2305.15612
- deprecating benchmarks criteria and framework | arXiv: 2507.06434
- designing cyclic peptides via harmonic sde with atom-bond modeling | arXiv: 2505.21452
- differentiable stellar atmospheres with physics-informed neural networks | arXiv: 2507.06357
- diffuse everything multimodal diffusion models on arbitrary state spaces | arXiv: 2506.07903
- diffusion adversarial post-training for one-step video generation | arXiv: 2501.08316
- diffusion sampling correction via approximately 10 parameters | arXiv: 2411.06503
- diffusion-vla generalizable and interpretable robot foundation model via self-ge | arXiv: 2412.03293
- dilqr differentiable iterative linear quadratic regulator via implicit different | arXiv: 2506.17473
- dipllm fine-tuning llm for strategic decision-making in diplomacy | arXiv: 2506.09655
- direct discriminative optimization your likelihood-based visual generative model | arXiv: 2503.01103
- directed graph grammars for sequence-based learning | arXiv: 2505.22949
- discovering global false negatives on the fly for self-supervised contrastive le | arXiv: 2502.20612
- discrepancy minimization in input-sparsity time | arXiv: 2210.12468
- discrete neural algorithmic reasoning | arXiv: 2402.11628
- discriminative policy optimization for token-level reward models | arXiv: 2505.23363
- disentangling and integrating relational and sensory information in transformer | arXiv: 2405.16727
- disparate conditional prediction in multiclass classifiers | arXiv: 2206.03234
- diss-l-ect dissecting graph data with local euler characteristic transforms | arXiv: 2410.02622
- distillation of discrete diffusion through dimensional correlations | arXiv: 2410.08709
- distilling tool knowledge into language models via back-translated traces | arXiv: 2506.19171
- distributed and decentralised training technical governance challenges in a shif | arXiv: 2507.07765
- diverging preferences when do annotators disagree and do models know | arXiv: 2410.14632
- diverse prototypical ensembles improve robustness to subpopulation shift | arXiv: 2505.23027
- diversity by design leveraging distribution matching for offline model-based opt | arXiv: 2501.18768
- divide and conquer grounding llms as efficient decision-making agents via offlin | arXiv: 2505.19761
- diving into self-evolving training for multimodal reasoning | arXiv: 2412.17451
- dlp dynamic layerwise pruning in large language models | arXiv: 2505.23807
- do multiple instance learning models transfer | arXiv: 2506.09022
- do not mimic my voice speaker identity unlearning for zero-shot text-to-speech | arXiv: 2507.20140
- do sparse autoencoders generalize a case study of answerability | arXiv: 2502.19964
- do vision-language models really understand visual language | arXiv: 2410.00193
- does data scaling lead to visual compositional generalization | arXiv: 2507.07102
- does graph prompt work a data operation perspective with theoretical analysis | arXiv: 2410.01635
- dont be so negative score-based generative modeling with oracle-assisted guidanc | arXiv: 2307.16463
- dont lag rag training-free adversarial detection using rag | arXiv: 2504.04858
- doubly protected estimation for survival outcomes utilizing external controls fo | arXiv: 2410.18409
- doubly robust fusion of many treatments for policy learning | arXiv: 2505.08092
- dpo meets ppo reinforced token optimization for rlhf | arXiv: 2404.18922
- drag data reconstruction attack using guided diffusion | arXiv: 2509.11724
- dragon guard llm unlearning in context via negative detection and reasoning | arXiv: 2511.05784
- drivegpt scaling autoregressive behavior models for driving | arXiv: 2412.14415
- dsp dynamic sequence parallelism for multi-dimensional transformers | arXiv: 2403.10266
- dssd efficient edge-device llm deployment and collaborative inference via distri | arXiv: 2507.12000
- dual form complementary masking for domain-adaptive image segmentation | arXiv: 2507.12008
- dynamic benchmarking of reasoning capabilities in code large language models und | arXiv: 2503.04149
- dynamic mixture of curriculum lora experts for continual multimodal instruction | arXiv: 2506.11672
- dynamical phases of short-term memory mechanisms in rnns | arXiv: 2502.17433
- e-lda toward interpretable lda topic models with strong guarantees in logarithmi | arXiv: 2506.07747
- easyinv toward fast and better ddim inversion | arXiv: 2408.05159
- eccdnamamba a pre-trained model for ultra-long eccdna sequence analysis | arXiv: 2506.18940
- editable noise map inversion encoding target-image into noise for high-fidelity | arXiv: 2509.25776
- eeg-language pretraining for highly label-efficient clinical phenotyping | arXiv: 2409.07480
- efficient and robust semantic image communication via stable cascade | arXiv: 2507.17416
- efficient curvature-aware hypergradient approximation for bilevel optimization | arXiv: 2505.02101
- efficient diffusion models for symmetric manifolds | arXiv: 2505.21640
- efficient generative modeling with residual vector quantization-based tokens | arXiv: 2412.10208
- efficient length-generalizable attention via causal retrieval for long-context l | arXiv: 2410.01651
- efficient logit-based knowledge distillation of deep spiking neural networks for | arXiv: 2501.15925
- efficient molecular conformer generation with so3-averaged flow matching and ref | arXiv: 2507.09785
- efficient network automatic relevance determination | arXiv: 2506.12352
- efficient noise calculation in deep learning-based mri reconstructions | arXiv: 2505.02007
- efficient optimization with orthogonality constraint a randomized riemannian sub | arXiv: 2505.12378
- efficient quantification of multimodal interaction at sample level | arXiv: 2506.17248
- efficient robotic policy learning via latent space backward planning | arXiv: 2505.06861
- efficoder enhancing code generation in large language models through efficiency- | arXiv: 2410.10209
- egoprivacy what your first-person camera says about you | arXiv: 2506.12258
- eigenspectrum analysis of neural networks without aspect ratio bias | arXiv: 2506.06280
- elemental interactive learning from demonstrations and vision-language models fo | arXiv: 2411.18825
- elmo efficiency via low-precision and peak memory optimization in large output s | arXiv: 2510.11168
- elucidating flow matching ode dynamics with respect to data geometries and denoi | arXiv: 2412.18730
- elucidating the design space of multimodal protein language models | arXiv: 2504.11454
- embedding safety into rl a new take on trust region methods | arXiv: 2411.02957
- emergence in non-neural models grokking modular arithmetic via average gradient | arXiv: 2407.20199
- emergent misalignment narrow finetuning can produce broadly misaligned llms | arXiv: 2502.17424
- emergent symbolic mechanisms support abstract reasoning in large language models | arXiv: 2502.20332
- empirical privacy variance | arXiv: 2503.12314
- empower structure-based molecule optimization with gradient guided bayesian flow | arXiv: 2411.13280
- enhancing certified robustness via block reflector orthogonal layers and logit a | arXiv: 2505.15174
- enhancing cooperative multi-agent reinforcement learning with state modelling an | arXiv: 2505.05262
- enhancing decision-making of large language models via actor-critic | arXiv: 2506.06376
- enhancing parallelism in decentralized stochastic convex optimization | arXiv: 2506.00961
- enhancing rating-based reinforcement learning to effectively leverage feedback f | arXiv: 2506.12822
- enhancing statistical validity and power in hybrid controlled trials a randomiza | arXiv: 2410.11713
- enhancing target-unspecific tasks through a features matrix | arXiv: 2505.03414
- enigma interactive tools substantially assist lm agents in finding security vuln | arXiv: 2409.16165
- epicoder encompassing diversity and complexity in code generation | arXiv: 2501.04694
- epsilon-vae denoising as visual decoding | arXiv: 2410.04081
- ergodic generative flows | arXiv: 2505.03561
- erwin a tree-based hierarchical transformer for large-scale physical systems | arXiv: 2502.17019
- estimating causal effects in gaussian linear scms with finite data | arXiv: 2601.04673
- etta elucidating the design space of text-to-audio models | arXiv: 2412.19351
- evaluating deepfake detectors in the wild | arXiv: 2507.21905
- evaluating judges as evaluators the jetts benchmark of llm-as-judges as test-tim | arXiv: 2504.15253
- evaluating llms across multi-cognitive levels from medical knowledge mastery to | arXiv: 2506.08349
- evaluating morphological alignment of tokenizers in 70 languages | arXiv: 2507.06378
- evaluating neuron explanations a unified framework with sanity checks | arXiv: 2506.05774
- evaluating retrieval-augmented generation agents for autonomous scientific disco | arXiv: 2507.07155
- event-aware sentiment factors from llm-augmented financial tweets a transparent | arXiv: 2508.07408
- evolve evaluating and optimizing llms for in-context exploration | arXiv: 2410.06238
- evolving prompts in-context an open-ended self-replicating perspective | arXiv: 2506.17930
- evomesh adaptive physical simulation with hierarchical graph evolutions | arXiv: 2410.03779
- exlm rethinking the impact of mask tokens in masked language models | arXiv: 2501.13397
- exogenous isomorphism for counterfactual identifiability | arXiv: 2505.02212
- expert evaluation of llm world models a high-t c superconductivity case study | arXiv: 2511.03782
- explaining fast and slow abstraction and refinement of provable explanations | arXiv: 2506.08505
- exploiting similarity for computation and communication-efficient decentralized | arXiv: 2506.05791
- explora parameter-efficient extended pre-training to adapt vision transformers u | arXiv: 2406.10973
- exploring large action sets with hyperspherical embeddings using von mises-fishe | arXiv: 2507.00518
- exploring position encoding in diffusion u-net for training-free high-resolution | arXiv: 2503.09830
- expressive score-based priors for distribution matching with geometry-preserving | arXiv: 2506.14607
- extreme value policy optimization for safe reinforcement learning | arXiv: 2601.12008
- fast and robust task sampling with posterior and diversity synergies for adaptiv | arXiv: 2504.19139
- fastcav efficient computation of concept activation vectors for explaining deep | arXiv: 2505.17883
- faster and stronger when ann-snn conversion meets parallel spiking calculation | arXiv: 2412.13610
- faster rates for private adversarial bandits | arXiv: 2505.21790
- featsharp your vision model features sharper | arXiv: 2502.16025
- feature learning beyond the lazy-rich dichotomy insights from representational g | arXiv: 2503.18114
- federated in-context learning iterative refinement for improved answer quality | arXiv: 2506.07440
- fedrag a framework for fine-tuning retrieval-augmented generation systems | arXiv: 2506.09200
- fedswa improving generalization in federated learning with highly heterogeneous | arXiv: 2507.20016
- fedtail federated long-tailed domain generalization with sharpness-guided gradie | arXiv: 2506.08518
- feedforward few-shot species range estimation | arXiv: 2502.14977
- ferret federated full-parameter tuning at scale for large language models | arXiv: 2409.06277
- few-shot learner generalizes across ai-generated image detection | arXiv: 2501.08763
- fg-clip fine-grained visual and textual alignment | arXiv: 2505.05071
- fgfp a fractional gaussian filter and pruning for deep neural networks compressi | arXiv: 2507.22527
- ficgcn unveiling the homomorphic encryption efficiency from irregular graph conv | arXiv: 2506.10399
- fine-grained captioning of long videos through scene graph consolidation | arXiv: 2502.16427
- finetuning stellar spectra foundation models with lora | arXiv: 2507.20972
- fishers for free approximating the fisher information matrix by recycling the sq | arXiv: 2507.18807
- fixed-confidence multiple change point identification under bandit feedback | arXiv: 2507.08994
- fixing the loose brake exponential-tailed stopping time in best arm identificati | arXiv: 2411.01808
- flam frame-wise language-audio modeling | arXiv: 2505.05335
- flat-lora low-rank adaptation over a flat loss landscape | arXiv: 2409.14396
- flatquant flatness matters for llm quantization | arXiv: 2410.09426
- fleet of agents coordinated problem solving with large language models | arXiv: 2405.06691
- flexibility-conditioned protein structure design with flow matching | arXiv: 2508.18211
- flexible tails for normalizing flows | arXiv: 2406.16971
- flexiclip locality-preserving free-form character animation | arXiv: 2501.08676
- flextok resampling images into 1d token sequences of flexible length | arXiv: 2502.13967
- floe on-the-fly moe inference on memory-constrained gpu | arXiv: 2505.05950
- flow of reasoning training llms for divergent reasoning with minimal examples | arXiv: 2406.05673
- flowdrag 3d-aware drag-based image editing with mesh-guided deformation vector f | arXiv: 2507.08285
- fmc formalization of natural language mathematical competition problems | arXiv: 2507.11275
- foundation model insights and a multi-model approach for superior fine-grained o | arXiv: 2506.14473
- foundation models for clinical records at health system scale | arXiv: 2507.00574
- foundation molecular grammar multi-modal foundation models induce interpretable | arXiv: 2505.22948
- founder grounding foundation models in world models for open-ended embodied deci | arXiv: 2507.12496
- fourier position embedding enhancing attentions periodic extension for length ge | arXiv: 2412.17739
- freemesh boosting mesh generation with coordinates merging | arXiv: 2505.13573
- from black boxes to transparent minds evaluating and enhancing the theory of min | arXiv: 2506.14224
- from debate to equilibrium belief-driven multi-agent llm reasoning via bayesian | arXiv: 2506.08292
- from language models over tokens to language models over characters | arXiv: 2412.03719
- from logits to hierarchies hierarchical clustering made simple | arXiv: 2410.07858
- from low rank gradient subspace stabilization to low-rank weights observations t | arXiv: 2407.11239
- from passive to active reasoning can large language models ask the right questio | arXiv: 2506.08295
- from rag to memory non-parametric continual learning for large language models | arXiv: 2502.14802
- from token to rhythm a multi-scale approach for ecg-language pretraining | arXiv: 2506.21803
- fsl-sage accelerating federated split learning via smashed activation gradient e | arXiv: 2505.23182
- fully dynamic euclidean bi-chromatic matching in sublinear update time | arXiv: 2505.09010
- fully heteroscedastic count regression with deep double poisson networks | arXiv: 2406.09262
- function encoders a principled approach to transfer learning in hilbert spaces | arXiv: 2501.18373
- function-space learning rates | arXiv: 2502.17405
- function-to-style guidance of llms for code translation | arXiv: 2507.11083
- g-sim generative simulations with large language models and gradient-free calibr | arXiv: 2506.09272
- gaprompt geometry-aware point cloud prompt for 3d vision model | arXiv: 2505.04119
- gaussian mixture flow matching models | arXiv: 2504.05304
- gaussmarker robust dual-domain watermark for diffusion models | arXiv: 2506.11444
- gcal adapting graph models to evolving domain shifts | arXiv: 2505.16860
- general agents contain world models | arXiv: 2506.01622
- generalization analysis for supervised contrastive representation learning under | arXiv: 2505.04937
- generalization and robustness of the tilted empirical risk | arXiv: 2409.19431
- generalization bounds via meta-learned model representations pac-bayes and sampl | arXiv: 2410.13577
- generalization in federated learning a conditional mutual information framework | arXiv: 2503.04091
- generalized interpolating discrete diffusion | arXiv: 2503.04482
- generation from noisy examples | arXiv: 2501.04179
- generative audio language modeling with continuous-valued tokens and masked next | arXiv: 2507.09834
- generative social choice the next generation | arXiv: 2505.22939
- genmol a drug discovery generalist with discrete diffusion | arXiv: 2501.06158
- geometric contact flows contactomorphisms for dynamics and control | arXiv: 2506.17868
- geometric generative modeling with noise-conditioned graph networks | arXiv: 2507.09391
- geometric representation condition improves equivariant molecule generation | arXiv: 2410.03655
- geometry-to-image synthesis-driven generative point cloud registration | arXiv: 2512.09407
- glgenn a novel parameter-light equivariant neural networks architecture based on | arXiv: 2506.09625
- global context-aware representation learning for spatially resolved transcriptom | arXiv: 2506.15698
- global convergence and rich feature learning in l-layer infinite-width neural ne | arXiv: 2503.09565
- goirl graph-oriented inverse reinforcement learning for multimodal trajectory pr | arXiv: 2506.21121
- gptaq efficient finetuning-free quantization for asymmetric calibration | arXiv: 2504.02692
- gpu-friendly and linearly convergent first-order methods for certifying optimal | arXiv: 2603.01306
- gradient aligned regression via pairwise losses | arXiv: 2402.06104
- gradual transition from bellman optimality operator to bellman operator in onlin | arXiv: 2506.05968
- gram a generative foundation reward model for reward generalization | arXiv: 2506.14175
- graph attention is not always beneficial a theoretical analysis of graph attenti | arXiv: 2412.15496
- graph generative pre-trained transformer | arXiv: 2501.01073
- graph-assisted stitching for offline hierarchical reinforcement learning | arXiv: 2506.07744
- graph-constrained reasoning faithful reasoning on knowledge graphs with large la | arXiv: 2410.13080
- graph-supported dynamic algorithm configuration for multi-objective combinatoria | arXiv: 2505.16471
- graph4mm weaving multimodal learning with structural information | arXiv: 2510.16990
- gravity-bench-v1 a benchmark on gravitational physics discovery for agents | arXiv: 2501.18411
- griffin towards a graph-centric relational database foundation model | arXiv: 2505.05568
- grokformer graph fourier kolmogorov-arnold transformers | arXiv: 2411.17296
- grokking at the edge of linear separability | arXiv: 2410.04489
- guardagent safeguard llm agents by a guard agent via knowledge-enabled reasoning | arXiv: 2406.09187
- guidedquant large language model quantization via exploiting end loss guidance | arXiv: 2505.07004
- gumiho a hybrid architecture to prioritize early tokens in speculative decoding | arXiv: 2503.10135
- handling imbalanced pseudolabels for vision-language models with concept alignme | arXiv: 2505.02056
- harmonica harmonizing training and inference for better feature caching in diffu | arXiv: 2410.01723
- heavy-tailed linear bandits huber regression with one-pass update | arXiv: 2503.00419
- hessian geometry of latent space in generative models | arXiv: 2506.10632
- heterogeneous data game characterizing the model competition across multiple dat | arXiv: 2505.07688
- hgot self-supervised heterogeneous graph neural network with optimal transport | arXiv: 2506.02619
- hi robot open-ended instruction following with hierarchical vision-language-acti | arXiv: 2502.19417
- hierarchical and collaborative llm-based control for multi-uav motion and commun | arXiv: 2506.06532
- hierarchical masked autoregressive models with low-resolution token pivots | arXiv: 2505.20288
- hierarchical refinement optimal transport to infinity and beyond | arXiv: 2503.03025
- hierarchical reinforcement learning with targeted causal interventions | arXiv: 2507.04373
- hierarchical reinforcement learning with uncertainty-guided diffusional subgoals | arXiv: 2505.21750
- high dynamic range novel view synthesis with single exposure | arXiv: 2505.01212
- high-resolution live fuel moisture content lfmc maps for wildfire risk from mult | arXiv: 2506.20132
- how do transformers learn variable binding in symbolic programs | arXiv: 2505.20896
- how far is video generation from world model a physical law perspective | arXiv: 2411.02385
- how much can we forget about data contamination | arXiv: 2410.03249
- how to move your dragon text-to-motion synthesis for large-vocabulary objects | arXiv: 2503.04257
- how to set adamws weight decay as you scale model and dataset size | arXiv: 2405.13698
- how to synthesize text data without model collapse | arXiv: 2412.14689
- how transformers learn regular language recognition a theoretical study on train | arXiv: 2505.00926
- hybrid quantum-classical multi-agent pathfinding | arXiv: 2501.14568
- hyperband-based bayesian optimization for black-box prompt selection | arXiv: 2412.07820
- hyperbolic-pde gnn spectral graph neural networks in the perspective of a system | arXiv: 2505.23014
- hyperimts hypergraph neural network for irregular multivariate time series forec | arXiv: 2505.17431
- i2moe interpretable multimodal interaction-aware mixture-of-experts | arXiv: 2505.19190
- iclshield exploring and mitigating in-context learning backdoor attacks | arXiv: 2507.01321
- identifying and understanding cross-class features in adversarial training | arXiv: 2506.05032
- idpa instance decoupled prompt attention for incremental medical object detectio | arXiv: 2506.00406
- if open source is to win it must go public | arXiv: 2507.09296
- impact iterative mask-based parallel decoding for text-to-audio generation with | arXiv: 2506.00736
- implementing adaptations for vision autoregressive model | arXiv: 2507.11441
- importance corrected neural jko sampling | arXiv: 2407.20444
- importance sampling for nonlinear models | arXiv: 2505.12353
- improved and oracle-efficient online ℓ1-multicalibration | arXiv: 2505.17365
- improved exploration in gflownets via enhanced epistemic neural networks | arXiv: 2506.16313
- improved generalization bounds for transductive learning by transductive local c | arXiv: 2309.16858
- improved last-iterate convergence of shuffling gradient methods for nonsmooth co | arXiv: 2505.23056
- improved learning via k-dtw a novel dissimilarity measure for curves | arXiv: 2505.23431
- improved off-policy reinforcement learning in biological sequence design | arXiv: 2410.04461
- improved sample complexity for private nonsmooth nonconvex optimization | arXiv: 2410.05880
- improving continual learning performance and efficiency with auxiliary classifie | arXiv: 2403.07404
- improving flow matching by aligning flow divergence | arXiv: 2602.00869
- improving generalization with flat hilbert bayesian inference | arXiv: 2410.04196
- improving llm agent planning with in-context learning via atomic fact augmentati | arXiv: 2506.09171
- improving llm safety alignment with dual-objective optimization | arXiv: 2503.03710
- improving memory efficiency for training kans via meta learning | arXiv: 2506.07549
- improving model alignment through collective intelligence of open-source llms | arXiv: 2505.03059
- improving rationality in the reasoning process of language models through self-p | arXiv: 2506.22920
- improving the diffusability of autoencoders | arXiv: 2502.14831
- improving the effective receptive field of message-passing neural networks | arXiv: 2505.23185
- improving the variance of differentially private randomized experiments through | arXiv: 2308.00957
- improving your model ranking on chatbot arena by vote rigging | arXiv: 2501.17858
- imts is worth time times channel patches visual masked autoencoders for irregula | arXiv: 2505.22815
- in-context adaptation to concept drift for learned database operations | arXiv: 2505.04404
- in-context linear regression demystified training dynamics and mechanistic inter | arXiv: 2503.12734
- incremental gradient descent with small epoch counts is surprisingly slow on ill | arXiv: 2506.04126
- inductive gradient adjustment for spectral bias in implicit neural representatio | arXiv: 2410.13271
- inference-time decomposition of activations itda a scalable approach to interpre | arXiv: 2505.17769
- infocons identifying interpretable critical concepts in point clouds via informa | arXiv: 2505.19820
- infosam fine-tuning the segment anything model from an information-theoretic per | arXiv: 2505.21920
- infosem a deep generative model with informative priors for gene regulatory netw | arXiv: 2503.04483
- instruction tuning of large language models for tabular data generation-in one d | arXiv: 2511.23220
- instruction-following pruning for large language models | arXiv: 2501.02086
- integer programming for generalized causal bootstrap designs | arXiv: 2410.21464
- integrating intermediate layer optimization and projected gradient descent for s | arXiv: 2505.20789
- interchangeable token embeddings for extendable vocabulary and alpha-equivalence | arXiv: 2410.17161
- interior-point vanishing problem in semidefinite relaxations for neural network | arXiv: 2506.10269
- internal causal mechanisms robustly predict language model out-of-distribution b | arXiv: 2505.11770
- intlora integral low-rank adaptation of quantized diffusion models | arXiv: 2410.21759
- invariance makes llm unlearning resilient even to unanticipated downstream fine- | arXiv: 2506.01339
- investigating non-transitivity in llm-as-a-judge | arXiv: 2502.14074
- is complex query answering really complex | arXiv: 2410.12537
- is your llm-based multi-agent a reliable real-world planner exploring fraud dete | arXiv: 2505.16557
- is your model fairly certain uncertainty-aware fairness evaluation for llms | arXiv: 2505.23996
- isolated causal effects of natural language | arXiv: 2410.14812
- it3 idempotent test-time training | arXiv: 2410.04201
- joker joint optimization framework for lightweight kernel machines | arXiv: 2505.17765
- k2ie kernel method-based kernel intensity estimators for inhomogeneous poisson p | arXiv: 2505.24704
- kan-ad time series anomaly detection with kolmogorov-arnold networks | arXiv: 2411.00278
- kbqa-o1 agentic knowledge base question answering with monte carlo tree search | arXiv: 2501.18922
- kea keeping exploration alive by proactively coordinating exploration strategies | arXiv: 2503.18234
- kelps a framework for verified multi-language autoformalization via semantic-syn | arXiv: 2507.08665
- kernel-based unsupervised embedding alignment for enhanced visual representation | arXiv: 2506.02557
- kinetic langevin diffusion for crystalline materials generation | arXiv: 2507.03602
- la rosa enhancing llm efficiency via layerwise rotated sparse activation | arXiv: 2507.01299
- label-efficient hyperspectral image classification via spectral film modulation | arXiv: 2512.03430
- lacache ladder-shaped kv caching for efficient long-context modeling of large la | arXiv: 2507.14204
- lada scalable label-specific clip adapter for continual learning | arXiv: 2505.23271
- ladder-residual parallelism-aware architecture for accelerating large model infe | arXiv: 2501.06589
- laion-c an out-of-distribution benchmark for web-scale vision models | arXiv: 2506.16950
- langdaug langevin data augmentation for multi-source domain generalization in me | arXiv: 2505.19659
- language model developers should report train-test overlap | arXiv: 2410.08385
- language models over canonical byte-pair encodings | arXiv: 2506.07956
- lapsum -- one method to differentiate them all ranking sorting and top-k selecti | arXiv: 2503.06242
- large language model llm-enabled in-context learning for wireless network optimi | arXiv: 2408.00214
- large language models are demonstration pre-selectors for themselves | arXiv: 2506.06033
- large language models to diffusion finetuning | arXiv: 2501.15781
- laser attention with exponential transformation | arXiv: 2411.03493
- latent imputation before prediction a new computational paradigm for de novo pep | arXiv: 2505.17524
- latent variable causal discovery under selection bias | arXiv: 2512.11219
- latent variable estimation in bayesian black-litterman models | arXiv: 2505.02185
- layer-wise alignment examining safety alignment across image encoder layers in v | arXiv: 2411.04291
- layer-wise quantization for quantized optimistic dual averaging | arXiv: 2505.14371
- ldmol a text-to-molecule diffusion model with structurally informative latent sp | arXiv: 2405.17829
- learnable spatial-temporal positional encoding for link prediction | arXiv: 2506.08309
- learning cascade ranking as one network | arXiv: 2503.09492
- learning distances from data with normalizing flows and score matching | arXiv: 2407.09297
- learning distribution-wise control in representation space for language models | arXiv: 2506.06686
- learning dynamics under environmental constraints via measurement-induced bundle | arXiv: 2505.19521
- learning invariant causal mechanism from vision-language models | arXiv: 2405.15289
- learning mean field control on sparse graphs | arXiv: 2501.17079
- learning mixtures of experts with em a mirror descent perspective | arXiv: 2411.06056
- learning optimal multimodal information bottleneck representations | arXiv: 2505.19996
- learning progress driven multi-agent curriculum | arXiv: 2205.10016
- learning safe strategies for value maximizing buyers in uniform price auctions | arXiv: 2406.03674
- learning safety constraints for large language models | arXiv: 2505.24445
- learning single index models with diffusion priors | arXiv: 2505.21135
- learning soft sparse shapes for efficient time-series classification | arXiv: 2505.06892
- learning survival distributions with the asymmetric laplace distribution | arXiv: 2505.03712
- learning time-aware causal representation for model generalization in evolving d | arXiv: 2506.17718
- learning to incentivize in repeated principal-agent problems with adversarial ag | arXiv: 2505.23124
- learning to plan reason for evaluation with thinking-llm-as-a-judge | arXiv: 2501.18099
- learning to stop deep learning for mean field optimal stopping | arXiv: 2410.08850
- learning to trust bellman updates selective state-adaptive regularization for of | arXiv: 2505.19923
- learning utilities from demonstrations in markov decision processes | arXiv: 2409.17355
- learning-augmented algorithms for mts with bandit access to multiple predictors | arXiv: 2506.05479
- learning-augmented hierarchical clustering | arXiv: 2506.05495
- lego sketch a scalable memory-augmented neural network for sketching data stream | arXiv: 2505.19561
- lemon label error detection using multimodal neighbors | arXiv: 2407.18941
- leveraging online olympiad-level math problems for llms training and contaminati | arXiv: 2501.14275
- leveraging partial smiles validation scheme for enhanced drug design in reinforc | arXiv: 2505.00530
- leveraging predictive equivalence in decision trees | arXiv: 2506.14143
- leveraging skills from unlabeled prior data for efficient online exploration | arXiv: 2410.18076
- lift the veil for the truth principal weights emerge after rank reduction for re | arXiv: 2506.00772
- liger linearizing large language models to gated recurrent structures | arXiv: 2503.01496
- lightgts a lightweight general time series forecasting model | arXiv: 2506.06005
- lighthouse fast and precise distance to shoreline calculations from anywhere on | arXiv: 2506.18842
- lightspeed geometric dataset distance via sliced optimal transport | arXiv: 2501.18901
- lineflow a framework to learn active control of production lines | arXiv: 2505.06744
- livs a pluralistic alignment dataset for inclusive public spaces | arXiv: 2503.01894
- llava-reid selective multi-image questioner for interactive person re-identifica | arXiv: 2504.10174
- llavaguard an open vlm-based framework for safeguarding vision datasets and mode | arXiv: 2406.05113
- llm data selection and utilization via dynamic bi-level optimization | arXiv: 2507.16178
- llm enhancers for gnns an analysis from the perspective of causal mechanism iden | arXiv: 2505.08265
- llm social simulations are a promising research method | arXiv: 2504.02234
- llm-srbench a new benchmark for scientific equation discovery with large languag | arXiv: 2504.10415
- local manifold approximation and projection for manifold-aware diffusion plannin | arXiv: 2506.00867
- localizing and mitigating memorization in image autoregressive models | arXiv: 2509.00488
- log-sum-exponential estimator for off-policy evaluation and learning | arXiv: 2506.06873
- long-form speech generation with spoken language models | arXiv: 2412.18603
- long-short alignment for effective long-context modeling in llms | arXiv: 2506.11769
- look twice before you answer memory-space visual retracing for hallucination mit | arXiv: 2410.03577
- lora fine-tuning without gpus a cpu-efficient meta-generation framework for llms | arXiv: 2507.01806
- lscd lomb-scargle conditioned diffusion for time series imputation | arXiv: 2506.17039
- lyapunov learning at the onset of chaos | arXiv: 2506.12810
- m3-jepa multimodal alignment via multi-gate moe based on the joint-embedding pre | arXiv: 2409.05929
- m3hf multi-agent reinforcement learning from multi-phase human feedback of mixed | arXiv: 2503.02077
- machine learning from explanations | arXiv: 2507.04788
- machines and mathematical mutations using gnns to characterize quiver mutation c | arXiv: 2411.07467
- make lora great again boosting lora with adaptive singular values and mixture-of | arXiv: 2502.16894
- mapeval a map-based evaluation of geo-spatial reasoning in foundation models | arXiv: 2501.00316
- marge improving math reasoning for llms with guided exploration | arXiv: 2505.12500
- mastering massive multi-task reinforcement learning via mixture-of-expert decisi | arXiv: 2505.24378
- mastering multiple-expert routing realizable h-consistency and strong guarantees | arXiv: 2506.20650
- maximal update parametrization and zero-shot hyperparameter transfer for fourier | arXiv: 2506.19396
- maximum coverage in turnstile streams with applications to fingerprinting measur | arXiv: 2504.18394
- maximum total correlation reinforcement learning | arXiv: 2505.16734
- medxpertqa benchmarking expert-level medical reasoning and understanding | arXiv: 2501.18362
- meek models shall inherit the earth | arXiv: 2507.07931
- merge-friendly post-training quantization for multi-target domain adaptation | arXiv: 2505.23651
- merit maximum-normalized element-wise ratio for language model large-batch train | arXiv: 2508.20577
- meta-black-box-optimization through offline q-function learning | arXiv: 2505.02010
- metaagent automatically constructing multi-agent systems based on finite state m | arXiv: 2507.22606
- metadata conditioning accelerates language model pre-training | arXiv: 2501.01956
- mf-lal drug compound generation using multi-fidelity latent space active learnin | arXiv: 2410.11226
- mib a mechanistic interpretability benchmark | arXiv: 2504.13151
- mimicmotion high-quality human motion video generation with confidence-aware pos | arXiv: 2406.19680
- mind the gap a practical attack on gguf quantization | arXiv: 2505.23786
- mitigating over-squashing in graph neural networks by spectrum-preserving sparsi | arXiv: 2506.16110
- mitigating plasticity loss in continual reinforcement learning by reducing churn | arXiv: 2506.00592
- mixed-curvature decision trees and random forests | arXiv: 2410.13879
- mixture of lookup experts | arXiv: 2503.15798
- mixture-of-expert variational autoencoders for cross-modality embedding of type | arXiv: 2507.16817
- mka memory-keyed attention for efficient long-context reasoning | arXiv: 2603.20586
- mmedpo aligning medical vision-language models with clinical-aware multimodal pr | arXiv: 2412.06141
- mminference accelerating pre-filling for long-context vlms via modality-aware pe | arXiv: 2504.16083
- moda modular duplex attention for multimodal perception cognition and emotion un | arXiv: 2507.04635
- model immunization from a condition number perspective | arXiv: 2505.23760
- model swarms collaborative search to adapt llm experts via swarm intelligence | arXiv: 2410.11163
- modeling all-atom glycan structures via hierarchical message passing and multi-s | arXiv: 2506.01376
- modeling user behavior from adaptive surveys with supplemental context | arXiv: 2507.20919
- modern methods in associative memory | arXiv: 2507.06211
- modified k-means algorithm with local optimality guarantees | arXiv: 2506.06990
- modulated diffusion accelerating generative modeling with modulated quantization | arXiv: 2506.22463
- moh multi-head attention as mixture-of-head attention | arXiv: 2410.11842
- moma modulating mamba for adapting image foundation models to video recognition | arXiv: 2506.23283
- moragent parameter efficient agent tuning with mixture-of-roles | arXiv: 2512.21708
- morphtok morphologically grounded tokenization for indian languages | arXiv: 2504.10335
- morse dual-sampling for lossless acceleration of diffusion models | arXiv: 2506.18251
- mpf aligning and debiasing language models post deployment via multi perspective | arXiv: 2507.02595
- mpo an efficient post-processing framework for mixing diverse preference alignme | arXiv: 2502.18699
- mtl-ue learning to learn nothing for multi-task learning | arXiv: 2505.05279
- multidimensional adaptive coefficient for inference trajectory optimization in f | arXiv: 2404.14161
- multiple-policy evaluation via density estimation | arXiv: 2404.00195
- multivariate conformal selection | arXiv: 2505.00917
- musecontrollite multifunctional music generation with lightweight conditioners | arXiv: 2506.18729
- near optimal best arm identification for clustered bandits | arXiv: 2505.10147
- near optimal decision trees in a split second | arXiv: 2502.15988
- near-optimal consistency-robustness trade-offs for learning-augmented online kna | arXiv: 2406.18752
- nearly optimal sample complexity for learning with label proportions | arXiv: 2505.05355
- negmerge sign-consensual weight merging for machine unlearning | arXiv: 2410.05583
- neighbour-driven gaussian process variational autoencoders for scalable structur | arXiv: 2505.16481
- network sparsity unlocks the scaling potential of deep reinforcement learning | arXiv: 2506.17204
- neural augmented kalman filters for road network assisted gnss positioning | arXiv: 2507.00654
- neural graph matching improves retrieval augmented generation in molecular machi | arXiv: 2502.17874
- neural stochastic differential equations on compact state spaces theory methods | arXiv: 2508.17090
- neurontune towards self-guided spurious bias mitigation | arXiv: 2505.24048
- neutral residues revisiting adapters for model extension | arXiv: 2410.02744
- new interaction paradigm for complex eda software leveraging gpt | arXiv: 2307.14740
- nextlong toward effective long-context training without long documents | arXiv: 2501.12766
- no soundness in the real world on the challenges of the verification of deployed | arXiv: 2506.01054
- non-stationary online learning for curved losses improved dynamic regret via mix | arXiv: 2506.10616
- nonparametric identification of latent concepts | arXiv: 2510.00136
- nonparametric modern hopfield models | arXiv: 2404.03900
- nonparametric teaching for graph property learners | arXiv: 2505.14170
- normalizing flows are capable generative models | arXiv: 2412.06329
- not all explanations for deep learning phenomena are equally valuable | arXiv: 2506.23286
- ntpp generative speech language modeling for dual-channel spoken dialogue via ne | arXiv: 2506.00975
- of mice and machines a comparison of learning between real world mice and rl age | arXiv: 2505.12204
- olica efficient structured pruning of large language models without retraining | arXiv: 2506.08436
- omniarch building foundation model for scientific computing | arXiv: 2402.16014
- omniaudio generating spatial audio from 360-degree video | arXiv: 2504.14906
- omnibal towards fast instruction-tuning for vision-language models via omniverse | arXiv: 2407.20761
- on differential privacy for adaptively solving search problems via sketching | arXiv: 2506.05503
- on expressive power of looped transformers theoretical analysis and enhancement | arXiv: 2410.01405
- on fine-grained distinct element estimation | arXiv: 2506.22608
- on measuring long-range interactions in graph neural networks | arXiv: 2506.05971
- on temperature scaling and conformal prediction of deep classifiers | arXiv: 2402.05806
- on the clean generalization and robust overfitting in adversarial training from | arXiv: 2306.01271
- on the dynamic regret of following the regularized leader optimism with history | arXiv: 2505.22899
- on the effect of uncertainty on layer-wise inference dynamics | arXiv: 2507.06722
- on the importance of gaussianizing representations | arXiv: 2505.00685
- on the power of context-enhanced learning in llms | arXiv: 2503.01821
- on the robustness of reward models for language model alignment | arXiv: 2505.07271
- on the role of label noise in the feature learning process | arXiv: 2505.18909
- on the vulnerability of applying retrieval-augmented generation within knowledge | arXiv: 2409.17275
- on understanding attention-based in-context learning for categorical data | arXiv: 2405.17248
- one image is worth a thousand words a usability preservable text-image collabora | arXiv: 2505.11131
- one wave to explain them all a unifying perspective on feature attribution | arXiv: 2410.01482
- online pre-training for offline-to-online reinforcement learning | arXiv: 2507.08387
- online sparsification of bipartite-like clusters in graphs | arXiv: 2508.05437
- ood-chameleon is algorithm selection for ood generalization learnable | arXiv: 2410.02735
- open source planning control system with language agents for autonomous scientif | arXiv: 2507.07257
- open your eyes vision enhances message passing neural networks in link predictio | arXiv: 2505.08266
- open-det an efficient learning framework for open-ended detection | arXiv: 2505.20639
- optimal and practical batched linear bandit algorithm | arXiv: 2507.08438
- optimal auction design in the joint advertising | arXiv: 2507.07418
- optimal sensor scheduling and selection for continuous-discrete kalman filtering | arXiv: 2507.11240
- optimization over sparse support-preserving sets two-step projection with global | arXiv: 2506.08558
- optimizing language models for inference time objectives using reinforcement lea | arXiv: 2503.19595
- or-bench an over-refusal benchmark for large language models | arXiv: 2405.20947
- origin identification for text-guided image-to-image diffusion models | arXiv: 2501.02376
- orthorank token selection via sink token orthogonality for efficient llm inferen | arXiv: 2507.03865
- out-of-distribution detection methods answer the wrong questions | arXiv: 2507.01831
- outlier gradient analysis efficiently identifying detrimental training samples f | arXiv: 2405.03869
- overcoming multi-step complexity in multimodal theory-of-mind reasoning a scalab | arXiv: 2506.01301
- pac learning with improvements | arXiv: 2503.03184
- pak-ucb contextual bandit an online learning approach to prompt-aware selection | arXiv: 2410.13287
- parallelcomp parallel long-context compressor for length extrapolation | arXiv: 2502.14317
- parameter-efficient fine-tuning of state space models | arXiv: 2410.09016
- parity requires unified input dependence and negative eigenvalues in ssms | arXiv: 2508.07395
- parm multi-objective test-time alignment via preference-aware autoregressive rew | arXiv: 2505.06274
- parrot multilingual visual instruction tuning | arXiv: 2406.02539
- pde-transformer efficient and versatile transformers for physics simulations | arXiv: 2505.24717
- pencil long thoughts with short memory | arXiv: 2503.14337
- peptune de novo generation of therapeutic peptides with multi-objective-guided d | arXiv: 2412.17780
- performance plateaus in inference-time scaling for text-to-image diffusion witho | arXiv: 2506.12633
- permutation equivariant neural networks for symmetric tensors | arXiv: 2503.11276
- persistent topological features in large language models | arXiv: 2410.11042
- pessimism principle can be effective towards a framework for zero-shot transfer | arXiv: 2505.18447
- phantomwiki on-demand datasets for reasoning and retrieval evaluation | arXiv: 2502.20377
- physicsnerf physics-guided 3d reconstruction from sparse views | arXiv: 2505.23481
- pigdreamer privileged information guided world models for safe partially observa | arXiv: 2508.02159
- piloting structure-based drug design via modality-specific optimal schedule | arXiv: 2505.07286
- poisonbench assessing large language model vulnerability to data poisoning | arXiv: 2410.08811
- polyconf unlocking polymer conformation generation through hierarchical generati | arXiv: 2504.08859
- popri private federated learning using preference-optimized synthetic data | arXiv: 2504.16438
- poqd performance-oriented query decomposer for multi-vector retrieval | arXiv: 2505.19189
- position ai evaluation should learn from how we test humans | arXiv: 2306.10512
- position all current generative fidelity and diversity metrics are flawed | arXiv: 2505.22450
- position causal machine learning requires rigorous synthetic experiments for bro | arXiv: 2508.08883
- position dont use the clt in llm evals with fewer than a few hundred datapoints | arXiv: 2503.01747
- position lifetime tuning is incompatible with continual reinforcement learning | arXiv: 2404.02113
- position solve layerwise linear models first to understand neural dynamical phen | arXiv: 2502.21009
- position the future of bayesian prediction is prior-fitted | arXiv: 2505.23947
- position theory of mind benchmarks are broken for large language models | arXiv: 2412.19726
- position uncertainty quantification needs reassessment for large-language model | arXiv: 2505.22655
- position we need an algorithmic understanding of generative ai | arXiv: 2507.07544
- positional attention expressivity and learnability of algorithmic computation | arXiv: 2410.01686
- positional encoding meets persistent homology on graphs | arXiv: 2506.05814
- ppo-mi efficient black-box model inversion via proximal policy optimization | arXiv: 2502.14370
- practical principles for ai cost and compute accounting | arXiv: 2502.15873
- prediction via shapley value regression | arXiv: 2505.04775
- prediction-powered adaptive shrinkage estimation | arXiv: 2502.14166
- predictive data selection the data that predicts is the data that teaches | arXiv: 2503.00808
- preference adaptive and sequential text-to-image generation | arXiv: 2412.10419
- preference optimization for combinatorial optimization problems | arXiv: 2505.08735
- principal-agent bandit games with self-interested and exploratory learning agent | arXiv: 2412.16318
- principled algorithms for optimizing generalized metrics in binary classificatio | arXiv: 2512.23133
- privacy amplification through synthetic data insights from linear regression | arXiv: 2506.05101
- privacy-shielded image compression defending against exploitation from vision-la | arXiv: 2506.15201
- private model personalization revisited | arXiv: 2506.19220
- probabilistic interactive 3d segmentation with hierarchical neural processes | arXiv: 2505.01726
- probably approximately global robustness certification | arXiv: 2511.06495
- product of experts with llms boosting performance on arc is a matter of perspect | arXiv: 2505.07859
- progressive tempering sampler with diffusion | arXiv: 2506.05231
- promoting ensemble diversity with interactive bayesian distributional robustness | arXiv: 2506.07247
- proofcompass enhancing specialized provers with llm guidance | arXiv: 2507.14335
- protein structure tokenization benchmarking and new recipe | arXiv: 2503.00089
- protriever end-to-end differentiable protein homology search for fitness predict | arXiv: 2506.08954
- provable benefit of random permutations over uniform sampling in stochastic coor | arXiv: 2505.23152
- provable in-context vector arithmetic via retrieving task concepts | arXiv: 2508.09820
- provable maximum entropy manifold exploration via diffusion models | arXiv: 2506.15385
- provably cost-sensitive adversarial defense via randomized smoothing | arXiv: 2310.08732
- provably efficient algorithm for best scoring rule identification in online prin | arXiv: 2505.17379
- provably improving generalization of few-shot models with synthetic data | arXiv: 2505.24190
- proxy-fda proxy-based feature distribution alignment for fine-tuning vision foun | arXiv: 2505.24088
- putnam-axiom a functional and static benchmark for measuring higher level mathem | arXiv: 2508.08292
- q-resafe assessing safety risks and quantization-aware safety patching for quant | arXiv: 2506.20251
- qmamba on first exploration of vision mamba for image quality assessment | arXiv: 2406.09546
- quadratic upper bound for boosting robustness | arXiv: 2601.13645
- quantum algorithms for finite-horizon markov decision processes | arXiv: 2508.05712
- quantum optimization via gradient-based hamiltonian descent | arXiv: 2505.14670
- quest enhancing estimates of quantile-based distributional measures using model | arXiv: 2507.05220
- qure query-relevant retrieval through hard negative sampling in composed image r | arXiv: 2507.12416
- r3dm enabling role discovery and diversity through dynamics models in multi-agen | arXiv: 2505.24265
- radio rate-distortion optimization for large language model compression | arXiv: 2505.03031
- raising the bar investigating the values of large language models via generative | arXiv: 2406.14230
- random feature representation boosting | arXiv: 2501.18283
- random initialization of gated sparse adapters | arXiv: 2511.01794
- random registers for cross-domain few-shot learning | arXiv: 2506.02843
- randomized dimensionality reduction for euclidean maximization and diversity mea | arXiv: 2506.00165
- ranked entropy minimization for continual test-time adaptation | arXiv: 2505.16441
- ranked from within ranking large multimodal models without labels | arXiv: 2412.06461
- rapid long-context inference with retrieval-augmented speculative decoding | arXiv: 2502.20330
- raptor scalable train-free embeddings for 3d medical volumes leveraging pretrain | arXiv: 2507.08254
- rate causal explainability of reward models with imperfect counterfactuals | arXiv: 2410.11348
- re-imagine symbolic benchmark synthesis for reasoning evaluation | arXiv: 2506.15455
- re-ranking reasoning context with tree search makes large vision-language models | arXiv: 2506.07785
- reactivation empirical ntk dynamics under task shifts | arXiv: 2507.16039
- reasoning limitations of multimodal large language models a case study of bongar | arXiv: 2411.01173
- reasoning through execution unifying process and outcome rewards for code genera | arXiv: 2412.15118
- recommendations and reporting checklist for rigorous transparent human baselines | arXiv: 2506.13776
- recommendations with sparse comparison data provably fast convergence for noncon | arXiv: 2502.20033
- refersplat referring segmentation in 3d gaussian splatting | arXiv: 2508.08252
- reframe layer caching for accelerated inference in real-time rendering | arXiv: 2506.13814
- regress dont guess -- a regression-like loss on number tokens for language model | arXiv: 2411.02083
- regression for the mean auto-evaluation and inference with few labels through po | arXiv: 2411.12665
- reimagining parameter space exploration with diffusion models | arXiv: 2506.17807
- rejecting hallucinated state targets during planning | arXiv: 2410.07096
- relative error fair clustering in the weak-strong oracle model | arXiv: 2506.12287
- reliable algorithm selection for machine learning-guided design | arXiv: 2503.20767
- representation shattering in transformers a synthetic study with knowledge editi | arXiv: 2410.17194
- representative language generation | arXiv: 2505.21819
- resampling augmentation for time series contrastive learning application to remo | arXiv: 2506.18587
- researchtown simulator of human research community | arXiv: 2412.17767
- residual matrix transformers scaling the size of the residual stream | arXiv: 2506.22696
- restoregrad signal restoration using conditional denoising diffusion models with | arXiv: 2502.13574
- rethink the role of deep learning towards large-scale quantum systems | arXiv: 2505.13852
- rethinking addressing in language models via contextualized equivariant positiona | arXiv: 2501.00712
- rethinking aleatoric and epistemic uncertainty | arXiv: 2412.20892
- rethinking explainable machine learning as applied statistics | arXiv: 2402.02870
- rethinking external slow-thinking from snowball errors to probability of correct | arXiv: 2501.15602
- rethinking the bias of foundation model under long-tailed distribution | arXiv: 2501.15955
- rethinking the stability-plasticity trade-off in continual learning from an arch | arXiv: 2506.03951
- retraining with predicted hard labels provably increases model accuracy | arXiv: 2406.11206
- retraining-free merging of sparse moe via hierarchical clustering | arXiv: 2410.08589
- revealing weaknesses in text watermarking through self-information rewrite attac | arXiv: 2505.05190
- review remask refine r3 process-guided block diffusion for text generation | arXiv: 2507.08018
- revise learning to refine at test-time via intrinsic self-verification | arXiv: 2502.14565
- revisiting continuity of image tokens for cross-domain few-shot learning | arXiv: 2506.03110
- revisiting diffusion models from generative pre-training to one-step generation | arXiv: 2506.09376
- revisiting instance-optimal cluster recovery in the labeled stochastic block mod | arXiv: 2306.12968
- revisiting the predictability of performative social events | arXiv: 2503.11713
- revisiting unbiased implicit variational inference | arXiv: 2506.03839
- revolve optimizing ai systems by tracking response evolution in textual optimiza | arXiv: 2412.03092
- reward-augmented data enhances direct preference alignment of llms | arXiv: 2410.08067
- reward-free world models for online imitation learning | arXiv: 2410.14081
- riflex a free lunch for length extrapolation in video diffusion transformers | arXiv: 2502.15894
- right now wrong then non-stationary direct preference optimization under prefere | arXiv: 2407.18676
- risk and cross validation in ridge regression with correlated samples | arXiv: 2408.04607
- rlthf targeted human feedback for llm alignment | arXiv: 2502.13417
- robot-gated interactive imitation learning with adaptive intervention mechanism | arXiv: 2506.09176
- robust learning of diverse code edits | arXiv: 2503.03656
- robust multi-bit text watermark with llm-based paraphrasers | arXiv: 2412.03123
- robust multimodal large language models against modality conflict | arXiv: 2507.07151
- robust noise attenuation via adaptive pooling of transformer outputs | arXiv: 2506.09215
- robust offline reinforcement learning with linearly structured f-divergence regu | arXiv: 2411.18612
- rocketkv accelerating long-context llm inference via two-stage kv cache compress | arXiv: 2502.14051
- roll the dice look before you leap going beyond the creative limits of next-toke | arXiv: 2504.15266
- rollingq reviving the cooperation dynamics in multimodal transformer | arXiv: 2506.11465
- rulebreakers challenging llms at the crossroads between formal logic and human-l | arXiv: 2410.16502
- runtime analysis of evolutionary nas for multiclass classification | arXiv: 2506.06019
- sada stability-guided adaptive diffusion acceleration | arXiv: 2507.17135
- saebench a comprehensive benchmark for sparse autoencoders in language model int | arXiv: 2503.09532
- safe delta consistently preserving safety when fine-tuning llms on diverse datas | arXiv: 2505.12038
- safe finding sparse and flat minima to improve pruning | arXiv: 2506.06866
- safemap robust hd map construction from incomplete observations | arXiv: 2507.00861
- safer a calibrated risk-aware multimodal recommendation model for dynamic treatm | arXiv: 2506.06649
- safety alignment can be not superficial with explicit safety signals | arXiv: 2505.17072
- safety certificate against latent variables with partially unidentifiable dynami | arXiv: 2506.17927
- safetyanalyst interpretable transparent and steerable safety moderation for ai b | arXiv: 2410.16665
- sample complexity of distributionally robust off-dynamics reinforcement learning | arXiv: 2511.05396
- sample efficient demonstration selection for in-context learning | arXiv: 2506.08607
- sampling from binary quadratic distributions via stochastic localization | arXiv: 2505.19438
- sassha sharpness-aware adaptive second-order optimization with stable hessian ap | arXiv: 2502.18153
- scalable equilibrium sampling with sequential boltzmann generators | arXiv: 2502.18462
- scalable generation of spatial transcriptomics from histology images via whole-s | arXiv: 2506.05361
- scalable non-equivariant 3d molecule generation via rotational alignment | arXiv: 2506.10186
- scaling inference-efficient language models | arXiv: 2501.18107
- scaling large motion models with million-level human motions | arXiv: 2410.03311
- scaling value iteration networks to 5000 layers for extreme long-term planning | arXiv: 2406.08404
- scaling video-language models to 10k frames via hierarchical differential distil | arXiv: 2504.02438
- score matching with missing data | arXiv: 2506.00557
- scssl-bench benchmarking self-supervised learning for single-cell data | arXiv: 2506.10031
- sdp-crown efficient bound propagation for neural network verification with tight | arXiv: 2506.06665
- se3-equivariant diffusion policy in spherical fourier space | arXiv: 2507.01723
- secemb sparsity-aware secure federated learning of on-device recommender system | arXiv: 2505.12453
- self-consistency preference optimization | arXiv: 2411.04109
- self-disentanglement and re-composition for cross-domain few-shot segmentation | arXiv: 2506.02677
- self-organizing visual prototypes for non-parametric representation learning | arXiv: 2505.21533
- semantic shift estimation via dual-projection and classifier reconstruction for | arXiv: 2503.05423
- sensei semantic exploration guided by foundation models to learn versatile world | arXiv: 2503.01584
- separating knowledge and perception with procedural data | arXiv: 2508.11697
- sepllm accelerate large language models by compressing one segment into one sepa | arXiv: 2412.12094
- set valued predictions for robust domain generalization | arXiv: 2507.03146
- sgd jittering a training strategy for robust and accurate model-based architectu | arXiv: 2410.14667
- shielded diffusion generating novel and diverse images using sparse repellency | arXiv: 2410.06025
- simple and critical iterative denoising a recasting of discrete diffusion in gra | arXiv: 2503.21592
- simplemix frustratingly simple mixing of off- and on-policy data in language mod | arXiv: 2505.02363
- sk-vqa synthetic knowledge generation at scale for training context-augmented mu | arXiv: 2406.19593
- sketch to adapt fine-tunable sketches for efficient llm adaptation | arXiv: 2410.06364
- sketch-plan-generalize learning and planning with neuro-symbolic programmatic re | arXiv: 2404.07774
- sliding puzzles gym a scalable benchmark for state representation in visual rein | arXiv: 2410.14038
- slim one-shot quantization and sparsity with low-rank approximation for llm weig | arXiv: 2410.09615
- slimllm accurate structured pruning for large language models | arXiv: 2505.22689
- smoothed preference optimization via renoise inversion for aligning diffusion mo | arXiv: 2506.02698
- soft reasoning navigating solution spaces in large language models through contr | arXiv: 2505.24688
- softmax is not enough for sharp size generalisation | arXiv: 2410.01104
- solving probabilistic verification problems of neural networks using branch and | arXiv: 2405.17556
- solving zero-sum convex markov games | arXiv: 2506.16120
- sorbet a neuromorphic hardware-compatible transformer-based spiking language mod | arXiv: 2409.15298
- sortformer a novel approach for permutation-resolved speaker supervision in spee | arXiv: 2409.06656
- sounding that object interactive object-aware image to audio generation | arXiv: 2506.04214
- space your genomic profile predictor is a powerful dna foundation model | arXiv: 2506.01833
- sparse causal discovery with generative intervention for unsupervised graph doma | arXiv: 2507.07621
- sparse spectral training and inference on euclidean and hyperbolic neural networ | arXiv: 2405.15481
- sparse training from random initialization aligning lottery ticket masks using w | arXiv: 2505.05143
- sparse-pivot dynamic correlation clustering for node insertions | arXiv: 2507.01830
- sparselora accelerating llm fine-tuning with contextual sparsity | arXiv: 2506.16500
- sparsevlm visual token sparsification for efficient vision-language model infere | arXiv: 2410.04417
- speculative decoding in decentralized llm inference turning communication latenc | arXiv: 2511.11733
- sphinx structural prediction using hypergraph inference network | arXiv: 2410.03208
- spikevideoformer an efficient spike-driven video transformer with hamming attent | arXiv: 2505.10352
- star attention efficient llm inference over long sequences | arXiv: 2411.17116
- star learning diverse robot skill abstractions through rotation-augmented vector | arXiv: 2506.03863
- statistical and computational guarantees of kernel max-sliced wasserstein distan | arXiv: 2405.15441
- stealing that free lunch exposing the limits of dyna-style reinforcement learnin | arXiv: 2412.14312
- stealix model stealing via prompt evolution | arXiv: 2506.05867
- steer llm latents for hallucination detection | arXiv: 2503.01917
- steering protein language models | arXiv: 2509.07983
- stochastic encodings for active feature acquisition | arXiv: 2508.01957
- stofm a multi-scale foundation model for spatial transcriptomics | arXiv: 2507.11588
- strategic fusion optimizes transformer compression | arXiv: 2501.03273
- streamline without sacrifice -- squeeze out computation redundancy in lmm | arXiv: 2505.15816
- subspace optimization for large language models with convergence guarantees | arXiv: 2410.11289
- suica learning super-high dimensional sparse implicit neural representations for | arXiv: 2412.01124
- suitability filter a statistical framework for classifier evaluation in real-wor | arXiv: 2505.22356
- sum-of-parts self-attributing neural networks with end-to-end learning of featur | arXiv: 2310.16316
- supercharging graph transformers with advective diffusion | arXiv: 2310.06417
- supernova event dataset interpreting large language models personality through c | arXiv: 2506.12189
- symmetry-aware gflownets | arXiv: 2506.02685
- symmetry-robust 3d orientation estimation | arXiv: 2410.02101
- syndacate a synthetic dataset for evaluating part-whole hierarchical inference | arXiv: 2506.17558
- synonymous variational inference for perceptual image compression | arXiv: 2505.22438
- synthesizing images on perceptual boundaries of anns for uncovering and manipula | arXiv: 2505.03641
- synthetic face datasets generation via latent space exploration from brownian id | arXiv: 2405.00228
- synthetic perception can generated images unlock latent visual prior for text-ce | arXiv: 2506.17623
- system-aware unlearning algorithms use lesser forget faster | arXiv: 2506.06073
- t1 advancing language model reasoning through reinforcement learning and inferen | arXiv: 2501.11651
- tabflex scaling tabular learning to millions with linear attention | arXiv: 2506.05584
- tackling view-dependent semantics in 3d language gaussian splatting | arXiv: 2505.24746
- tamas benchmarking adversarial risks in multi-agent llm systems | arXiv: 2511.05269
- taming diffusion for dataset distillation with high representativeness | arXiv: 2505.18399
- taming knowledge conflicts in language models | arXiv: 2503.10996
- taming rectified flow for inversion and editing | arXiv: 2411.04746
- tango clustering with typicality-aware nonlocal mode-seeking and graph-cut optim | arXiv: 2408.10084
- targeted unlearning with single layer unlearning gradient | arXiv: 2407.11867
- task-agnostic pre-training and task-guided fine-tuning for versatile diffusion p | arXiv: 2409.19949
- tcp-diffusion a multi-modal diffusion model for global tropical cyclone precipit | arXiv: 2410.13175
- teaching llms to speak spectroscopy | arXiv: 2508.10075
- teaching physical awareness to llms through sounds | arXiv: 2506.08524
- temporal query network for efficient multivariate time series forecasting | arXiv: 2505.12917
- test-time adaptation with binary feedback | arXiv: 2505.18514
- test-time canonicalization by foundation models for robust perception | arXiv: 2507.10375
- test-time training provably improves transformers as in-context learners | arXiv: 2503.11842
- text-to-lora instant transformer adaption | arXiv: 2506.06105
- tgdpo harnessing token-level reward guidance for enhancing direct preference opt | arXiv: 2506.14574
- the best of both worlds bridging quality and diversity in data selection with bi | arXiv: 2410.12458
- the brains bitter lesson scaling speech decoding with self-supervised learning | arXiv: 2406.04328
- the butterfly effect neural network training trajectories are highly sensitive t | arXiv: 2506.13234
- the canarys echo auditing privacy risks of llm-generated synthetic text | arXiv: 2502.14921
- the challenge of teaching reasoning to llms without rl or distillation | arXiv: 2507.09850
- the courage to stop overcoming sunk cost fallacy in deep reinforcement learning | arXiv: 2506.13672
- the dark side of the forces assessing non-conservative force models for atomisti | arXiv: 2412.11569
- the devil is in the details tackling unimodal spurious correlations for generali | arXiv: 2503.03122
- the diffusion duality | arXiv: 2506.10892
- the disparate benefits of deep ensembles | arXiv: 2410.13831
- the double-ellipsoid geometry of clip | arXiv: 2411.14517
- the four color theorem for cell instance segmentation | arXiv: 2506.09724
- the impact of on-policy parallelized data collection on deep reinforcement learn | arXiv: 2506.03404
- the lock-in hypothesis stagnation by algorithm | arXiv: 2506.06166
- the panaceas for improving low-rank decomposition in communication-efficient fed | arXiv: 2505.23176
- the price of freedom exploring expressivity and runtime tradeoffs in equivariant | arXiv: 2506.13523
- the right to ai | arXiv: 2501.17899
- the ripple effect on unforeseen complications of backdoor attacks | arXiv: 2505.11586
- the sample complexity of online strategic decision making with information asymm | arXiv: 2506.09940
- the sharpness disparity principle in transformers for accelerating language mode | arXiv: 2502.19002
- theoretical guarantees on the best-of-n alignment policy | arXiv: 2401.01879
- theoretical limitations of ensembles in the age of overparameterization | arXiv: 2410.16201
- theoretical performance guarantees for partial domain adaptation via partial opt | arXiv: 2506.02712
- theoretically unmasking inference attacks against ldp-protected clients in feder | arXiv: 2506.17292
- thickness-aware e3-equivariant 3d mesh neural networks | arXiv: 2505.21572
- tilted sharpness-aware minimization | arXiv: 2410.22656
- time-aware world model for adaptive prediction and control | arXiv: 2506.08441
- timedart a diffusion autoregressive transformer for self-supervised time series | arXiv: 2410.05711
- timepoint accelerated time series alignment via self-supervised keypoint and des | arXiv: 2505.23475
- timepro efficient multivariate long-term time series forecasting with variable- | arXiv: 2505.20774
- timing temporality-aware integrated gradients for time series explanation | arXiv: 2506.05035
- tined gnns-to-mlps by teacher injection and dirichlet energy distillation | arXiv: 2412.11180
- to each metric its decoding post-hoc optimal decision rules of probabilistic hie | arXiv: 2506.01552
- to steer or not to steer mechanistic error reduction with abstention for languag | arXiv: 2510.13290
- tokenized bandit for llm decoding and alignment | arXiv: 2506.07276
- toma token merge with attention for diffusion models | arXiv: 2509.10918
- toping topologically interpretable graph learning via persistent rationale filtr | arXiv: 2510.05102
- toward data-centric directed graph learning an entropy-driven approach | arXiv: 2505.00983
- toward robust hyper-detailed image captioning a multiagent approach and dual eva | arXiv: 2412.15484
- toward safe and human-aligned game conversational recommendation via multi-agent | arXiv: 2504.20094
- towards a mechanistic explanation of diffusion model generalization | arXiv: 2411.19339
- towards an optimal control perspective of resnet training | arXiv: 2506.21453
- towards attributions of input variables in a coalition | arXiv: 2309.13411
- towards benchmarking foundation models for tabular data with text | arXiv: 2507.07829
- towards efficient online tuning of vlm agents via counterfactual soft reinforcem | arXiv: 2505.03792
- towards flexible perception with visual memory | arXiv: 2408.08172
- towards graph foundation models learning generalities across graphs via task-tre | arXiv: 2412.16441
- towards llm agents for earth observation | arXiv: 2504.12110
- towards long-horizon interpretability efficient and faithful multi-token attribu | arXiv: 2602.01914
- towards practical defect-focused automated code review | arXiv: 2505.17928
- towards rationale-answer alignment of lvlms via self-rationale calibration | arXiv: 2509.13919
- towards robust influence functions with flat validation minima | arXiv: 2505.19097
- towards trustworthy federated learning with untrusted participants | arXiv: 2505.01874
- towards universal offline black-box optimization via learning language model emb | arXiv: 2506.07109
- training a generally curious agent | arXiv: 2502.17543
- training dynamics of in-context learning in linear attention | arXiv: 2501.16265
- training flexible models of genetic variant effects from functional annotations | arXiv: 2506.19598
- training software engineering agents and verifiers with swe-gym | arXiv: 2412.21139
- transformative or conservative conservation laws for resnets and transformers | arXiv: 2506.06194
- transformer-based spatial-temporal counterfactual outcomes estimation | arXiv: 2506.21154
- transpl vq-code transition matrices for pseudo-labeling of time series unsupervi | arXiv: 2505.09955
- tree-sliced wasserstein distance a geometric perspective | arXiv: 2406.13725
- tree-sliced wasserstein distance with nonlinear projection | arXiv: 2505.00968
- treelora efficient continual learning via layer-wise loras guided by a hierarchi | arXiv: 2506.10355
- truly self-improving agents require intrinsic metacognitive learning | arXiv: 2506.05109
- tuco measuring the contribution of fine-tuning to individual responses of llms | arXiv: 2506.23423
- ui-evol automatic knowledge evolving for computer use agents | arXiv: 2505.21964
- ui-vision a desktop-centric gui benchmark for visual perception and interaction | arXiv: 2503.15661
- unable to forget proactive interference reveals working memory limits in llms be | arXiv: 2506.08184
- understanding and mitigating memorization in diffusion models for tabular data | arXiv: 2412.11044
- understanding and mitigating memorization in generative models via sharpness of | arXiv: 2412.04140
- understanding and mitigating miscalibration in prompt tuning for vision-language | arXiv: 2410.02681
- understanding mode connectivity via parameter space symmetry | arXiv: 2505.23681
- understanding model ensemble in transferable adversarial attack | arXiv: 2410.06851
- understanding sharpness dynamics in nn training with a minimalist example the ef | arXiv: 2506.06940
- understanding synthetic context extension via retrieval heads | arXiv: 2410.22316
- understanding the emergence of multimodal representation alignment | arXiv: 2502.16282
- understanding the limits of deep tabular methods with temporal shift | arXiv: 2502.20260
- understanding the statistical accuracy-communication trade-off in personalized f | arXiv: 2410.08934
- unhippo uncertainty-aware initialization for state space models | arXiv: 2506.05065
- unifews you need fewer operations for efficient graph neural networks | arXiv: 2403.13268
- unifying specialized visual encoders for video language models | arXiv: 2501.01426
- unimomo unified generative modeling of 3d molecules for de novo binder design | arXiv: 2503.19300
- unisim a unified simulator for time-coarsened dynamics of biomolecules | arXiv: 2506.03157
- universal neural optimal transport | arXiv: 2212.00133
- universal retrieval for multimodal trajectory modeling | arXiv: 2506.22056
- unlocking post-hoc dataset inference with synthetic data | arXiv: 2506.15271
- unlocking the capabilities of large vision-language models for generalizable and | arXiv: 2503.14853
- unlocking the power of rehearsal in continual learning a theoretical perspective | arXiv: 2506.00205
- unmore unsupervised multi-object segmentation via center-boundary reasoning | arXiv: 2506.01778
- unsupervised learning for class distribution mismatch | arXiv: 2505.06948
- update your transformer to the latest release re-basin of task vectors | arXiv: 2505.22697
- using multiple input modalities can improve data-efficiency and ood generalizati | arXiv: 2507.13385
- validating mechanistic interpretations an axiomatic approach | arXiv: 2407.13594
- video prediction policy a generalist robot policy with predictive visual represe | arXiv: 2412.14803
- vineppo refining credit assignment in rl training of llms | arXiv: 2410.01679
- vision graph prompting via semantic low-rank decomposition | arXiv: 2505.04121
- vision-language model selection and reuse for downstream adaptation | arXiv: 2501.18271
- vision-language models create cross-modal task representations | arXiv: 2410.22330
- visionts visual masked autoencoders are free-lunch zero-shot time series forecas | arXiv: 2408.17253
- visual generation without guidance | arXiv: 2501.15420
- visual language models as zero-shot deepfake detectors | arXiv: 2507.22469
- vocabtrim vocabulary pruning for efficient speculative decoding in llms | arXiv: 2506.22694
- vtgaussian-slam rgbd slam for large scale scenes with splatting view-tied 3d gau | arXiv: 2506.02741
- vulnerability-aware alignment mitigating uneven forgetting in harmful fine-tunin | arXiv: 2506.03850
- wasserstein policy optimization | arXiv: 2505.00663
- watch out your album on the inadvertent privacy memorization in multi-modal larg | arXiv: 2503.01208
- wave weighted autoregressive varying gate for time series forecasting | arXiv: 2410.03159
- weak-to-strong jailbreaking on large language models | arXiv: 2401.17256
- weisfeiler and leman go gambling why expressive lottery tickets win | arXiv: 2506.03919
- wgformer an se3-transformer driven by wasserstein gradient flows for molecular g | arXiv: 2410.09795
- what has a foundation model found using inductive bias to probe for world models | arXiv: 2507.06952
- what limits virtual agent application omnibench a scalable multi-dimensional ben | arXiv: 2506.08933
- what makes an ensemble un interpretable | arXiv: 2506.08216
- when bad data leads to good models | arXiv: 2505.04741
- when can in-context learning generalize out of task distribution | arXiv: 2506.05574
- when data-free knowledge distillation meets non-transferable teacher escaping ou | arXiv: 2507.04119
- when diffusion models memorize inductive biases in probability flow of minimum-n | arXiv: 2506.19031
- when every millisecond counts real-time anomaly detection via the multimodal asy | arXiv: 2506.17457
- when model knowledge meets diffusion model diffusion-assisted data-free image sy | arXiv: 2506.15381
- when will it fail anomaly to prompt for forecasting future anomalies in time ser | arXiv: 2506.23596
- whitened clip as a likelihood surrogate of images and captions | arXiv: 2505.06934
- why is spatial reasoning hard for vlms an attention mechanism perspective on foc | arXiv: 2503.01773
- widening the network mitigates the impact of data heterogeneity on fedavg | arXiv: 2508.12576
- wikibigedit understanding the limits of lifelong knowledge editing in llms | arXiv: 2503.05683
- wildchat-50m a deep dive into the role of synthetic data in post-training | arXiv: 2501.18511
- wilting trees interpreting the distance between mpnn embeddings | arXiv: 2505.24642
- winner-takes-all for multivariate probabilistic time series forecasting | arXiv: 2506.05515
- x hacking the threat of misguided automl | arXiv: 2401.08513
- x-transfer attacks towards super transferable adversarial attacks on clip | arXiv: 2505.05528
- xchemagents agentic ai for explainable quantum chemistry | arXiv: 2505.20574
- zero-shot adaptation of parameter-efficient fine-tuning in diffusion models | arXiv: 2506.04244
- zero-shot generalization of vision-based rl without data augmentation | arXiv: 2410.07441
- connecting thompson sampling and ucb towards more efficient best-fixed action | arXiv: 2505.02383
- on differential privacy for adaptively solving search problems | arXiv: 2506.05503
- retraining with predicted hard labels provably increases model accuracy | arXiv: 2406.11206
- theoretically unmasking inference attacks against ldp-protected client data in | arXiv: 2506.17292
- adaptivestep automatically dividing reasoning step through model confidence | arXiv: 2502.13943
- dynamic benchmarking of reasoning capabilities in code large language models und | arXiv: 2503.04149
- efficoder enhancing code generation in large language models through efficiency- | arXiv: 2410.10209
- epicoder encompassing diversity and complexity in code generation | arXiv: 2501.04694
- function-to-style guidance of llms for code translation | arXiv: 2507.11083
- mind the gap a practical attack on gguf quantization | arXiv: 2505.23786
- reasoning through execution unifying process and outcome rewards for code genera | arXiv: 2412.15118
- robust learning of diverse code edits | arXiv: 2503.03656
- sparselora accelerating llm fine-tuning with contextual sparsity | arXiv: 2506.16500
- towards practical defect-focused automated code review | arXiv: 2505.17928
- training software engineering agents and verifiers with swe-gym | arXiv: 2412.21139
- agent warpp workflow adherence via runtime parallel personalization | arXiv: 2507.19543
- investigating non-transitivity in llm-as-a-judge | arXiv: 2502.14074
- position uncertainty quantification needs reassessment for large-language model | arXiv: 2505.22655
- action-minimization meets generative modeling efficient transition pat | arXiv: 2504.18506
- impact iterative mask-based parallel decoding for text-to-audio generation | arXiv: 2506.00736
- when model knowledge meets diffusion model diffusion-assisted data-free image synthesis | arXiv: 2506.15381
- epsilon-vae denoising as visual decoding | arXiv: 2410.04081
- dont lag rag training-free adversarial detection using rag | arXiv: 2504.04858
- poqd performance-oriented query decomposer for multi-vector retrieval | arXiv: 2505.19189
- rapid long-context inference with retrieval-augmented speculative decoding | arXiv: 2502.20330
- rethinking addressing in language models via contextualized equivariant positiona | arXiv: 2501.00712
- understanding synthetic context extension via retrieval heads | arXiv: 2410.22316
- a cross modal knowledge distillation data augmentation recipe for improving tran | arXiv: 2505.21317
- a reasoning-based approach to cryptic crossword clue solving | arXiv: 2506.04824
- ab initio nonparametric variable selection for scalable symbolic regression with | arXiv: 2410.13681
- avoiding leakage poisoning concept interventions under distribution shifts | arXiv: 2504.17921
- concept-based unsupervised domain adaptation | arXiv: 2505.05195
- configurable preference tuning with rubric-guided synthetic data | arXiv: 2506.11702
- conformal prediction as bayesian quadrature | arXiv: 2502.13228
- do sparse autoencoders generalize a case study of answerability | arXiv: 2502.19964
- evaluating neuron explanations a unified framework with sanity checks | arXiv: 2506.05774
- evolving prompts in-context an open-ended self-replicating perspective | arXiv: 2506.17930
- explaining fast and slow abstraction and refinement of provable explanations | arXiv: 2506.08505
- foundation molecular grammar multi-modal foundation models induce interpretable | arXiv: 2505.22948
- inference-time decomposition of activations itda a scalable approach to interpre | arXiv: 2505.17769
- leveraging predictive equivalence in decision trees | arXiv: 2506.14143
- mib a mechanistic interpretability benchmark | arXiv: 2504.13151
- modeling user behavior from adaptive surveys with supplemental context | arXiv: 2507.20919
- near optimal decision trees in a split second | arXiv: 2502.15988
- on the effect of uncertainty on layer-wise inference dynamics | arXiv: 2507.06722
- on the power of context-enhanced learning in llms | arXiv: 2503.01821
- position we need an algorithmic understanding of generative ai | arXiv: 2507.07544
- rethinking explainable machine learning as applied statistics | arXiv: 2402.02870
- safetyanalyst interpretable transparent and steerable safety moderation for ai b | arXiv: 2410.16665
- slim one-shot quantization and sparsity with low-rank approximation for llm weig | arXiv: 2410.09615
- supernova event dataset interpreting large language models personality through c | arXiv: 2506.12189
- to steer or not to steer mechanistic error reduction with abstention for languag | arXiv: 2510.13290
- towards attributions of input variables in a coalition | arXiv: 2309.13411
- towards flexible perception with visual memory | arXiv: 2408.08172
- what makes an ensemble un interpretable | arXiv: 2506.08216
- why is spatial reasoning hard for vlms an attention mechanism perspective on foc | arXiv: 2503.01773
- representation shattering in transformers a synthetic study with knowledge editi | arXiv: 2410.17194
- wikibigedit understanding the limits of lifelong knowledge editing in llms | arXiv: 2503.05683
- theorem-of-thought a multi-agent framework for abductive deductive and inductive | arXiv: 2506.07106
- are llm belief updates consistent with bayes theorem | arXiv: 2507.17951
- communicating activations between language model agents | arXiv: 2501.14082
- cooperation of experts fusing heterogeneous information with large margin | arXiv: 2505.20853
- correlated errors in large language models | arXiv: 2506.07962
- cross-regularization adaptive model complexity through validation gradients | arXiv: 2506.19755
- dilqr differentiable iterative linear quadratic regulator via implicit different | arXiv: 2506.17473
- disentangling and integrating relational and sensory information in transformer | arXiv: 2405.16727
- enigma interactive tools substantially assist lm agents in finding security vuln | arXiv: 2409.16165
- evaluating llms across multi-cognitive levels from medical knowledge mastery to | arXiv: 2506.08349
- faster and stronger when ann-snn conversion meets parallel spiking calculation | arXiv: 2412.13610
- fedtail federated long-tailed domain generalization with sharpness-guided gradie | arXiv: 2506.08518
- feedforward few-shot species range estimation | arXiv: 2502.14977
- fully heteroscedastic count regression with deep double poisson networks | arXiv: 2406.09262
- function encoders a principled approach to transfer learning in hilbert spaces | arXiv: 2501.18373
- g-sim generative simulations with large language models and gradient-free calibr | arXiv: 2506.09272
- gradient aligned regression via pairwise losses | arXiv: 2402.06104
- how much can we forget about data contamination | arXiv: 2410.03249
- hyperband-based bayesian optimization for black-box prompt selection | arXiv: 2412.07820
- improved and oracle-efficient online ell 1-multicalibration | arXiv: 2505.17365
- improving generalization with flat hilbert bayesian inference | arXiv: 2410.04196
- improving the effective receptive field of message-passing neural networks | arXiv: 2505.23185
- latent imputation before prediction a new computational paradigm for de novo pep | arXiv: 2505.17524
- learning distribution-wise control in representation space for language models | arXiv: 2506.06686
- learning safe strategies for value maximizing buyers in uniform price auctions | arXiv: 2406.03674
- leveraging online olympiad-level math problems for llms training and contaminati | arXiv: 2501.14275
- llm-srbench a new benchmark for scientific equation discovery with large languag | arXiv: 2504.10415
- meek models shall inherit the earth | arXiv: 2507.07931
- on temperature scaling and conformal prediction of deep classifiers | arXiv: 2402.05806
- phantomwiki on-demand datasets for reasoning and retrieval evaluation | arXiv: 2502.20377
- position ai evaluation should learn from how we test humans | arXiv: 2306.10512
- promoting ensemble diversity with interactive bayesian distributional robustness | arXiv: 2506.07247
- provably cost-sensitive adversarial defense via randomized smoothing | arXiv: 2310.08732
- random registers for cross-domain few-shot learning | arXiv: 2506.02843
- researchtown simulator of human research community | arXiv: 2412.17767
- runtime analysis of evolutionary nas for multiclass classification | arXiv: 2506.06019
- set valued predictions for robust domain generalization | arXiv: 2507.03146
- the best of both worlds bridging quality and diversity in data selection with bi | arXiv: 2410.12458
- ui-evol automatic knowledge evolving for computer use agents | arXiv: 2505.21964
- unlocking post-hoc dataset inference with synthetic data | arXiv: 2506.15271
- expert evaluation of llm world models a high-t c superconductivity case study | arXiv: 2511.03782
- la rosa enhancing llm efficiency via layerwise rotated sparse activation | arXiv: 2507.01299
- regress dont guess -- a regression-like loss on number tokens for language model | arXiv: 2411.02083
- a square peg in a square hole meta-expert for long-tailed semi-supervised learni | arXiv: 2505.16341
- algebra unveils deep learning -- an invitation to neuroalgebraic geometry | arXiv: 2501.18915
- bayesian neural scaling law extrapolation with prior-data fitted networks | arXiv: 2505.23032
- benign overfitting in token selection of attention mechanism | arXiv: 2409.17625
- chameleon a flexible data-mixing framework for language model pretraining and fi | arXiv: 2505.24844
- counting in small transformers the delicate interplay between attention and feed | arXiv: 2407.11542
- density ratio estimation-based bayesian optimization with semi-supervised learni | arXiv: 2305.15612
- dipllm fine-tuning llm for strategic decision-making in diplomacy | arXiv: 2506.09655
- does data scaling lead to visual compositional generalization | arXiv: 2507.07102
- evaluating morphological alignment of tokenizers in 70 languages | arXiv: 2507.06378
- how to synthesize text data without model collapse | arXiv: 2412.14689
- in-context adaptation to concept drift for learned database operations | arXiv: 2505.04404
- inductive gradient adjustment for spectral bias in implicit neural representatio | arXiv: 2410.13271
- language model developers should report train-test overlap | arXiv: 2410.08385
- language models over canonical byte-pair encodings | arXiv: 2506.07956
- large language models are demonstration pre-selectors for themselves | arXiv: 2506.06033
- llm data selection and utilization via dynamic bi-level optimization | arXiv: 2507.16178
- metadata conditioning accelerates language model pre-training | arXiv: 2501.01956
- on the clean generalization and robust overfitting in adversarial training from | arXiv: 2306.01271
- on the role of label noise in the feature learning process | arXiv: 2505.18909
- position the future of bayesian prediction is prior-fitted | arXiv: 2505.23947
- revisiting continuity of image tokens for cross-domain few-shot learning | arXiv: 2506.03110
- the dark side of the forces assessing non-conservative force models for atomisti | arXiv: 2412.11569
- the double-ellipsoid geometry of clip | arXiv: 2411.14517
- the sharpness disparity principle in transformers for accelerating language mode | arXiv: 2502.19002
- tokenized bandit for llm decoding and alignment | arXiv: 2506.07276
- towards robust influence functions with flat validation minima | arXiv: 2505.19097
- when can in-context learning generalize out of task distribution | arXiv: 2506.05574
- whitened clip as a likelihood surrogate of images and captions | arXiv: 2505.06934
- one missing piece for open-source reasoning models a dataset to mitigate cold-st | arXiv: 2506.02338
- pcot persuasion-augmented chain of thought for detecting fake news and social me | arXiv: 2506.06842
- putnam-axiom a functional and static benchmark for measuring higher level mathem | arXiv: 2508.08292
- towards better chain-of-thought a reflection on effectiveness and faithfulness | arXiv: 2405.18915
- became bayesian continual learning with adaptive model merging | arXiv: 2504.02666
- cut out and replay a simple yet versatile strategy for multi-label online contin | arXiv: 2505.19680
- improving continual learning performance and efficiency with auxiliary classifie | arXiv: 2403.07404
- negmerge sign-consensual weight merging for machine unlearning | arXiv: 2410.05583
- system-aware unlearning algorithms use lesser forget faster | arXiv: 2506.06073
- unlocking the power of rehearsal in continual learning a theoretical perspective | arXiv: 2506.00205
- training flexible models of genetic variant effects from functional annotations | arXiv: 2506.19598
- fgfp a fractional gaussian filter and pruning for deep neural networks compression | arXiv: 2507.22527
- m3-jepa multimodal alignment via multi-gate moe based on the joint-embedding pre | arXiv: 2409.05929
- overcoming multi-step complexity in multimodal theory-of-mind reasoning | arXiv: 2506.01301
- ui-evol automatic knowledge evolving for computer use agents | arXiv: 2505.21964
- recommendations with sparse comparison data provably fast convergence for noncon | arXiv: 2502.20033
- towards practical defect-focused automated code review | arXiv: 2505.17928
- gradual transition from bellman optimality operator to bellman operator in | arXiv: 2506.05968
- closed-form solutions a new perspective on solving differential equations | arXiv: 2405.14620
- actionpiece contextually tokenizing action sequences for generative recommendati | arXiv: 2502.13581
- large language model llm-enabled in-context learning for wireless network optimi | arXiv: 2408.00214
- defame dynamic evidence-based fact-checking with multimodal experts | arXiv: 2412.10510
- dynamical phases of short-term memory mechanisms in rnns | arXiv: 2502.17433
- is your llm-based multi-agent a reliable real-world planner exploring fraud dete | arXiv: 2505.16557
- learning survival distributions with the asymmetric laplace distribution | arXiv: 2505.03712
- or-bench an over-refusal benchmark for large language models | arXiv: 2405.20947
- raising the bar investigating the values of large language models via generative | arXiv: 2406.14230
- when bad data leads to good models | arXiv: 2505.04741
- when will it fail anomaly to prompt for forecasting future anomalies in time seri | arXiv: 2506.23596
- asymrnr video diffusion transformers acceleration with asymmetric reduction and | arXiv: 2412.11706
- ca2-vdm efficient autoregressive video diffusion model with causal generation an | arXiv: 2411.16375
- data-juicer sandbox a feedback-driven suite for multimodal data-model co-develop | arXiv: 2407.11784
- diffusion adversarial post-training for one-step video generation | arXiv: 2501.08316
- how far is video generation from world model a physical law perspective | arXiv: 2411.02385
- mimicmotion high-quality human motion video generation with confidence-aware pos | arXiv: 2406.19680
- riflex a free lunch for length extrapolation in video diffusion transformers | arXiv: 2502.15894