ACL2026 Paper Notes TODO

Total: 680 papers | Completed: 680 | Pending update: 0

"excuse me may i say something" colabscience a proactive ai assistant for biom | arXiv: 2604.15588
a computational method for measuring "open codes" in qualitative analysis | arXiv: 2411.12142
a layer-wise analysis of supervised fine-tuning | arXiv: 2604.11838
a multilingual dataset and empirical validation for the mutual reinforcement eff | arXiv: 2407.10953
a structured clustering approach for inducing media narratives | arXiv: 2604.10368
a study of llms' preferences for libraries and programming languages | arXiv: 2503.17181
a survey of reinforcement learning for large language models under data scarcity | arXiv: 2604.17312
a survey on mllm-based visually rich document understanding methods challenges a | arXiv: 2507.09861
a unified framework for modeling heterogeneous financial data via dual-granulari | arXiv: 2404.13004
abstain-r1 calibrated abstention and post-refusal clarification via verifiable r | arXiv: 2604.17073
accelerating training of autoregressive video generation models via local optimi | arXiv: 2604.07402
across programming language silos a study on cross-lingual retrieval-augmented c | arXiv: 2506.03535
adam's law textual frequency law on large language models | arXiv: 2604.02176
Adaptive Instruction Composition for Automated LLM Red-Teaming | arXiv: 2604.21159
adaptive layer selection for layer-wise token pruning in llm inference | arXiv: 2601.07667
adaptive text anonymization learning privacy-utility trade-offs via prompt optim | arXiv: 2602.20743
addressing overthinking in large vision-language models via gated perception-rea | arXiv: 2601.04442
affectron emotional speech synthesis with affective and contextually aligned non | arXiv: 2603.14432
AFMRL: Attribute-Enhanced Fine-Grained Multi-Modal Representation Learning in E-commerce | arXiv: 2604.20135
agencybench benchmarking the frontiers of autonomous agents in 1m-token real-wor | arXiv: 2601.11044
agent-gwo collaborative agents for dynamic prompt optimization in large language | arXiv: 2604.18612
agentgl towards agentic graph learning with llms via reinforcement learning | arXiv: 2604.05846
agentic conversational search with contextualized reasoning via reinforcement le | arXiv: 2601.13115
agree disagree explain decomposing human label variation in nli through the lens | arXiv: 2510.16458
agsc adaptive granularity and semantic clustering for uncertainty quantification | arXiv: 2604.06812
aica-bench holistically examining the capabilities of vlms in affective image co | arXiv: 2604.05900
aim-cot active information-driven multimodal chain-of-thought for vision-languag | arXiv: 2509.25699
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation | arXiv: 2604.18240
alexandria a multi-domain dialectal arabic machine translation dataset for cultu | arXiv: 2601.13099
aligning agents via planning a benchmark for trajectory-level reward modeling | arXiv: 2604.08178
aligning language models with real-time knowledge editing | arXiv: 2508.01302
aligning what llms do and say towards self-consistent explanations | arXiv: 2506.07523
alignment data map for efficient preference data selection and diagnosis | arXiv: 2505.23114
all changes may have invariant principles improving ever-shifting harmful meme d | arXiv: 2601.04567
All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG | arXiv: 2604.20199
alphacontext an evolutionary tree-based psychometric context generator for creat | arXiv: 2604.18398
among us language of conspiracy theorists on mainstream reddit | arXiv: 2506.05086
an existence proof for neural language models that can explain garden-path effec | arXiv: 2604.18293
an exploration of mamba for speech self-supervised models | arXiv: 2506.12606
an iterative utility judgment framework inspired by philosophical relevance via | arXiv: 2406.11290
analytical ffn-to-moe restructuring via activation pattern analysis | arXiv: 2502.04416
anchored cyclic generation a novel paradigm for long-sequence symbolic music gen | arXiv: 2604.05343
anchormem anchored facts with associative contexts for building memory in large | arXiv: 2604.17377
anchorseg language grounded query banks for reasoning segmentation | arXiv: 2604.18562
anonpsy a graph-based framework for structure-preserving de-identification of ps | arXiv: 2601.13503
apex-mem agentic semi-structured memory with temporal reasoning for long-term co | arXiv: 2604.14362
are emotion and rhetoric neurons in llm neuron recognition and adaptive masking | arXiv: 2604.17255
are large language models economically viable for industry deployment | arXiv: 2604.19342
are they lovers or friends evaluating llms' social reasoning in english and kor | arXiv: 2510.19028
ark answer-centric retriever tuning via kg-augmented curriculum learning | arXiv: 2511.16326
aroma augmented reasoning over a multimodal architecture for virtual cell geneti | arXiv: 2604.20263
arrowgev grounding events in video via learning the arrow of time | arXiv: 2601.06559
arxiv2table toward realistic benchmarking and evaluation for llm-based literatur | arXiv: 2504.10284
atlas adaptive trading with llm agents through dynamic prompt optimization and m | arXiv: 2510.15949
attnpo attention-guided process supervision for efficient reasoning | arXiv: 2602.09953
attribution citation and quotation a survey of evidence-based text generation wi | arXiv: 2508.15396
author-in-the-loop response generation and evaluation integrating author experti | arXiv: 2602.11173
automatic combination of sample selection strategies for few-shot learning | arXiv: 2402.03038
automatic slide updating with user-defined dynamic templates and natural languag | arXiv: 2604.17894
autopkg an automated framework for dynamic e-commerce product-attribute knowledg | arXiv: 2604.16950
autoreproduce automatic ai experiment reproduction with paper lineage | arXiv: 2505.20662
bayesian active learning with gaussian processes guided by llm relevance scoring | arXiv: 2604.17906
bayesian social deduction with graph-informed language models | arXiv: 2506.17788
bd-tp self-supervised speech models discover phonological vector arithmetic | arXiv: 2602.18899
benchmarking and enabling efficient chinese medical retrieval via asymmetric enc | arXiv: 2604.10937
benchmarking deflection and hallucination in large vision-language models | arXiv: 2604.12033
better and worse with scale how contextual entrainment diverges with model size | arXiv: 2604.13275
beyond accuracy unveiling inefficiency patterns in tool-integrated reasoning | arXiv: 2604.05404
beyond black-box interventions latent probing for faithful retrieval-augmented g | arXiv: 2510.12460
beyond end-to-end dynamic chain optimization for private llm adaptation on the e | arXiv: 2604.06819
beyond explicit refusals soft-failure attacks on retrieval-augmented generation | arXiv: 2604.18663
beyond itinerary planning-a real-world benchmark for multi-turn and tool-using t | arXiv: 2512.22673
beyond literal mapping benchmarking and improving non-literal translation evalua | arXiv: 2601.07338
beyond marginal distributions a framework to evaluate the representativeness of | arXiv: 2601.15755
beyond prompt fine-grained simulation of cognitively impaired standardized patie | arXiv: 2604.12210
beyond reproduction a paired-task framework for assessing llm comprehension and | arXiv: 2604.18169
beyond the final actor modeling the dual roles of creator and editor for fine-gr | arXiv: 2604.04932
beyond the individual virtualizing multi-disciplinary reasoning for clinical int | arXiv: 2604.08927
beyond transcription unified audio schema for perception-aware audiollms | arXiv: 2604.12506
bhashasutra a task-centric unified survey of indian nlp datasets corpora and res | arXiv: 2604.18423
biasedtales-ml a multilingual dataset for analyzing narrative attribute distribu | arXiv: 2604.17008
biohicl hierarchical multi-label contrastive learning for biomedical retrieval w | arXiv: 2604.15591
bizcompass benchmarking the reasoning capabilities of llms in business knowledge | arXiv: 2604.17305
bookagent orchestrating safety-aware visual narratives via multi-agent cognitive | arXiv: 2604.16541
bootstrapping code translation with weighted multilanguage exploration | arXiv: 2601.03512
bosch black-box binary optimization for short-context attention-head selection i | arXiv: 2604.05942
boundrl efficient structured text segmentation through reinforced boundary gener | arXiv: 2510.20151
breaking block boundaries anchor-based history-stable decoding for diffusion lar | arXiv: 2604.08964
bridging sft and rl dynamic policy optimization for robust reasoning | arXiv: 2604.08926
budget-aware anytime reasoning with llm-synthesized preference data | arXiv: 2601.11038
calibrated not for everyone how sexual orientation and religious markers distort | arXiv: 2604.17316
calibrated speculative decoding frequency-guided candidate selection for efficie | arXiv: 2604.13634
can ai-generated persuasion be detected persuaficial benchmark and ai vs human l | arXiv: 2601.04925
can continual pre-training bridge the performance gap between general-purpose an | arXiv: 2604.19394
cap controllable alignment prompting for unlearning in llms | arXiv: 2604.21251
capabilities and evaluation biases of large language models in classical chinese | arXiv: 2510.15313
caro chain-of-analogy reasoning optimization for robust content moderation | arXiv: 2604.10504
cartbench evaluating vision-language models on chinese art understanding interpr | arXiv: 2604.11632
cast achieving stable llm-based text analysis for data analytics | arXiv: 2602.15861
causaldetox causal head selection and intervention for language model detoxifica | arXiv: 2604.14602
cbrs cognitive blood request system with bilingual dataset and dual-layer filter | arXiv: 2604.16665
ce-gppo coordinating entropy via gradient-preserving clipping policy optimizatio | arXiv: 2509.20712
chain-of-thought as a lens evaluating structured reasoning alignment between hum | arXiv: 2511.06168
chairo contextual hierarchical analogical induction and reasoning optimization f | arXiv: 2604.10502
challenging the boundaries of reasoning an olympiad-level math benchmark for lar | arXiv: 2503.21380
chathls towards systematic design automation and optimization for high-level syn | arXiv: 2507.00642
chemamp amplified chemistry tools via composable agents | arXiv: 2505.21569
chemvlr prioritizing reasoning in perception for chemical vision-language unders | arXiv: 2604.06685
chipseek optimizing verilog generation via eda-integrated reinforcement learning | arXiv: 2507.04736
chunqiutr time-keyed temporal retrieval in classical chinese annals | arXiv: 2604.06997
ci-work benchmarking contextual integrity in enterprise llm agents | arXiv: 2604.21308
cipo counterfactual unlearning for large reasoning models through iterative pref | arXiv: 2604.15847
citeguard faithful citation attribution for llms via retrieval-augmented validat | arXiv: 2510.17853
clag adaptive memory organization via agent-driven clustering for small language | arXiv: 2603.15421
clare-ty amid chaos quantifying representational entanglement to predict ripple | arXiv: 2603.19297
clewr curriculum learning with restarts for machine translation preference learn | arXiv: 2601.05858
climatecause complex and implicit causal structures in climate reports | arXiv: 2604.14856
closing the modality reasoning gap for speech large language models | arXiv: 2601.05543
codepromptzip code-specific prompt compression for retrieval-augmented generatio | arXiv: 2502.14925
coderl improving code generation via reinforcement with execution semantics alig | arXiv: 2510.18471
codestruct code agents over structured action spaces | arXiv: 2604.05407
codewiki evaluating ai's ability to generate holistic documentation for large-s | arXiv: 2510.24428
codial interpretable task-oriented dialogue systems through dialogue flow alignm | arXiv: 2506.02264
coevolve training llm agents via agent-data mutual evolution | arXiv: 2604.15840
coggen a cognitively inspired recursive framework for deep research report gener | arXiv: 2604.17072
cognitive policy-driven llm for diagnosis and intervention of cognitive distorti | arXiv: 2604.17178
collabcoder plan-code co-evolution via collaborative decision-making for efficie | arXiv: 2604.13946
collaborative multi-agent scripts generation for enhancing imperfect-information | arXiv: 2604.11741
common to whom regional cultural commonsense and llm bias in india | arXiv: 2601.15550
commonsense knowledge with negation a resource to enhance negation understanding | arXiv: 2604.19921
compact example-based explanations for language models | arXiv: 2601.03786
comparing human and large language model interpretation of implicit information | arXiv: 2604.17085
compiling activation steering into weights via null-space constraints for stealt | arXiv: 2604.12359
compositional steering of large language models with steering tokens | arXiv: 2601.05062
computational narrative understanding for expressive text-to-speech | arXiv: 2509.04072
conjecture and inquiry quantifying software performance requirements via interac | arXiv: 2604.21380
conjunctive prompt attacks in multi-agent llm systems | arXiv: 2604.16543
conlangcrafter constructing languages with a multi-hop llm pipeline | arXiv: 2508.06094
consistrm improving generative reward models via consistency-aware self-training | arXiv: 2604.07484
content fuzzing for escaping information cocoons on digital social media | arXiv: 2604.05461
context attribution with multi-armed bandit optimization | arXiv: 2506.19977
context-value-action architecture for value-driven large language model agents | arXiv: 2604.05939
contrastive decoding mitigates score range bias in llm-as-a-judge | arXiv: 2510.18196
controlaudio tackling text-guided timing-indicated and intelligible audio genera | arXiv: 2510.08878
controlling multimodal conversational agents with coverage-enhanced latent actio | arXiv: 2601.07516
costom causal-oriented steering for intrinsic theory-of-mind alignment in large l | arXiv: 2604.10031
counterrefine answer-conditioned counterevidence retrieval for inference-time kn | arXiv: 2603.16091
craft training-free cascaded retrieval for tabular qa | arXiv: 2505.14984
creating conlangs to probe the metalinguistic grammatical knowledge of llms | arXiv: 2510.07591
creditdecoding accelerating parallel decoding in diffusion large language models | arXiv: 2510.06133
crisp compressing redundancy in chain-of-thought via intrinsic saliency pruning | arXiv: 2604.17297
cross-modal taxonomic generalization in vision-language models | arXiv: 2603.07474
cura clinical uncertainty risk alignment for language model-based risk predictio | arXiv: 2604.14651
curate continual unlearning in real time with ensured preservation of llm knowle | arXiv: 2604.14644
curing miracle steps in llm mathematical reasoning with rubric rewards | arXiv: 2510.07774
dart mitigating harm drift in difference-aware llms via distill-audit-repair tra | arXiv: 2604.16845
dash-kv accelerating long-context llm inference via asymmetric kv cache hashing | arXiv: 2604.19351
data mixing agent learning to re-weight domains for continual pre-training | arXiv: 2507.15640
de-anonymization at scale via tournament-style attribution | arXiv: 2601.12407
debating the unspoken role-anchored multi-agent reasoning for half-truth detecti | arXiv: 2604.19005
decisive guiding user decisions with optimal preference elicitation from unstruc | arXiv: 2604.18122
decoupling the effect of chain-of-thought reasoning a human label variation pers | arXiv: 2601.03154
decovec building decoding space based task vector for large language models via | arXiv: 2604.11129
deepguard secure code generation via multi-layer semantic aggregation | arXiv: 2604.09089
deepprune parallel scaling without inter-trace redundancy | arXiv: 2510.08483
deliberative searcher improving llm reliability via reinforcement learning with | arXiv: 2507.16727
detecting hallucinations in speechllms at inference time using attention maps | arXiv: 2604.19565
detecting rag extraction attack via dual-path runtime integrity game | arXiv: 2604.10717
detoxification for llm from dataset itself | arXiv: 2604.19124
dia-harm dialectal disparities in harmful content detection across 50 english di | arXiv: 2604.05318
dialectic-med mitigating diagnostic hallucinations via counterfactual adversaria | arXiv: 2604.11258
diffusion-cam faithful visual explanations for dmllms | arXiv: 2604.11005
disambiguation-centric finetuning makes enterprise tool-calling llms more realis | arXiv: 2507.03336
discourse coherence and response-guided context rewriting for multi-party dialog | arXiv: 2604.06784
discover and prove an open-source agentic framework for hard mode automated theo | arXiv: 2604.15839
discovering a shared logical subspace steering llm logical reasoning via alignme | arXiv: 2604.19716
dissecting failure dynamics in large language model reasoning | arXiv: 2604.14528
distorted or fabricated a survey on hallucination in video llms | arXiv: 2604.12944
diversity collapse in multi-agent llm systems structural coupling and collective | arXiv: 2604.18005
diziner disagreement-guided instruction refinement via pilot annotation simulati | arXiv: 2604.15866
do llms know tool irrelevance demystifying structural alignment bias in tool inv | arXiv: 2604.11322
do llms overthink basic math reasoning benchmarking the accuracy-efficiency trad | arXiv: 2507.04023
do not step into the same river twice learning to reason from trial and error | arXiv: 2510.26109
do we need distinct representations for every speech token unveiling and exploit | arXiv: 2604.06871
doc-pp document policy preservation benchmark for large vision-language models | arXiv: 2601.03926
domain-specific data generation framework for rag adaptation | arXiv: 2510.11217
don't act blindly robust gui automation via action-effect verification and self | arXiv: 2604.05477
don't adapt small language models for tools adapt tool schemas to the models | arXiv: 2510.07248
dpc training-free text-to-sql candidate selection via dual-paradigm consistency | arXiv: 2604.15163
dqa diagnostic question answering for it support | arXiv: 2604.05350
dr assistant enhancing clinical diagnostic inquiry via structured diagnostic rea | arXiv: 2601.13690
duet dual execution for test output prediction with generated code and pseudocod | arXiv: 2604.11514
dynamic emotion and personality profiling for multimodal deception detection | arXiv: 2604.17037
dynamics of cognitive heterogeneity investigating behavioral biases in multi-sta | arXiv: 2604.17220
e2e-gmner end-to-end generative grounded multimodal named entity recognition | arXiv: 2604.17319
e2edev benchmarking large language models in end-to-end software development tas | arXiv: 2510.14509
ea-agent a structured multi-step reasoning agent for entity alignment | arXiv: 2604.11686
easy samples are all you need self-evolving llms via data-efficient reinforcemen | arXiv: 2604.18639
eet experience-driven early termination for cost-efficient software engineering | arXiv: 2601.05777
efficient and effective internal memory retrieval for llm-based healthcare predi | arXiv: 2604.07659
efficient inference for large vision-language models bottlenecks techniques and | arXiv: 2604.05546
efficient learned data compression via dual-stream feature decoupling | arXiv: 2604.07239
efficient prm training data synthesis via formal verification | arXiv: 2505.15960
efficient process reward modeling via contrastive mutual information | arXiv: 2604.10660
efficient test-time scaling via temporal reasoning aggregation | arXiv: 2604.17304
efficient training for cross-lingual speech language models | arXiv: 2604.11096
eliciting medical reasoning with knowledge-enhanced data synthesis a semi-superv | arXiv: 2604.11547
enabling agents to communicate entirely in latent space | arXiv: 2511.09149
end-to-end optimization of llm-driven multi-agent search systems via heterogeneo | arXiv: 2506.02718
enhancing hallucination detection via future context | arXiv: 2507.20546
enhancing linguistic competence of language models through pre-training with lan | arXiv: 2601.03448
enhancing llm-based search agents via contribution weighted group relative polic | arXiv: 2604.14267
enhancing multilingual rag systems with debiased language preference-guided quer | arXiv: 2601.02956
enhancing multimodal large language models for ancient chinese character evoluti | arXiv: 2604.11299
errorradar benchmarking complex mathematical reasoning of multimodal large langu | arXiv: 2410.04509
establishing a scale for kullback-leibler divergence in language models across v | arXiv: 2505.15353
ethicmind a risk-aware framework for ethical-emotional alignment in multi-turn d | arXiv: 2604.09265
evaluating memory capability in continuous lifelog scenario | arXiv: 2604.11182
evian towards explainable visual instruction-tuning data auditing | arXiv: 2604.20544
evoedit evolving null-space alignment for robust and efficient knowledge editing | arXiv: 2510.13851
evolutionary negative module pruning for better lora merging | arXiv: 2604.17753
evospark endogenous interactive agent societies for unified long-horizon narrati | arXiv: 2604.12776
expect the unexpected testing the surprisal of salient entities | arXiv: 2604.10724
experiments or outcomes probing scientific feasibility in large language models | arXiv: 2604.18786
explain the flag contextualizing hate speech beyond censorship | arXiv: 2604.14970
explicit trait inference for multi-agent coordination | arXiv: 2604.19278
exploring continual fine-tuning for enhancing language ability in large language | arXiv: 2410.16006
exploring the capability boundaries of llms in mastering of chinese chouxiang la | arXiv: 2604.15841
expseek self-triggered experience seeking for web agents | arXiv: 2601.08605
fable fine-grained fact anchoring for unstructured model editing | arXiv: 2604.12559
facts table summarization via offline template generation with agentic workflows | arXiv: 2510.13920
failure modes in multi-hop qa the weakest link effect and the recognition bottle | arXiv: 2601.12499
fairqe multi-agent framework for mitigating gender bias in translation quality e | arXiv: 2604.21420
faith factuality alignment through integrating trustworthiness and honestness | arXiv: 2604.10189
faithful-first reasoning planning and acting for multimodal llms | arXiv: 2511.08409
faithfulness vs safety evaluating llm behavior under counterfactual medical evid | arXiv: 2601.11886
faithlens detecting and explaining faithfulness hallucination | arXiv: 2512.20182
fastdiss few-step match many-step diffusion language model on sequence-to-sequen | arXiv: 2604.05551
fastkv decoupling of context reduction and kv cache compression for prefill-deco | arXiv: 2502.01068
fedgui benchmarking federated gui agents across heterogeneous platforms devices | arXiv: 2604.14956
feedback adaptation for retrieval-augmented generation | arXiv: 2604.06647
feedback-driven tool-use improvements in large language models via automated bui | arXiv: 2508.08791
finch benchmarking finance & accounting across spreadsheet-centric enterprise | arXiv: 2512.13168
find your optimal teacher personalized data synthesis via router-guided multi-te | arXiv: 2510.10925
finesteer a unified framework for fine-grained inference-time steering in large | arXiv: 2604.15488
flare task-agnostic embedding model evaluation through a normalization process | arXiv: 2604.17344
flexguard continuous risk scoring for strictness-adaptive llm content moderation | arXiv: 2602.23636
follow the flow on information flow across textual tokens in text-to-image model | arXiv: 2504.01137
foresight optimization for strategic reasoning in large language models | arXiv: 2604.13592
forest before trees latent superposition for efficient visual reasoning | arXiv: 2601.06803
forget what matters keep the rest selective unlearning of informative tokens | arXiv: 2604.17785
frame of reference addressing the challenges of common ground representation in | arXiv: 2601.09365
frankentext stitching random text fragments into long-form narratives | arXiv: 2505.18128
fregelogic at semeval 2026 task 11 a hybrid neuro-symbolic architecture for cont | arXiv: 2604.18328
from answers to arguments toward trustworthy clinical diagnostic reasoning with | arXiv: 2604.11137
from charts to code a hierarchical benchmark for multimodal models | arXiv: 2510.17932
from domains to instances dual-granularity data synthesis for llm unlearning | arXiv: 2601.04278
from experience to skill multi-agent generative engine optimization via reusable | arXiv: 2604.19516
from heads to neurons causal attribution and steering in multi-task vision-langu | arXiv: 2604.17941
from if-statements to ml pipelines revisiting bias in code-generation | arXiv: 2604.21716
from inheritance to saturation disentangling the evolution of visual redundancy | arXiv: 2604.16462
from nodes to narratives explaining graph neural networks with llms and graph co | arXiv: 2508.07117
from passive metric to active signal the evolving role of uncertainty quantifica | arXiv: 2601.15690
from past to path masked history learning for next-item prediction in generative | arXiv: 2509.23649
from query to counsel structured reasoning with a multi-agent framework and data | arXiv: 2604.10470
from recall to forgetting benchmarking long-term memory for personalized agents | arXiv: 2604.20006
from relevance to authority authority-aware generative retrieval in web search e | arXiv: 2604.13468
from signal degradation to computation collapse uncovering the two failure modes | arXiv: 2604.19884
from static inference to dynamic interaction a survey of streaming large languag | arXiv: 2603.04592
from verbatim to gist distilling pyramidal multimodal memory via semantic inform | arXiv: 2603.01455
from weights to activations is steering the next frontier of adaptation | arXiv: 2604.14090
fs-researcher test-time scaling for long-horizon research tasks with file-system | arXiv: 2602.01566
gambit a gamified jailbreak framework for multimodal large language models | arXiv: 2601.03416
gameplayqa a benchmarking framework for decision-dense pov-synced multi-video un | arXiv: 2603.24329
ganitllm difficulty-aware bengali mathematical reasoning through curriculum-grpo | arXiv: 2601.06767
generalizable prompt tuning for audio-language models via semantic expansion | arXiv: 2601.20867
generating attribution reports for manipulated facial images a dataset and basel | arXiv: 2412.19685
generating effective cot traces for mitigating causal hallucination | arXiv: 2604.12748
geora geometry-aware low-rank adaptation for rlvr | arXiv: 2601.09361
georc a benchmark for geolocation reasoning chains | arXiv: 2601.21278
gigacheck detecting llm-generated content via object-centric span localization | arXiv: 2410.23728
graph-based alternatives to llms for human simulation | arXiv: 2511.02135
grasprune global gating for budgeted structured pruning of large language models | arXiv: 2604.19398
grass gradient-based adaptive layer-wise importance sampling for memory-efficien | arXiv: 2604.07808
hag hierarchical demographic tree-based agent generation for topic-adaptive simu | arXiv: 2601.05656
halluaudio a comprehensive benchmark for hallucination detection in large audio- | arXiv: 2604.19300
hard to be heard phoneme-level asr analysis of phonologically complex low-resour | arXiv: 2604.18204
harpo hierarchical agentic reasoning for user-aligned conversational recommendat | arXiv: 2604.10048
hcfd a benchmark for audio deepfake detection in healthcare | arXiv: 2604.17642
hcre llm-based hierarchical classification for cross-document relation extractio | arXiv: 2604.07937
healing entropy collapse enhancing exploration in few-shot rlvr via hybrid-domai | arXiv: 2604.17928
hela-mem hebbian learning and associative memory for llm agents | arXiv: 2604.16839
hermes kv cache as hierarchical memory for efficient streaming video understandi | arXiv: 2601.14724
heterocache a dynamic retrieval approach to heterogeneous kv cache compression f | arXiv: 2601.13684
hierarchical policy optimization for simultaneous translation of unbounded speec | arXiv: 2604.21045
hierarchical reinforcement learning with augmented step-level transitions for ll | arXiv: 2604.05808
higmem a hierarchical and llm-guided memory system for long-term conversational | arXiv: 2604.18349
hiprune hierarchical attention for efficient token pruning in vision-language mo | arXiv: 2508.00553
histlens mapping idea change across concepts and corpora | arXiv: 2604.11749
horizon a benchmark for in-the-wild user behaviour modeling | arXiv: 2604.17259
how adversarial environments mislead agentic ai | arXiv: 2604.18874
how do answer tokens read reasoning traces self-reading patterns in thinking llm | arXiv: 2604.19149
how hypocritical is your llm judge listener-speaker asymmetries in the pragmatic | arXiv: 2604.15873
how language models conflate logical validity with plausibility a representation | arXiv: 2510.06700
how retrieved context shapes internal representations in rag | arXiv: 2602.20091
how should we enhance the safety of large reasoning models an empirical study | arXiv: 2505.15404
humanllm benchmarking and improving llm anthropomorphism via human cognitive pat | arXiv: 2601.10198
hybrid-vector retrieval for visually rich documents combining single-vector effi | arXiv: 2510.22215
hypehr hyperbolic modeling of electronic health records for efficient question a | arXiv: 2604.21027
icebreaker for conversational agents breaking the first-message barrier with per | arXiv: 2604.18375
idea an interpretable and editable decision-making framework for llms via verbal | arXiv: 2604.12573
idiom understanding as a tool to measure the dialect gap | arXiv: 2510.05026
impact importance-aware activation space reconstruction | arXiv: 2507.03828
imperfectly cooperative human-ai interactions comparing the impacts of human and | arXiv: 2604.15607
implicitmembench measuring unconscious behavioral adaptation in large language m | arXiv: 2604.08064
imprif stronger implicit reasoning leads to better complex instruction following | arXiv: 2602.21228
improving the throughput of diffusion-based large language models via a training | arXiv: 2512.07173
indic-codecfake meets satyam towards detecting neural audio codec synthesized sp | arXiv: 2604.19949
indotabvqa a benchmark for cross-lingual table understanding in bahasa indonesia | arXiv: 2604.11970
inflated excellence or true performance rethinking medical diagnostic benchmarks | arXiv: 2510.09275
interpretability from the ground up stakeholder-centric design of automated scor | arXiv: 2511.17069
interpretable traces unexpected outcomes investigating the disconnect in trace-b | arXiv: 2505.13792
into the gray zone domain contexts can blur llm safety boundaries | arXiv: 2604.15717
investigating counterfactual unfairness in llms towards identities through humor | arXiv: 2604.18729
is agentic rag worth it an experimental comparison of rag approaches | arXiv: 2601.07711
is this chart lying to me automating the detection of misleading visualizations | arXiv: 2508.21675
it's high time a survey of temporal question answering | arXiv: 2505.20243
itag inverse design for natural text generation with accurate causal graph annot | arXiv: 2604.06902
iterative formalization and planning in partially observable environments | arXiv: 2505.13126
jailbreaking large language models with morality attacks | arXiv: 2604.17053
jamendo-mt-qa a benchmark for multi-track comparative music question answering | arXiv: 2604.09721
jtpro a joint tool-prompt reflective optimization framework for language agents | arXiv: 2604.19821
judgemenot personalizing large language models to emulate judicial reasoning in | arXiv: 2604.18041
just use xml revisiting joint translation and label projection | arXiv: 2603.12021
know thy enemy securing llms against prompt injection via diverse data synthesis | arXiv: 2601.04666
koco conditioning language model pre-training on knowledge coordinates | arXiv: 2604.12397
koco-bench can large language models leverage domain knowledge in software devel | arXiv: 2601.13240
lami augmenting large language models via late multi-image fusion | arXiv: 2406.13621
language model as planner and formalizer under constraints | arXiv: 2510.05486
language models entangle language and culture | arXiv: 2601.15337
language on demand knowledge at core composing llms with encoder-decoder transla | arXiv: 2603.17512
language reconstruction with brain predictive coding from fmri data | arXiv: 2405.11597
language-coupled reinforcement learning for multilingual retrieval-augmented gen | arXiv: 2601.14896
large language models are bad dice players llms struggle to generate random numb | arXiv: 2601.05414
large reasoning models are not yet multilingual latent reasoners | arXiv: 2601.02996
latent-condensed transformer for efficient long context modeling | arXiv: 2604.12452
learning dynamic representations and policies from multimodal clinical time-seri | arXiv: 2604.21235
learning invariant modality representation for robust multimodal learning from a | arXiv: 2604.18460
learning to edit knowledge via instruction-based chain-of-thought prompting | arXiv: 2604.05540
learning to extract rational evidence via reinforcement learning for retrieval-a | arXiv: 2507.15586
learning to retrieve user history and generate user profiles for personalized pe | arXiv: 2601.05654
learning uncertainty from sequential internal dispersion in large language model | arXiv: 2604.15741
leave my images alone preventing multi-modal large language models from analyzin | arXiv: 2604.09024
leprec reasoning as classification over structured factors for assessing relevan | arXiv: 2604.19464
less noise more voice reinforcement learning for reasoning via instruction purif | arXiv: 2601.21244
lexrel benchmarking legal relation extraction for chinese civil cases | arXiv: 2512.12643
lightweight llm agent memory with small language models | arXiv: 2604.07798
llm prompt duel optimizer efficient label-free prompt optimization | arXiv: 2510.13907
llm-guided semantic bootstrapping for interpretable text classification with tse | arXiv: 2604.12223
llms underperform graph-based parsers on supervised relation extraction for comp | arXiv: 2604.08752
location not found exposing implicit local and global biases in multilingual llm | arXiv: 2604.19292
logical phase transitions understanding collapse in llm logical reasoning | arXiv: 2601.02902
logiceval a systematic framework for evaluating automated repair techniques for | arXiv: 2604.12994
logoskg hardware-optimized scalable and interpretable knowledge graph retrieval | arXiv: 2604.18913
look twice before you leap a rational framework for localized adversarial anonym | arXiv: 2512.06713
lora on the go instance-level dynamic lora selection and merging | arXiv: 2511.07129
losses that cook topological optimal transport for structured recipe generation | arXiv: 2601.02531
lost in diffusion uncovering hallucination patterns and failure modes in diffusi | arXiv: 2604.10556
lost in the prompt order revealing the limitations of causal attention in langua | arXiv: 2601.14152
lost in translation do lvlm judges generalize across languages | arXiv: 2604.19405
lpo towards accurate gui agent interaction via location preference optimization | arXiv: 2506.09373
lqm linguistically motivated multidimensional quality metrics for machine transl | arXiv: 2604.18490
mab-dqa addressing query aspect importance in document question answering with m | arXiv: 2604.08952
made a living benchmark for multi-label text classification with uncertainty qua | arXiv: 2604.15203
maestro meta-learning adaptive estimation of scalarization trade-offs for reward | arXiv: 2601.07208
making mllms blind adversarial smuggling attacks in mllm content moderation | arXiv: 2604.06950
march evaluating the intersection of ambiguity interpretation and multi-hop infe | arXiv: 2509.22750
march multi-agent radiology clinical hierarchy for ct report generation | arXiv: 2604.16175
mars2 scaling multi-agent tree search via reinforcement learning for code genera | arXiv: 2604.14564
mash evading black-box ai-generated text detectors via style humanization | arXiv: 2601.08564
masked by consensus disentangling privileged knowledge in llm correctness | arXiv: 2604.12373
mass-rag multi-agent synthesis retrieval-augmented generation | arXiv: 2604.18509
mata multi-agent framework for reliable and flexible table question answering | arXiv: 2602.09642
mathagent adversarial evolution of constraint graphs for mathematical reasoning | arXiv: 2604.11188
mathflow enhancing the perceptual flow of mllms for visual mathematical problems | arXiv: 2503.16549
maximizing local entropy where it matters prefix-aware localized llm unlearning | arXiv: 2601.03190
mcga a multi-task classical chinese literary genre audio corpus | arXiv: 2601.09270
mcp-flow facilitating llm agents to master real-world diverse and scaling mcp to | arXiv: 2510.24284
meashalu mitigation of scientific measurement hallucinations for large language | arXiv: 2604.16929
measuring what matters assessing therapeutic principles in mental-health convers | arXiv: 2604.05795
medlaybench-v a large-scale benchmark for expert-lay semantic alignment in medic | arXiv: 2604.05738
mem2evolve towards self-evolving agents via co-evolutionary capability expansion | arXiv: 2604.10923
memophishagent memory-augmented multi-modal llm agent for phishing url detection | arXiv: 2602.21394
memory-augmented llm-based multi-agent system for automated feature generation o | arXiv: 2604.20261
memp exploring agent procedural memory | arXiv: 2508.06433
meta-tool efficient few-shot tool adaptation for small language models | arXiv: 2604.20148
mhsafeeval role-aware interaction-level evaluation of mental health safety in la | arXiv: 2604.17730
min-k sampling decoupling truncation from temperature scaling via relative logit | arXiv: 2604.11012
mina a multilingual llm-powered legal assistant agent for bangladesh for empower | arXiv: 2511.08605
mitigating catastrophic forgetting in target language adaptation of llms via sou | arXiv: 2512.04844
mitigating extrinsic gender bias for bangla classification tasks | arXiv: 2411.10636
mitigating hallucinations in large vision-language models without performance de | arXiv: 2604.20366
mmerror a benchmark for erroneous reasoning in vision-language models | arXiv: 2601.03331
model internal sleuthing finding lexical identity and inflectional features in m | arXiv: 2506.02132
model-agnostic meta learning for class imbalance adaptation | arXiv: 2604.18759
modeling multi-dimensional cognitive states in large language models under cogni | arXiv: 2604.17174
moneta multimodal industry classification through geographic information with mu | arXiv: 2604.07956
more than meets the eye measuring the semiotic gap in vision-language models via | arXiv: 2604.17354
morphogen a multilingual benchmark for evaluating gender-aware morphological gen | arXiv: 2604.18914
mtr-duplexbench towards a comprehensive evaluation of multi-round conversations | arXiv: 2511.10262
muldimif a multi-dimensional constraint framework for evaluating and improving i | arXiv: 2505.07591
multi-drafter speculative decoding with alignment feedback | arXiv: 2604.05417
multi-faceted self-consistent preference alignment for query rewriting in conver | arXiv: 2604.06771
multi-task reinforcement learning for enhanced multimodal llm-as-a-judge | arXiv: 2603.11665
multi-view attention multiple-instance learning enhanced by llm reasoning for co | arXiv: 2509.17292
multifiletest a multi-file-level llm unit test generation benchmark and impact o | arXiv: 2502.06556
multilingual language models encode script over linguistic structure | arXiv: 2604.05090
multimodal in-context learning for asr of low-resource languages | arXiv: 2601.05707
music audio-visual question answering requires specialized multimodal designs | arXiv: 2505.20638
musical score understanding benchmark evaluating large language models' compreh | arXiv: 2511.20697
native hybrid attention for efficient sequence modeling | arXiv: 2510.07019
no one fits all from fixed prompting to learned routing in multilingual llms | arXiv: 2604.16937
no-worse context-aware decoding preventing neutral regression in context-conditi | arXiv: 2604.16686
nose neural olfactory-semantic embedding with tri-modal orthogonal contrastive l | arXiv: 2604.10452
not all animals are equal metaphorical framing through source domains and semant | arXiv: 2604.20454
octotools an agentic framework with extensible tools for complex reasoning | arXiv: 2502.11271
odutqa-mdc a task for open-domain underspecified tabular qa with multi-turn dial | arXiv: 2604.10159
omibench benchmarking olympiad-level multi-image reasoning in large vision-langu | arXiv: 2604.20806
omni-embed-audio leveraging multimodal llms for robust audio-text retrieval | arXiv: 2604.18360
omnicompliance-100k a multi-domain rule-grounded real-world safety compliance da | arXiv: 2603.13933
omnidiagram advancing unified diagram code generation via visual interrogation r | arXiv: 2604.05514
on safety risks in experience-driven self-evolving agents | arXiv: 2604.16968
on the step length confounding in llm reasoning data selection | arXiv: 2604.06834
one persona many cues different results how sociodemographic cues impact llm per | arXiv: 2601.18572
optimizing user profiles via contextual bandits for retrieval-augmented llm pers | arXiv: 2601.12078
oscbench benchmarking object state change in text-to-video generation | arXiv: 2603.11698
parallel test-time scaling for latent reasoning models | arXiv: 2510.07745
parallel universes parallel languages a comprehensive study on llm-based multili | arXiv: 2601.00263
persona-e2 a human-grounded dataset for personality-shaped emotional responses t | arXiv: 2604.09162
personalized benchmarking evaluating llms by individual preferences | arXiv: 2604.18943
piarena a platform for prompt injection evaluation | arXiv: 2604.08499
planning beyond text graph-based reasoning for complex narrative generation | arXiv: 2604.21253
please refuse to answer me mitigating over-refusal in large language models via | arXiv: 2604.17132
policyllm towards excellent comprehension of public policy for large language mo | arXiv: 2604.12995
polynomial expansion rank adaptation enhancing low-rank fine-tuning with high-or | arXiv: 2604.11841
position multimodal large language models can significantly advance scientific r | arXiv: 2502.02871
precise debugging benchmark is your model debugging or regenerating | arXiv: 2604.17338
preference estimation via opponent modeling in multi-agent negotiation | arXiv: 2604.15687
prefix parsing is just parsing | arXiv: 2604.21191
principlismqa a philosophy-grounded approach to assessing llm-human clinical med | arXiv: 2508.05132
probing for reading times | arXiv: 2604.18712
process reward models meet planning generating precise and scalable datasets for | arXiv: 2604.17957
prosody as supervision bridging the non-verbal--verbal for multilingual speech e | arXiv: 2604.17647
protecting bystander privacy via selective hearing in audio llms | arXiv: 2512.06380
protocycle reflective tool-augmented planning for text-guided protein design | arXiv: 2604.16896
pseudo2real task arithmetic for pseudo-label correction in automatic speech reco | arXiv: 2510.08047
purging the gray zone latent-geometric denoising for precise knowledge boundary | arXiv: 2604.14324
pv-sql synergizing database probing and rule-based verification for text-to-sql | arXiv: 2604.17653
qimeng-prepair precise code repair via edit-aware reward optimization | arXiv: 2604.05963
quality over clicks intrinsic quality-driven iterative reinforcement learning fo | arXiv: 2603.22922
query pipeline optimization for cancer patient question answering systems | arXiv: 2412.14751
ra-rrg multimodal retrieval-augmented radiology report generation with key phras | arXiv: 2504.07415
racer retrieval-augmented contextual rapid speculative decoding | arXiv: 2604.14885
rads reinforcement learning-based sample selection improves transfer learning in | arXiv: 2604.20256
rare redundancy-aware retrieval evaluation framework for high-similarity corpora | arXiv: 2604.19047
reason only when needed efficient generative reward modeling via model-internal | arXiv: 2604.10072
reasonembed enhanced text embeddings for reasoning-intensive document retrieval | arXiv: 2510.08252
reasoning fails where step flow breaks | arXiv: 2604.06695
reasoning hijacking the fragility of reasoning alignment in large language model | arXiv: 2601.10294
reasoning-based refinement of unsupervised text clusters with llms | arXiv: 2604.07562
recoqa a benchmark for tool-augmented and multi-step reasoning in real estate qu | arXiv: 2604.17944
referee reference-free and fine-grained method for evaluating factual consistenc | arXiv: 2604.10520
region-grounded report generation for 3d medical imaging a fine-grained dataset | arXiv: 2604.18145
region-r1 reinforcing query-side region cropping for multi-modal re-ranking | arXiv: 2604.05268
reinforced efficient reasoning via semantically diverse exploration | arXiv: 2601.05053
reliable evaluation protocol for low-precision retrieval | arXiv: 2508.03306
render-of-thought rendering textual chain-of-thought as images for visual latent | arXiv: 2601.14750
reposhapley shapley-enhanced context filtering for repository-level code complet | arXiv: 2601.03378
representation-guided parameter-efficient llm unlearning | arXiv: 2604.17396
reprompt recurrent prompt tuning for integrating structured ehr encoders with la | arXiv: 2604.17725
rerec reasoning-augmented llm-based recommendation assistant via reinforcement f | arXiv: 2604.07851
researchbench benchmarking llms in scientific discovery via inspiration-based ta | arXiv: 2503.21248
rethinking jailbreak detection of large vision language models with representati | arXiv: 2512.12069
rethinking llm watermark detection in black-box settings a non-intrusive third-p | arXiv: 2603.14968
rethinking meeting effectiveness a benchmark and framework for temporal fine-gra | arXiv: 2604.17260
retraceqa evaluating reasoning traces of small language models in commonsense qu | arXiv: 2510.09351
retrievals can be detrimental unveiling the backdoor vulnerability of retrieval- | arXiv: 2501.13340
retrieving to recover towards incomplete audio-visual question answering via sem | arXiv: 2604.10695
reverse constitutional ai a framework for controllable toxic data generation via | arXiv: 2604.17769
revisiting entropy in reinforcement learning for large reasoning models | arXiv: 2511.05993
revisiting non-verbatim memorization in large language models the role of entity | arXiv: 2604.21882
revisiting the uniform information density hypothesis in llm reasoning | arXiv: 2510.06953
revitalizing black-box interpretability actionable interpretability for llms via | arXiv: 2505.12509
reward modeling for scientific writing evaluation | arXiv: 2601.11374
rhetorical questions in llm representations a linear probing study | arXiv: 2604.14128
right at my level a unified multilingual framework for proficiency-aware text si | arXiv: 2604.05302
risk a framework for gui agents in e-commerce risk management | arXiv: 2509.21982
ritek a dataset for large language models complex reasoning over textual knowled | arXiv: 2410.13987
river-llm large language model seamless exit based on kv share | arXiv: 2604.18396
rl-plus countering capability boundary collapse of llms in reinforcement learnin | arXiv: 2508.00222
robust tool use via fission-grpo learning to recover from execution errors | arXiv: 2601.15625
robustness via referencing defending against prompt injection attacks by referen | arXiv: 2504.20472
roleconflictbench a benchmark of role conflict scenarios for evaluating llms' c | arXiv: 2509.25897
route to rome attack directing llm routers to expensive models via adversarial s | arXiv: 2604.15022
s2h-dpo hardness-aware preference optimization for vision-language models | arXiv: 2604.18512
saber an efficient sampling with adaptive acceleration and backtracking enhanced | arXiv: 2510.18165
safemerge preserving safety alignment in fine-tuned large language models via se | arXiv: 2503.17239
safetyalfred evaluating safety-conscious planning of multimodal large language m | arXiv: 2604.19638
sage sign-adaptive gradient for memory-efficient llm optimization | arXiv: 2604.07663
samora semantic-aware mixture of lora experts for task-adaptive learning | arXiv: 2604.19048
savoir learning social savoir-faire via shapley-based reward attribution | arXiv: 2604.18982
scaling behaviors of llm reinforcement learning post-training an empirical study | arXiv: 2509.25300
scaling external knowledge input beyond context windows of llms via multi-agent | arXiv: 2505.21471
scaling test-time compute to achieve ioi gold medal with open-weight models | arXiv: 2510.14232
scicoqa quality assurance for scientific paper--code alignment | arXiv: 2601.12910
sciimpact a multi-dimensional multi-field benchmark for scientific impact predic | arXiv: 2604.17141
script a subcharacter compositional representation injection module for korean p | arXiv: 2604.12377
scripts through time a survey of the evolving role of transliteration in nlp | arXiv: 2604.18722
sculpting the vector space towards efficient multi-vector visual document retrie | arXiv: 2602.19549
scurank ranking multiple candidate summaries with summary content units for enha | arXiv: 2604.19185
securevibebench evaluating secure coding capabilities of code agents with realis | arXiv: 2509.22097
seeing no evil blinding large vision-language models to safety instructions via | arXiv: 2604.10299
selar selective latent reasoning in large language models | arXiv: 2604.08299
self-awareness before action mitigating logical inertia via proactive cognitive | arXiv: 2604.20413
self-consistency from only two samples cot-pot ensembling for efficient llm reas | arXiv: 2604.17433
self-correcting text-to-video generation with misalignment detection and localiz | arXiv: 2411.15115
self-reinforcing controllable synthesis of rare relational data via bayesian cal | arXiv: 2604.16817
semantic-aware logical reasoning via a semiotic framework | arXiv: 2509.24765
semantic-space exploration and exploitation in rlvr for llm reasoning | arXiv: 2509.23808
semi-supervised diseased detection from speech dialogues with multi-level data m | arXiv: 2601.04744
sense and sensitivity examining the influence of semantic recall on long context | arXiv: 2505.13353
serm self-evolving relevance model with agent-driven learning from massive query | arXiv: 2601.09515
sessionintentbench a multi-task inter-session intention-shift modeling benchmark | arXiv: 2507.20185
sftmix elevating language model instruction tuning with mixup recipe | arXiv: 2410.05248
silo-bench a scalable environment for evaluating distributed coordination in mul | arXiv: 2603.01045
similarity-distance-magnitude activations | arXiv: 2509.12760
slideagent hierarchical agentic framework for multi-page visual document underst | arXiv: 2510.26615
socia-evo automated simulator construction via dual-anchored bi-level optimizati | arXiv: 2604.17351
soft head selection for injecting icl-derived task embeddings | arXiv: 2507.20906
solidcoder bridging the mental-reality gap in llm code generation through concre | arXiv: 2604.19825
solver-independent automated problem formulation via llms for high-cost simulati | arXiv: 2512.18682
spagbias uncovering and tracing structured spatial gender bias in large language | arXiv: 2604.14672
spasm stable persona-driven agent simulation for multi-turn dialogue generation | arXiv: 2604.09212
speakersleuth can large audio-language models judge speaker consistency across m | arXiv: 2601.04029
spec-o3 a tool-augmented vision-language agent for rare celestial object candida | arXiv: 2601.06498
specbound adaptive bounded self-speculation with layer-wise confidence calibrati | arXiv: 2604.12247
speculative verification exploiting information gain to refine speculative decod | arXiv: 2509.24328
spence a syntactic probe for detecting contamination in nl2sql benchmarks | arXiv: 2604.17771
spiralthinker latent reasoning through an iterative process with text-latent int | arXiv: 2511.08983
splits flexible sociocultural linguistic investigation at scale | arXiv: 2504.04640
spotlight and shadow attention-guided dual-anchor introspective decoding for mll | arXiv: 2604.10071
stable on-policy distillation through adaptive target reformulation | arXiv: 2601.07155
stable-rag mitigating retrieval-permutation-induced hallucinations in retrieval- | arXiv: 2601.02993
star-teaming a strategy-response multiplex network approach to automated llm red | arXiv: 2604.18976
step-grpo internalizing dynamic early exit for efficient reasoning | arXiv: 2604.16890
still between us evaluating and improving voice assistant robustness to third-pa | arXiv: 2604.17358
stk-adapter incorporating evolving graph and event chain for temporal knowledge | arXiv: 2604.19042
storycoder narrative reformulation for structured reasoning in llm code generati | arXiv: 2604.14631
stresstest can your speech lm handle the stress | arXiv: 2505.22765
stride-ed a strategy-grounded stepwise reasoning framework for empathetic dialog | arXiv: 2604.07100
structkv preserving the structural skeleton for scalable long-context inference | arXiv: 2604.06746
structmem structured memory for long-horizon behavior in llms | arXiv: 2604.21748
style amnesia investigating speaking style degradation and mitigation in multi-t | arXiv: 2512.23578
style over story measuring llm narrative preferences via structured selection | arXiv: 2510.02025
subject-level inference for realistic text anonymization evaluation | arXiv: 2604.21211
supplement generation training for enhancing agentic task performance | arXiv: 2604.20727
syntax as a rosetta stone universal dependencies for in-context coptic translati | arXiv: 2604.18758
synthagent adapting web agents with synthetic supervision | arXiv: 2511.06101
synthia scalable grounded persona generation from social media data | arXiv: 2507.14922
table question answering in the era of large language models a comprehensive sur | arXiv: 2510.09671
tabrex tabular referenceless explainable evaluation | arXiv: 2512.15907
taming actor-observer asymmetry in agents via dialectical alignment | arXiv: 2604.19548
targeted exploration via unified entropy control for reinforcement learning | arXiv: 2604.14646
task-aware llm routing with multi-level task-profile-guided data synthesis for c | arXiv: 2604.09377
task-stratified knowledge scaling laws for post-training quantized large languag | arXiv: 2508.18609
taxpraben a scalable benchmark for structured evaluation of llms in chinese real | arXiv: 2604.08948
tellwhisper tell whisper who speaks when | arXiv: 2601.03712
tema anchor the image follow the text for multi-modification composed image retr | arXiv: 2604.21806
template-assisted contrastive learning of task-oriented dialogue sentence embedd | arXiv: 2305.14299
temporal contrastive decoding a training-free method for large audio-language mo | arXiv: 2604.15383
temporal flattening in llm-generated text comparing human and llm writing trajec | arXiv: 2604.12097
temporal leakage in search-engine date-filtered web retrieval a retrospective fo | arXiv: 2602.00758
temporalvlm video llms for temporal reasoning in long videos | arXiv: 2412.02930
text-attributed knowledge graph enrichment with large language models for medica | arXiv: 2604.13331
text-to-distribution prediction with quantile tokens and neighbor context | arXiv: 2604.20216
the gaoyao benchmark a comprehensive framework for evaluating multilingual and m | arXiv: 2604.20225
the model agreed but didn't learn diagnosing surface compliance in large langua | arXiv: 2604.05995
the path not taken duality in reasoning about program execution | arXiv: 2604.20917
the reasoning trap how enhancing llm reasoning amplifies tool hallucination | arXiv: 2510.22977
the stackelberg speaker optimizing persuasive communication in social deduction | arXiv: 2510.09087
think in latent thoughts a new paradigm for gloss-free sign language translation | arXiv: 2604.15301
think in sentences explicit sentence boundaries enhance language model's capabi | arXiv: 2604.10135
think outside the policy in-context steered policy optimization | arXiv: 2510.26519
thinking like a botanist challenging multimodal language models with intent-driv | arXiv: 2604.20983
threadsumm summarization of nested discourse threads using tree of thoughts | arXiv: 2604.17648
through the magnifying glass adaptive perception magnification for hallucination | arXiv: 2503.10183
time-ra towards time series reasoning for anomaly diagnosis with llm feedback | arXiv: 2507.15066
tingis real-time risk event discovery from noisy customer incidents at enterpris | arXiv: 2604.21889
to lie or not to lie investigating the biased spread of global lies by llms | arXiv: 2604.06552
to trust or not to trust attention-based trust management for llm multi-agent sy | arXiv: 2506.02546
toolomni enabling open-world tool use via agentic learning with proactive retrie | arXiv: 2604.13787
topic-based watermarks for large language models | arXiv: 2404.02138
topology-aware layer pruning for large vision-language models | arXiv: 2604.16502
toward consistent world models with multi-token prediction and latent semantic e | arXiv: 2604.06155
towards bridging the reward-generation gap in direct alignment algorithms | arXiv: 2506.09457
towards effective in-context cross-domain knowledge transfer via domain-invarian | arXiv: 2604.05383
towards fine-grained and multi-granular contrastive language-speech pre-training | arXiv: 2601.03065
towards intrinsic interpretability of large language models a survey of design pr | arXiv: 2604.16042
towards proactive information probing customer service chatbots harvesting value | arXiv: 2604.11077
towards robust real-world spreadsheet understanding with multi-agent multi-forma | arXiv: 2604.12282
towards scalable lightweight gui agents via multi-role orchestration | arXiv: 2604.13488
towards self-improving error diagnosis in multi-agent systems | arXiv: 2604.17658
toxitrace gradient-aligned training for explainable chinese toxicity detection | arXiv: 2604.12321
toxreason a benchmark for mechanistic chemical toxicity reasoning via adverse ou | arXiv: 2604.06264
tpa next token probability attribution for detecting hallucinations in rag | arXiv: 2512.07515
tracing relational knowledge recall in large language models | arXiv: 2604.19934
training-free test-time contrastive learning for large language models | arXiv: 2604.13552
trajguard streaming hidden-state trajectory detection for decoding-time jailbrea | arXiv: 2604.07727
tree-of-evidence efficient "system 2" search for faithful multimodal grounding | arXiv: 2604.07692
trigreason trigger-based collaboration between small and large reasoning models | arXiv: 2604.14847
trojail trajectory-level optimization for multi-turn large language model jailbr | arXiv: 2512.07761
two pathways to truthfulness on the intrinsic encoding of llm hallucinations | arXiv: 2601.07422
ucs estimating unseen coverage for improved in-context learning | arXiv: 2604.12015
ukp psycontrol at semeval-2026 task 2 modeling valence and arousal dynamics from | arXiv: 2604.21534
uncertainty quantification in llm agents foundations emerging challenges and opp | arXiv: 2602.05073
understanding and mitigating spurious signal amplification in test-time reinforc | arXiv: 2604.21327
understanding generalization in role-playing models via information theory | arXiv: 2512.17270
understanding new-knowledge-induced factual hallucinations in llms analysis and | arXiv: 2511.02626
understanding or memorizing a case study of german definite articles in language | arXiv: 2601.09313
understanding structured financial data with llms a case study on fraud detectio | arXiv: 2512.13040
unicreative unifying long-form logic and short-form sparkle via reference-free r | arXiv: 2604.05517
unleashing spatial reasoning in multimodal large language models via textual rep | arXiv: 2603.23404
unlocking the edge deployment and ondevice acceleration of multi-lora enabled on | arXiv: 2604.18655
vc-inspector advancing reference-free evaluation of video captions with factual | arXiv: 2509.16538
videostir understanding long videos via spatio-temporally structured and intent- | arXiv: 2604.05418
vill-e video llm embeddings for retrieval | arXiv: 2604.12148
visret visualization improves knowledge-intensive text-to-image retrieval | arXiv: 2505.20291
vista verification in sequential turn-based assessment | arXiv: 2510.27052
vla-forget vision-language-action unlearning for embodied foundation models | arXiv: 2604.03956
vln-nf feasibility-aware vision-and-language navigation with false-premise instr | arXiv: 2604.10533
vocab diet reshaping the vocabulary of llms via vector arithmetic | arXiv: 2510.17001
voxmind an end-to-end agentic spoken dialogue system | arXiv: 2604.15710
waking up blind cold-start optimization of supervision-free agentic trajectories | arXiv: 2604.17475
what do vision-language models encode for personalized image aesthetics assessme | arXiv: 2604.11374
what factors affect llms and rllms in financial question answering | arXiv: 2507.08339
what if consensus lies selective-complementary reinforcement learning at test ti | arXiv: 2603.19880
what makes an ideal quote recommending "unexpected yet rational" quotations vi | arXiv: 2602.22220
what makes an llm a good optimizer a trajectory analysis of llm-guided evolution | arXiv: 2604.19440
what makes llms effective sequential recommenders a study on preference intensit | arXiv: 2506.02261
what's missing in screen-to-action towards a ui-in-the-loop paradigm for multim | arXiv: 2604.06995
when agents look the same quantifying distillation-induced similarity in tool-us | arXiv: 2604.21255
when bigger isn't better a comprehensive fairness evaluation of political bias | arXiv: 2604.21309
when helpers become hazards a benchmark for analyzing multimodal llm-powered saf | arXiv: 2601.04043
when is thinking enough early exit via sufficiency assessment for efficient reas | arXiv: 2604.06787
when misinformation speaks and converses rethinking fact-checking in audio platf | arXiv: 2604.16767
when personalization tricks detectors the feature-inversion trap in machine-gene | arXiv: 2510.12476
when slower isn't truer inverse scaling law of truthfulness in multimodal reaso | arXiv: 2505.20214
when vision-language models judge without seeing exposing informativeness bias | arXiv: 2604.17768
where and what reasoning dynamic and implicit preferences in situated conversati | arXiv: 2604.20749
which bird does not have wings negative-constrained kgqa with schema-guided sema | arXiv: 2604.14749
which reasoning trajectories teach students to reason better a simple metric of | arXiv: 2601.14249
who gets which message auditing demographic bias in llm-generated targeted text | arXiv: 2601.17172
who wrote this line evaluating the detection of llm-generated classical chinese | arXiv: 2604.10101
why agents compromise safety under pressure | arXiv: 2603.14975
why did apple fall evaluating curiosity in large language models | arXiv: 2510.20635
why do multilingual reasoning gaps emerge in reasoning language models | arXiv: 2510.27269
why supervised fine-tuning fails to learn a systematic study of incomplete learn | arXiv: 2604.10079
why these documents explainable generative retrieval with hierarchical category | arXiv: 2411.05572
wikiseeker rethinking the role of vision-language models in knowledge-based visu | arXiv: 2604.05818
wisca a lightweight model transition method to improve llm training via weight s | arXiv: 2508.16676
working memory constraints scaffold learning in transformers under data scarcity | arXiv: 2604.20789
xlsr-mambo scaling the hybrid mamba-attention backbone for audio deepfake detect | arXiv: 2601.02944
xmark reliable multi-bit watermarking for llm-generated texts | arXiv: 2604.05242
xoxo stealthy cross-origin context poisoning attacks against ai coding assistant | arXiv: 2503.14281
xq-meval a dataset with cross-lingual parallel quality for benchmarking translat | arXiv: 2604.14934
xtragpt context-aware and controllable academic paper revision via human-ai coll | arXiv: 2505.11336
yield a large-scale dataset and evaluation framework for information elicitation | arXiv: 2604.10968
your llm agents are temporally blind the misalignment between tool use decisions | arXiv: 2510.23853
zara training-free motion time-series reasoning via evidence-grounded llm agents | arXiv: 2508.04038
zipvoice-dialog non-autoregressive spoken dialogue generation with flow matching | arXiv: 2507.09318