ACL2026 Paper Notes TODO
Total: 680 papers | Completed: 680 | Pending update: 0
- "excuse me may i say something" colabscience a proactive ai assistant for biom | arXiv: 2604.15588
- a computational method for measuring "open codes" in qualitative analysis | arXiv: 2411.12142
- a layer-wise analysis of supervised fine-tuning | arXiv: 2604.11838
- a multilingual dataset and empirical validation for the mutual reinforcement eff | arXiv: 2407.10953
- a structured clustering approach for inducing media narratives | arXiv: 2604.10368
- a study of llms' preferences for libraries and programming languages | arXiv: 2503.17181
- a survey of reinforcement learning for large language models under data scarcity | arXiv: 2604.17312
- a survey on mllm-based visually rich document understanding methods challenges a | arXiv: 2507.09861
- a unified framework for modeling heterogeneous financial data via dual-granulari | arXiv: 2404.13004
- abstain-r1 calibrated abstention and post-refusal clarification via verifiable r | arXiv: 2604.17073
- accelerating training of autoregressive video generation models via local optimi | arXiv: 2604.07402
- across programming language silos a study on cross-lingual retrieval-augmented c | arXiv: 2506.03535
- adam's law textual frequency law on large language models | arXiv: 2604.02176
- Adaptive Instruction Composition for Automated LLM Red-Teaming | arXiv: 2604.21159
- adaptive layer selection for layer-wise token pruning in llm inference | arXiv: 2601.07667
- adaptive text anonymization learning privacy-utility trade-offs via prompt optim | arXiv: 2602.20743
- addressing overthinking in large vision-language models via gated perception-rea | arXiv: 2601.04442
- affectron emotional speech synthesis with affective and contextually aligned non | arXiv: 2603.14432
- AFMRL: Attribute-Enhanced Fine-Grained Multi-Modal Representation Learning in E-commerce | arXiv: 2604.20135
- agencybench benchmarking the frontiers of autonomous agents in 1m-token real-wor | arXiv: 2601.11044
- agent-gwo collaborative agents for dynamic prompt optimization in large language | arXiv: 2604.18612
- agentgl towards agentic graph learning with llms via reinforcement learning | arXiv: 2604.05846
- agentic conversational search with contextualized reasoning via reinforcement le | arXiv: 2601.13115
- agree disagree explain decomposing human label variation in nli through the lens | arXiv: 2510.16458
- agsc adaptive granularity and semantic clustering for uncertainty quantification | arXiv: 2604.06812
- aica-bench holistically examining the capabilities of vlms in affective image co | arXiv: 2604.05900
- aim-cot active information-driven multimodal chain-of-thought for vision-languag | arXiv: 2509.25699
- AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation | arXiv: 2604.18240
- alexandria a multi-domain dialectal arabic machine translation dataset for cultu | arXiv: 2601.13099
- aligning agents via planning a benchmark for trajectory-level reward modeling | arXiv: 2604.08178
- aligning language models with real-time knowledge editing | arXiv: 2508.01302
- aligning what llms do and say towards self-consistent explanations | arXiv: 2506.07523
- alignment data map for efficient preference data selection and diagnosis | arXiv: 2505.23114
- all changes may have invariant principles improving ever-shifting harmful meme d | arXiv: 2601.04567
- All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG | arXiv: 2604.20199
- alphacontext an evolutionary tree-based psychometric context generator for creat | arXiv: 2604.18398
- among us language of conspiracy theorists on mainstream reddit | arXiv: 2506.05086
- an existence proof for neural language models that can explain garden-path effec | arXiv: 2604.18293
- an exploration of mamba for speech self-supervised models | arXiv: 2506.12606
- an iterative utility judgment framework inspired by philosophical relevance via | arXiv: 2406.11290
- analytical ffn-to-moe restructuring via activation pattern analysis | arXiv: 2502.04416
- anchored cyclic generation a novel paradigm for long-sequence symbolic music gen | arXiv: 2604.05343
- anchormem anchored facts with associative contexts for building memory in large | arXiv: 2604.17377
- anchorseg language grounded query banks for reasoning segmentation | arXiv: 2604.18562
- anonpsy a graph-based framework for structure-preserving de-identification of ps | arXiv: 2601.13503
- apex-mem agentic semi-structured memory with temporal reasoning for long-term co | arXiv: 2604.14362
- are emotion and rhetoric neurons in llm neuron recognition and adaptive masking | arXiv: 2604.17255
- are large language models economically viable for industry deployment | arXiv: 2604.19342
- are they lovers or friends evaluating llms' social reasoning in english and kor | arXiv: 2510.19028
- ark answer-centric retriever tuning via kg-augmented curriculum learning | arXiv: 2511.16326
- aroma augmented reasoning over a multimodal architecture for virtual cell geneti | arXiv: 2604.20263
- arrowgev grounding events in video via learning the arrow of time | arXiv: 2601.06559
- arxiv2table toward realistic benchmarking and evaluation for llm-based literatur | arXiv: 2504.10284
- atlas adaptive trading with llm agents through dynamic prompt optimization and m | arXiv: 2510.15949
- attnpo attention-guided process supervision for efficient reasoning | arXiv: 2602.09953
- attribution citation and quotation a survey of evidence-based text generation wi | arXiv: 2508.15396
- author-in-the-loop response generation and evaluation integrating author experti | arXiv: 2602.11173
- automatic combination of sample selection strategies for few-shot learning | arXiv: 2402.03038
- automatic slide updating with user-defined dynamic templates and natural languag | arXiv: 2604.17894
- autopkg an automated framework for dynamic e-commerce product-attribute knowledg | arXiv: 2604.16950
- autoreproduce automatic ai experiment reproduction with paper lineage | arXiv: 2505.20662
- bayesian active learning with gaussian processes guided by llm relevance scoring | arXiv: 2604.17906
- bayesian social deduction with graph-informed language models | arXiv: 2506.17788
- bd-tp self-supervised speech models discover phonological vector arithmetic | arXiv: 2602.18899
- benchmarking and enabling efficient chinese medical retrieval via asymmetric enc | arXiv: 2604.10937
- benchmarking deflection and hallucination in large vision-language models | arXiv: 2604.12033
- better and worse with scale how contextual entrainment diverges with model size | arXiv: 2604.13275
- beyond accuracy unveiling inefficiency patterns in tool-integrated reasoning | arXiv: 2604.05404
- beyond black-box interventions latent probing for faithful retrieval-augmented g | arXiv: 2510.12460
- beyond end-to-end dynamic chain optimization for private llm adaptation on the e | arXiv: 2604.06819
- beyond explicit refusals soft-failure attacks on retrieval-augmented generation | arXiv: 2604.18663
- beyond itinerary planning-a real-world benchmark for multi-turn and tool-using t | arXiv: 2512.22673
- beyond literal mapping benchmarking and improving non-literal translation evalua | arXiv: 2601.07338
- beyond marginal distributions a framework to evaluate the representativeness of | arXiv: 2601.15755
- beyond prompt fine-grained simulation of cognitively impaired standardized patie | arXiv: 2604.12210
- beyond reproduction a paired-task framework for assessing llm comprehension and | arXiv: 2604.18169
- beyond the final actor modeling the dual roles of creator and editor for fine-gr | arXiv: 2604.04932
- beyond the individual virtualizing multi-disciplinary reasoning for clinical int | arXiv: 2604.08927
- beyond transcription unified audio schema for perception-aware audiollms | arXiv: 2604.12506
- bhashasutra a task-centric unified survey of indian nlp datasets corpora and res | arXiv: 2604.18423
- biasedtales-ml a multilingual dataset for analyzing narrative attribute distribu | arXiv: 2604.17008
- biohicl hierarchical multi-label contrastive learning for biomedical retrieval w | arXiv: 2604.15591
- bizcompass benchmarking the reasoning capabilities of llms in business knowledge | arXiv: 2604.17305
- bookagent orchestrating safety-aware visual narratives via multi-agent cognitive | arXiv: 2604.16541
- bootstrapping code translation with weighted multilanguage exploration | arXiv: 2601.03512
- bosch black-box binary optimization for short-context attention-head selection i | arXiv: 2604.05942
- boundrl efficient structured text segmentation through reinforced boundary gener | arXiv: 2510.20151
- breaking block boundaries anchor-based history-stable decoding for diffusion lar | arXiv: 2604.08964
- bridging sft and rl dynamic policy optimization for robust reasoning | arXiv: 2604.08926
- budget-aware anytime reasoning with llm-synthesized preference data | arXiv: 2601.11038
- calibrated not for everyone how sexual orientation and religious markers distort | arXiv: 2604.17316
- calibrated speculative decoding frequency-guided candidate selection for efficie | arXiv: 2604.13634
- can ai-generated persuasion be detected persuaficial benchmark and ai vs human l | arXiv: 2601.04925
- can continual pre-training bridge the performance gap between general-purpose an | arXiv: 2604.19394
- cap controllable alignment prompting for unlearning in llms | arXiv: 2604.21251
- capabilities and evaluation biases of large language models in classical chinese | arXiv: 2510.15313
- caro chain-of-analogy reasoning optimization for robust content moderation | arXiv: 2604.10504
- cartbench evaluating vision-language models on chinese art understanding interpr | arXiv: 2604.11632
- cast achieving stable llm-based text analysis for data analytics | arXiv: 2602.15861
- causaldetox causal head selection and intervention for language model detoxifica | arXiv: 2604.14602
- cbrs cognitive blood request system with bilingual dataset and dual-layer filter | arXiv: 2604.16665
- ce-gppo coordinating entropy via gradient-preserving clipping policy optimizatio | arXiv: 2509.20712
- chain-of-thought as a lens evaluating structured reasoning alignment between hum | arXiv: 2511.06168
- chairo contextual hierarchical analogical induction and reasoning optimization f | arXiv: 2604.10502
- challenging the boundaries of reasoning an olympiad-level math benchmark for lar | arXiv: 2503.21380
- chathls towards systematic design automation and optimization for high-level syn | arXiv: 2507.00642
- chemamp amplified chemistry tools via composable agents | arXiv: 2505.21569
- chemvlr prioritizing reasoning in perception for chemical vision-language unders | arXiv: 2604.06685
- chipseek optimizing verilog generation via eda-integrated reinforcement learning | arXiv: 2507.04736
- chunqiutr time-keyed temporal retrieval in classical chinese annals | arXiv: 2604.06997
- ci-work benchmarking contextual integrity in enterprise llm agents | arXiv: 2604.21308
- cipo counterfactual unlearning for large reasoning models through iterative pref | arXiv: 2604.15847
- citeguard faithful citation attribution for llms via retrieval-augmented validat | arXiv: 2510.17853
- clag adaptive memory organization via agent-driven clustering for small language | arXiv: 2603.15421
- clare-ty amid chaos quantifying representational entanglement to predict ripple | arXiv: 2603.19297
- clewr curriculum learning with restarts for machine translation preference learn | arXiv: 2601.05858
- climatecause complex and implicit causal structures in climate reports | arXiv: 2604.14856
- closing the modality reasoning gap for speech large language models | arXiv: 2601.05543
- codepromptzip code-specific prompt compression for retrieval-augmented generatio | arXiv: 2502.14925
- coderl improving code generation via reinforcement with execution semantics alig | arXiv: 2510.18471
- codestruct code agents over structured action spaces | arXiv: 2604.05407
- codewiki evaluating ai's ability to generate holistic documentation for large-s | arXiv: 2510.24428
- codial interpretable task-oriented dialogue systems through dialogue flow alignm | arXiv: 2506.02264
- coevolve training llm agents via agent-data mutual evolution | arXiv: 2604.15840
- coggen a cognitively inspired recursive framework for deep research report gener | arXiv: 2604.17072
- cognitive policy-driven llm for diagnosis and intervention of cognitive distorti | arXiv: 2604.17178
- collabcoder plan-code co-evolution via collaborative decision-making for efficie | arXiv: 2604.13946
- collaborative multi-agent scripts generation for enhancing imperfect-information | arXiv: 2604.11741
- common to whom regional cultural commonsense and llm bias in india | arXiv: 2601.15550
- commonsense knowledge with negation a resource to enhance negation understanding | arXiv: 2604.19921
- compact example-based explanations for language models | arXiv: 2601.03786
- comparing human and large language model interpretation of implicit information | arXiv: 2604.17085
- compiling activation steering into weights via null-space constraints for stealt | arXiv: 2604.12359
- compositional steering of large language models with steering tokens | arXiv: 2601.05062
- computational narrative understanding for expressive text-to-speech | arXiv: 2509.04072
- conjecture and inquiry quantifying software performance requirements via interac | arXiv: 2604.21380
- conjunctive prompt attacks in multi-agent llm systems | arXiv: 2604.16543
- conlangcrafter constructing languages with a multi-hop llm pipeline | arXiv: 2508.06094
- consistrm improving generative reward models via consistency-aware self-training | arXiv: 2604.07484
- content fuzzing for escaping information cocoons on digital social media | arXiv: 2604.05461
- context attribution with multi-armed bandit optimization | arXiv: 2506.19977
- context-value-action architecture for value-driven large language model agents | arXiv: 2604.05939
- contrastive decoding mitigates score range bias in llm-as-a-judge | arXiv: 2510.18196
- controlaudio tackling text-guided timing-indicated and intelligible audio genera | arXiv: 2510.08878
- controlling multimodal conversational agents with coverage-enhanced latent actio | arXiv: 2601.07516
- costom causal-oriented steering for intrinsic theory-of-mind alignment in large l | arXiv: 2604.10031
- counterrefine answer-conditioned counterevidence retrieval for inference-time kn | arXiv: 2603.16091
- craft training-free cascaded retrieval for tabular qa | arXiv: 2505.14984
- creating conlangs to probe the metalinguistic grammatical knowledge of llms | arXiv: 2510.07591
- creditdecoding accelerating parallel decoding in diffusion large language models | arXiv: 2510.06133
- crisp compressing redundancy in chain-of-thought via intrinsic saliency pruning | arXiv: 2604.17297
- cross-modal taxonomic generalization in vision-language models | arXiv: 2603.07474
- cura clinical uncertainty risk alignment for language model-based risk predictio | arXiv: 2604.14651
- curate continual unlearning in real time with ensured preservation of llm knowle | arXiv: 2604.14644
- curing miracle steps in llm mathematical reasoning with rubric rewards | arXiv: 2510.07774
- dart mitigating harm drift in difference-aware llms via distill-audit-repair tra | arXiv: 2604.16845
- dash-kv accelerating long-context llm inference via asymmetric kv cache hashing | arXiv: 2604.19351
- data mixing agent learning to re-weight domains for continual pre-training | arXiv: 2507.15640
- de-anonymization at scale via tournament-style attribution | arXiv: 2601.12407
- debating the unspoken role-anchored multi-agent reasoning for half-truth detecti | arXiv: 2604.19005
- decisive guiding user decisions with optimal preference elicitation from unstruc | arXiv: 2604.18122
- decoupling the effect of chain-of-thought reasoning a human label variation pers | arXiv: 2601.03154
- decovec building decoding space based task vector for large language models via | arXiv: 2604.11129
- deepguard secure code generation via multi-layer semantic aggregation | arXiv: 2604.09089
- deepprune parallel scaling without inter-trace redundancy | arXiv: 2510.08483
- deliberative searcher improving llm reliability via reinforcement learning with | arXiv: 2507.16727
- detecting hallucinations in speechllms at inference time using attention maps | arXiv: 2604.19565
- detecting rag extraction attack via dual-path runtime integrity game | arXiv: 2604.10717
- detoxification for llm from dataset itself | arXiv: 2604.19124
- dia-harm dialectal disparities in harmful content detection across 50 english di | arXiv: 2604.05318
- dialectic-med mitigating diagnostic hallucinations via counterfactual adversaria | arXiv: 2604.11258
- diffusion-cam faithful visual explanations for dmllms | arXiv: 2604.11005
- disambiguation-centric finetuning makes enterprise tool-calling llms more realis | arXiv: 2507.03336
- discourse coherence and response-guided context rewriting for multi-party dialog | arXiv: 2604.06784
- discover and prove an open-source agentic framework for hard mode automated theo | arXiv: 2604.15839
- discovering a shared logical subspace steering llm logical reasoning via alignme | arXiv: 2604.19716
- dissecting failure dynamics in large language model reasoning | arXiv: 2604.14528
- distorted or fabricated a survey on hallucination in video llms | arXiv: 2604.12944
- diversity collapse in multi-agent llm systems structural coupling and collective | arXiv: 2604.18005
- diziner disagreement-guided instruction refinement via pilot annotation simulati | arXiv: 2604.15866
- do llms know tool irrelevance demystifying structural alignment bias in tool inv | arXiv: 2604.11322
- do llms overthink basic math reasoning benchmarking the accuracy-efficiency trad | arXiv: 2507.04023
- do not step into the same river twice learning to reason from trial and error | arXiv: 2510.26109
- do we need distinct representations for every speech token unveiling and exploit | arXiv: 2604.06871
- doc-pp document policy preservation benchmark for large vision-language models | arXiv: 2601.03926
- domain-specific data generation framework for rag adaptation | arXiv: 2510.11217
- don't act blindly robust gui automation via action-effect verification and self | arXiv: 2604.05477
- don't adapt small language models for tools adapt tool schemas to the models | arXiv: 2510.07248
- dpc training-free text-to-sql candidate selection via dual-paradigm consistency | arXiv: 2604.15163
- dqa diagnostic question answering for it support | arXiv: 2604.05350
- dr assistant enhancing clinical diagnostic inquiry via structured diagnostic rea | arXiv: 2601.13690
- duet dual execution for test output prediction with generated code and pseudocod | arXiv: 2604.11514
- dynamic emotion and personality profiling for multimodal deception detection | arXiv: 2604.17037
- dynamics of cognitive heterogeneity investigating behavioral biases in multi-sta | arXiv: 2604.17220
- e2e-gmner end-to-end generative grounded multimodal named entity recognition | arXiv: 2604.17319
- e2edev benchmarking large language models in end-to-end software development tas | arXiv: 2510.14509
- ea-agent a structured multi-step reasoning agent for entity alignment | arXiv: 2604.11686
- easy samples are all you need self-evolving llms via data-efficient reinforcemen | arXiv: 2604.18639
- eet experience-driven early termination for cost-efficient software engineering | arXiv: 2601.05777
- efficient and effective internal memory retrieval for llm-based healthcare predi | arXiv: 2604.07659
- efficient inference for large vision-language models bottlenecks techniques and | arXiv: 2604.05546
- efficient learned data compression via dual-stream feature decoupling | arXiv: 2604.07239
- efficient prm training data synthesis via formal verification | arXiv: 2505.15960
- efficient process reward modeling via contrastive mutual information | arXiv: 2604.10660
- efficient test-time scaling via temporal reasoning aggregation | arXiv: 2604.17304
- efficient training for cross-lingual speech language models | arXiv: 2604.11096
- eliciting medical reasoning with knowledge-enhanced data synthesis a semi-superv | arXiv: 2604.11547
- enabling agents to communicate entirely in latent space | arXiv: 2511.09149
- end-to-end optimization of llm-driven multi-agent search systems via heterogeneo | arXiv: 2506.02718
- enhancing hallucination detection via future context | arXiv: 2507.20546
- enhancing linguistic competence of language models through pre-training with lan | arXiv: 2601.03448
- enhancing llm-based search agents via contribution weighted group relative polic | arXiv: 2604.14267
- enhancing multilingual rag systems with debiased language preference-guided quer | arXiv: 2601.02956
- enhancing multimodal large language models for ancient chinese character evoluti | arXiv: 2604.11299
- errorradar benchmarking complex mathematical reasoning of multimodal large langu | arXiv: 2410.04509
- establishing a scale for kullback-leibler divergence in language models across v | arXiv: 2505.15353
- ethicmind a risk-aware framework for ethical-emotional alignment in multi-turn d | arXiv: 2604.09265
- evaluating memory capability in continuous lifelog scenario | arXiv: 2604.11182
- evian towards explainable visual instruction-tuning data auditing | arXiv: 2604.20544
- evoedit evolving null-space alignment for robust and efficient knowledge editing | arXiv: 2510.13851
- evolutionary negative module pruning for better lora merging | arXiv: 2604.17753
- evospark endogenous interactive agent societies for unified long-horizon narrati | arXiv: 2604.12776
- expect the unexpected testing the surprisal of salient entities | arXiv: 2604.10724
- experiments or outcomes probing scientific feasibility in large language models | arXiv: 2604.18786
- explain the flag contextualizing hate speech beyond censorship | arXiv: 2604.14970
- explicit trait inference for multi-agent coordination | arXiv: 2604.19278
- exploring continual fine-tuning for enhancing language ability in large language | arXiv: 2410.16006
- exploring the capability boundaries of llms in mastering of chinese chouxiang la | arXiv: 2604.15841
- expseek self-triggered experience seeking for web agents | arXiv: 2601.08605
- fable fine-grained fact anchoring for unstructured model editing | arXiv: 2604.12559
- facts table summarization via offline template generation with agentic workflows | arXiv: 2510.13920
- failure modes in multi-hop qa the weakest link effect and the recognition bottle | arXiv: 2601.12499
- fairqe multi-agent framework for mitigating gender bias in translation quality e | arXiv: 2604.21420
- faith factuality alignment through integrating trustworthiness and honestness | arXiv: 2604.10189
- faithful-first reasoning planning and acting for multimodal llms | arXiv: 2511.08409
- faithfulness vs safety evaluating llm behavior under counterfactual medical evid | arXiv: 2601.11886
- faithlens detecting and explaining faithfulness hallucination | arXiv: 2512.20182
- fastdiss few-step match many-step diffusion language model on sequence-to-sequen | arXiv: 2604.05551
- fastkv decoupling of context reduction and kv cache compression for prefill-deco | arXiv: 2502.01068
- fedgui benchmarking federated gui agents across heterogeneous platforms devices | arXiv: 2604.14956
- feedback adaptation for retrieval-augmented generation | arXiv: 2604.06647
- feedback-driven tool-use improvements in large language models via automated bui | arXiv: 2508.08791
- finch benchmarking finance & accounting across spreadsheet-centric enterprise | arXiv: 2512.13168
- find your optimal teacher personalized data synthesis via router-guided multi-te | arXiv: 2510.10925
- finesteer a unified framework for fine-grained inference-time steering in large | arXiv: 2604.15488
- flare task-agnostic embedding model evaluation through a normalization process | arXiv: 2604.17344
- flexguard continuous risk scoring for strictness-adaptive llm content moderation | arXiv: 2602.23636
- follow the flow on information flow across textual tokens in text-to-image model | arXiv: 2504.01137
- foresight optimization for strategic reasoning in large language models | arXiv: 2604.13592
- forest before trees latent superposition for efficient visual reasoning | arXiv: 2601.06803
- forget what matters keep the rest selective unlearning of informative tokens | arXiv: 2604.17785
- frame of reference addressing the challenges of common ground representation in | arXiv: 2601.09365
- frankentext stitching random text fragments into long-form narratives | arXiv: 2505.18128
- fregelogic at semeval 2026 task 11 a hybrid neuro-symbolic architecture for cont | arXiv: 2604.18328
- from answers to arguments toward trustworthy clinical diagnostic reasoning with | arXiv: 2604.11137
- from charts to code a hierarchical benchmark for multimodal models | arXiv: 2510.17932
- from domains to instances dual-granularity data synthesis for llm unlearning | arXiv: 2601.04278
- from experience to skill multi-agent generative engine optimization via reusable | arXiv: 2604.19516
- from heads to neurons causal attribution and steering in multi-task vision-langu | arXiv: 2604.17941
- from if-statements to ml pipelines revisiting bias in code-generation | arXiv: 2604.21716
- from inheritance to saturation disentangling the evolution of visual redundancy | arXiv: 2604.16462
- from nodes to narratives explaining graph neural networks with llms and graph co | arXiv: 2508.07117
- from passive metric to active signal the evolving role of uncertainty quantifica | arXiv: 2601.15690
- from past to path masked history learning for next-item prediction in generative | arXiv: 2509.23649
- from query to counsel structured reasoning with a multi-agent framework and data | arXiv: 2604.10470
- from recall to forgetting benchmarking long-term memory for personalized agents | arXiv: 2604.20006
- from relevance to authority authority-aware generative retrieval in web search e | arXiv: 2604.13468
- from signal degradation to computation collapse uncovering the two failure modes | arXiv: 2604.19884
- from static inference to dynamic interaction a survey of streaming large languag | arXiv: 2603.04592
- from verbatim to gist distilling pyramidal multimodal memory via semantic inform | arXiv: 2603.01455
- from weights to activations is steering the next frontier of adaptation | arXiv: 2604.14090
- fs-researcher test-time scaling for long-horizon research tasks with file-system | arXiv: 2602.01566
- gambit a gamified jailbreak framework for multimodal large language models | arXiv: 2601.03416
- gameplayqa a benchmarking framework for decision-dense pov-synced multi-video un | arXiv: 2603.24329
- ganitllm difficulty-aware bengali mathematical reasoning through curriculum-grpo | arXiv: 2601.06767
- generalizable prompt tuning for audio-language models via semantic expansion | arXiv: 2601.20867
- generating attribution reports for manipulated facial images a dataset and basel | arXiv: 2412.19685
- generating effective cot traces for mitigating causal hallucination | arXiv: 2604.12748
- geora geometry-aware low-rank adaptation for rlvr | arXiv: 2601.09361
- georc a benchmark for geolocation reasoning chains | arXiv: 2601.21278
- gigacheck detecting llm-generated content via object-centric span localization | arXiv: 2410.23728
- graph-based alternatives to llms for human simulation | arXiv: 2511.02135
- grasprune global gating for budgeted structured pruning of large language models | arXiv: 2604.19398
- grass gradient-based adaptive layer-wise importance sampling for memory-efficien | arXiv: 2604.07808
- hag hierarchical demographic tree-based agent generation for topic-adaptive simu | arXiv: 2601.05656
- halluaudio a comprehensive benchmark for hallucination detection in large audio- | arXiv: 2604.19300
- hard to be heard phoneme-level asr analysis of phonologically complex low-resour | arXiv: 2604.18204
- harpo hierarchical agentic reasoning for user-aligned conversational recommendat | arXiv: 2604.10048
- hcfd a benchmark for audio deepfake detection in healthcare | arXiv: 2604.17642
- hcre llm-based hierarchical classification for cross-document relation extractio | arXiv: 2604.07937
- healing entropy collapse enhancing exploration in few-shot rlvr via hybrid-domai | arXiv: 2604.17928
- hela-mem hebbian learning and associative memory for llm agents | arXiv: 2604.16839
- hermes kv cache as hierarchical memory for efficient streaming video understandi | arXiv: 2601.14724
- heterocache a dynamic retrieval approach to heterogeneous kv cache compression f | arXiv: 2601.13684
- hierarchical policy optimization for simultaneous translation of unbounded speec | arXiv: 2604.21045
- hierarchical reinforcement learning with augmented step-level transitions for ll | arXiv: 2604.05808
- higmem a hierarchical and llm-guided memory system for long-term conversational | arXiv: 2604.18349
- hiprune hierarchical attention for efficient token pruning in vision-language mo | arXiv: 2508.00553
- histlens mapping idea change across concepts and corpora | arXiv: 2604.11749
- horizon a benchmark for in-the-wild user behaviour modeling | arXiv: 2604.17259
- how adversarial environments mislead agentic ai | arXiv: 2604.18874
- how do answer tokens read reasoning traces self-reading patterns in thinking llm | arXiv: 2604.19149
- how hypocritical is your llm judge listener-speaker asymmetries in the pragmatic | arXiv: 2604.15873
- how language models conflate logical validity with plausibility a representation | arXiv: 2510.06700
- how retrieved context shapes internal representations in rag | arXiv: 2602.20091
- how should we enhance the safety of large reasoning models an empirical study | arXiv: 2505.15404
- humanllm benchmarking and improving llm anthropomorphism via human cognitive pat | arXiv: 2601.10198
- hybrid-vector retrieval for visually rich documents combining single-vector effi | arXiv: 2510.22215
- hypehr hyperbolic modeling of electronic health records for efficient question a | arXiv: 2604.21027
- icebreaker for conversational agents breaking the first-message barrier with per | arXiv: 2604.18375
- idea an interpretable and editable decision-making framework for llms via verbal | arXiv: 2604.12573
- idiom understanding as a tool to measure the dialect gap | arXiv: 2510.05026
- impact importance-aware activation space reconstruction | arXiv: 2507.03828
- imperfectly cooperative human-ai interactions comparing the impacts of human and | arXiv: 2604.15607
- implicitmembench measuring unconscious behavioral adaptation in large language m | arXiv: 2604.08064
- imprif stronger implicit reasoning leads to better complex instruction following | arXiv: 2602.21228
- improving the throughput of diffusion-based large language models via a training | arXiv: 2512.07173
- indic-codecfake meets satyam towards detecting neural audio codec synthesized sp | arXiv: 2604.19949
- indotabvqa a benchmark for cross-lingual table understanding in bahasa indonesia | arXiv: 2604.11970
- inflated excellence or true performance rethinking medical diagnostic benchmarks | arXiv: 2510.09275
- interpretability from the ground up stakeholder-centric design of automated scor | arXiv: 2511.17069
- interpretable traces unexpected outcomes investigating the disconnect in trace-b | arXiv: 2505.13792
- into the gray zone domain contexts can blur llm safety boundaries | arXiv: 2604.15717
- investigating counterfactual unfairness in llms towards identities through humor | arXiv: 2604.18729
- is agentic rag worth it an experimental comparison of rag approaches | arXiv: 2601.07711
- is this chart lying to me automating the detection of misleading visualizations | arXiv: 2508.21675
- it's high time a survey of temporal question answering | arXiv: 2505.20243
- itag inverse design for natural text generation with accurate causal graph annot | arXiv: 2604.06902
- iterative formalization and planning in partially observable environments | arXiv: 2505.13126
- jailbreaking large language models with morality attacks | arXiv: 2604.17053
- jamendo-mt-qa a benchmark for multi-track comparative music question answering | arXiv: 2604.09721
- jtpro a joint tool-prompt reflective optimization framework for language agents | arXiv: 2604.19821
- judgemenot personalizing large language models to emulate judicial reasoning in | arXiv: 2604.18041
- just use xml revisiting joint translation and label projection | arXiv: 2603.12021
- know thy enemy securing llms against prompt injection via diverse data synthesis | arXiv: 2601.04666
- koco conditioning language model pre-training on knowledge coordinates | arXiv: 2604.12397
- koco-bench can large language models leverage domain knowledge in software devel | arXiv: 2601.13240
- lami augmenting large language models via late multi-image fusion | arXiv: 2406.13621
- language model as planner and formalizer under constraints | arXiv: 2510.05486
- language models entangle language and culture | arXiv: 2601.15337
- language on demand knowledge at core composing llms with encoder-decoder transla | arXiv: 2603.17512
- language reconstruction with brain predictive coding from fmri data | arXiv: 2405.11597
- language-coupled reinforcement learning for multilingual retrieval-augmented gen | arXiv: 2601.14896
- large language models are bad dice players llms struggle to generate random numb | arXiv: 2601.05414
- large reasoning models are not yet multilingual latent reasoners | arXiv: 2601.02996
- latent-condensed transformer for efficient long context modeling | arXiv: 2604.12452
- learning dynamic representations and policies from multimodal clinical time-seri | arXiv: 2604.21235
- learning invariant modality representation for robust multimodal learning from a | arXiv: 2604.18460
- learning to edit knowledge via instruction-based chain-of-thought prompting | arXiv: 2604.05540
- learning to extract rational evidence via reinforcement learning for retrieval-a | arXiv: 2507.15586
- learning to retrieve user history and generate user profiles for personalized pe | arXiv: 2601.05654
- learning uncertainty from sequential internal dispersion in large language model | arXiv: 2604.15741
- leave my images alone preventing multi-modal large language models from analyzin | arXiv: 2604.09024
- leprec reasoning as classification over structured factors for assessing relevan | arXiv: 2604.19464
- less noise more voice reinforcement learning for reasoning via instruction purif | arXiv: 2601.21244
- lexrel benchmarking legal relation extraction for chinese civil cases | arXiv: 2512.12643
- lightweight llm agent memory with small language models | arXiv: 2604.07798
- llm prompt duel optimizer efficient label-free prompt optimization | arXiv: 2510.13907
- llm-guided semantic bootstrapping for interpretable text classification with tse | arXiv: 2604.12223
- llms underperform graph-based parsers on supervised relation extraction for comp | arXiv: 2604.08752
- location not found exposing implicit local and global biases in multilingual llm | arXiv: 2604.19292
- logical phase transitions understanding collapse in llm logical reasoning | arXiv: 2601.02902
- logiceval a systematic framework for evaluating automated repair techniques for | arXiv: 2604.12994
- logoskg hardware-optimized scalable and interpretable knowledge graph retrieval | arXiv: 2604.18913
- look twice before you leap a rational framework for localized adversarial anonym | arXiv: 2512.06713
- lora on the go instance-level dynamic lora selection and merging | arXiv: 2511.07129
- losses that cook topological optimal transport for structured recipe generation | arXiv: 2601.02531
- lost in diffusion uncovering hallucination patterns and failure modes in diffusi | arXiv: 2604.10556
- lost in the prompt order revealing the limitations of causal attention in langua | arXiv: 2601.14152
- lost in translation do lvlm judges generalize across languages | arXiv: 2604.19405
- lpo towards accurate gui agent interaction via location preference optimization | arXiv: 2506.09373
- lqm linguistically motivated multidimensional quality metrics for machine transl | arXiv: 2604.18490
- mab-dqa addressing query aspect importance in document question answering with m | arXiv: 2604.08952
- made a living benchmark for multi-label text classification with uncertainty qua | arXiv: 2604.15203
- maestro meta-learning adaptive estimation of scalarization trade-offs for reward | arXiv: 2601.07208
- making mllms blind adversarial smuggling attacks in mllm content moderation | arXiv: 2604.06950
- march evaluating the intersection of ambiguity interpretation and multi-hop infe | arXiv: 2509.22750
- march multi-agent radiology clinical hierarchy for ct report generation | arXiv: 2604.16175
- mars2 scaling multi-agent tree search via reinforcement learning for code genera | arXiv: 2604.14564
- mash evading black-box ai-generated text detectors via style humanization | arXiv: 2601.08564
- masked by consensus disentangling privileged knowledge in llm correctness | arXiv: 2604.12373
- mass-rag multi-agent synthesis retrieval-augmented generation | arXiv: 2604.18509
- mata multi-agent framework for reliable and flexible table question answering | arXiv: 2602.09642
- mathagent adversarial evolution of constraint graphs for mathematical reasoning | arXiv: 2604.11188
- mathflow enhancing the perceptual flow of mllms for visual mathematical problems | arXiv: 2503.16549
- maximizing local entropy where it matters prefix-aware localized llm unlearning | arXiv: 2601.03190
- mcga a multi-task classical chinese literary genre audio corpus | arXiv: 2601.09270
- mcp-flow facilitating llm agents to master real-world diverse and scaling mcp to | arXiv: 2510.24284
- meashalu mitigation of scientific measurement hallucinations for large language | arXiv: 2604.16929
- measuring what matters assessing therapeutic principles in mental-health convers | arXiv: 2604.05795
- medlaybench-v a large-scale benchmark for expert-lay semantic alignment in medic | arXiv: 2604.05738
- mem2evolve towards self-evolving agents via co-evolutionary capability expansion | arXiv: 2604.10923
- memophishagent memory-augmented multi-modal llm agent for phishing url detection | arXiv: 2602.21394
- memory-augmented llm-based multi-agent system for automated feature generation o | arXiv: 2604.20261
- memp exploring agent procedural memory | arXiv: 2508.06433
- meta-tool efficient few-shot tool adaptation for small language models | arXiv: 2604.20148
- mhsafeeval role-aware interaction-level evaluation of mental health safety in la | arXiv: 2604.17730
- min-k sampling decoupling truncation from temperature scaling via relative logit | arXiv: 2604.11012
- mina a multilingual llm-powered legal assistant agent for bangladesh for empower | arXiv: 2511.08605
- mitigating catastrophic forgetting in target language adaptation of llms via sou | arXiv: 2512.04844
- mitigating extrinsic gender bias for bangla classification tasks | arXiv: 2411.10636
- mitigating hallucinations in large vision-language models without performance de | arXiv: 2604.20366
- mmerror a benchmark for erroneous reasoning in vision-language models | arXiv: 2601.03331
- model internal sleuthing finding lexical identity and inflectional features in m | arXiv: 2506.02132
- model-agnostic meta learning for class imbalance adaptation | arXiv: 2604.18759
- modeling multi-dimensional cognitive states in large language models under cogni | arXiv: 2604.17174
- moneta multimodal industry classification through geographic information with mu | arXiv: 2604.07956
- more than meets the eye measuring the semiotic gap in vision-language models via | arXiv: 2604.17354
- morphogen a multilingual benchmark for evaluating gender-aware morphological gen | arXiv: 2604.18914
- mtr-duplexbench towards a comprehensive evaluation of multi-round conversations | arXiv: 2511.10262
- muldimif a multi-dimensional constraint framework for evaluating and improving i | arXiv: 2505.07591
- multi-drafter speculative decoding with alignment feedback | arXiv: 2604.05417
- multi-faceted self-consistent preference alignment for query rewriting in conver | arXiv: 2604.06771
- multi-task reinforcement learning for enhanced multimodal llm-as-a-judge | arXiv: 2603.11665
- multi-view attention multiple-instance learning enhanced by llm reasoning for co | arXiv: 2509.17292
- multifiletest a multi-file-level llm unit test generation benchmark and impact o | arXiv: 2502.06556
- multilingual language models encode script over linguistic structure | arXiv: 2604.05090
- multimodal in-context learning for asr of low-resource languages | arXiv: 2601.05707
- music audio-visual question answering requires specialized multimodal designs | arXiv: 2505.20638
- musical score understanding benchmark evaluating large language models' compreh | arXiv: 2511.20697
- native hybrid attention for efficient sequence modeling | arXiv: 2510.07019
- no one fits all from fixed prompting to learned routing in multilingual llms | arXiv: 2604.16937
- no-worse context-aware decoding preventing neutral regression in context-conditi | arXiv: 2604.16686
- nose neural olfactory-semantic embedding with tri-modal orthogonal contrastive l | arXiv: 2604.10452
- not all animals are equal metaphorical framing through source domains and semant | arXiv: 2604.20454
- octotools an agentic framework with extensible tools for complex reasoning | arXiv: 2502.11271
- odutqa-mdc a task for open-domain underspecified tabular qa with multi-turn dial | arXiv: 2604.10159
- omibench benchmarking olympiad-level multi-image reasoning in large vision-langu | arXiv: 2604.20806
- omni-embed-audio leveraging multimodal llms for robust audio-text retrieval | arXiv: 2604.18360
- omnicompliance-100k a multi-domain rule-grounded real-world safety compliance da | arXiv: 2603.13933
- omnidiagram advancing unified diagram code generation via visual interrogation r | arXiv: 2604.05514
- on safety risks in experience-driven self-evolving agents | arXiv: 2604.16968
- on the step length confounding in llm reasoning data selection | arXiv: 2604.06834
- one persona many cues different results how sociodemographic cues impact llm per | arXiv: 2601.18572
- optimizing user profiles via contextual bandits for retrieval-augmented llm pers | arXiv: 2601.12078
- oscbench benchmarking object state change in text-to-video generation | arXiv: 2603.11698
- parallel test-time scaling for latent reasoning models | arXiv: 2510.07745
- parallel universes parallel languages a comprehensive study on llm-based multili | arXiv: 2601.00263
- persona-e2 a human-grounded dataset for personality-shaped emotional responses t | arXiv: 2604.09162
- personalized benchmarking evaluating llms by individual preferences | arXiv: 2604.18943
- piarena a platform for prompt injection evaluation | arXiv: 2604.08499
- planning beyond text graph-based reasoning for complex narrative generation | arXiv: 2604.21253
- please refuse to answer me mitigating over-refusal in large language models via | arXiv: 2604.17132
- policyllm towards excellent comprehension of public policy for large language mo | arXiv: 2604.12995
- polynomial expansion rank adaptation enhancing low-rank fine-tuning with high-or | arXiv: 2604.11841
- position multimodal large language models can significantly advance scientific r | arXiv: 2502.02871
- precise debugging benchmark is your model debugging or regenerating | arXiv: 2604.17338
- preference estimation via opponent modeling in multi-agent negotiation | arXiv: 2604.15687
- prefix parsing is just parsing | arXiv: 2604.21191
- principlismqa a philosophy-grounded approach to assessing llm-human clinical med | arXiv: 2508.05132
- probing for reading times | arXiv: 2604.18712
- process reward models meet planning generating precise and scalable datasets for | arXiv: 2604.17957
- prosody as supervision bridging the non-verbal--verbal for multilingual speech e | arXiv: 2604.17647
- protecting bystander privacy via selective hearing in audio llms | arXiv: 2512.06380
- protocycle reflective tool-augmented planning for text-guided protein design | arXiv: 2604.16896
- pseudo2real task arithmetic for pseudo-label correction in automatic speech reco | arXiv: 2510.08047
- purging the gray zone latent-geometric denoising for precise knowledge boundary | arXiv: 2604.14324
- pv-sql synergizing database probing and rule-based verification for text-to-sql | arXiv: 2604.17653
- qimeng-prepair precise code repair via edit-aware reward optimization | arXiv: 2604.05963
- quality over clicks intrinsic quality-driven iterative reinforcement learning fo | arXiv: 2603.22922
- query pipeline optimization for cancer patient question answering systems | arXiv: 2412.14751
- ra-rrg multimodal retrieval-augmented radiology report generation with key phras | arXiv: 2504.07415
- racer retrieval-augmented contextual rapid speculative decoding | arXiv: 2604.14885
- rads reinforcement learning-based sample selection improves transfer learning in | arXiv: 2604.20256
- rare redundancy-aware retrieval evaluation framework for high-similarity corpora | arXiv: 2604.19047
- reason only when needed efficient generative reward modeling via model-internal | arXiv: 2604.10072
- reasonembed enhanced text embeddings for reasoning-intensive document retrieval | arXiv: 2510.08252
- reasoning fails where step flow breaks | arXiv: 2604.06695
- reasoning hijacking the fragility of reasoning alignment in large language model | arXiv: 2601.10294
- reasoning-based refinement of unsupervised text clusters with llms | arXiv: 2604.07562
- recoqa a benchmark for tool-augmented and multi-step reasoning in real estate qu | arXiv: 2604.17944
- referee reference-free and fine-grained method for evaluating factual consistenc | arXiv: 2604.10520
- region-grounded report generation for 3d medical imaging a fine-grained dataset | arXiv: 2604.18145
- region-r1 reinforcing query-side region cropping for multi-modal re-ranking | arXiv: 2604.05268
- reinforced efficient reasoning via semantically diverse exploration | arXiv: 2601.05053
- reliable evaluation protocol for low-precision retrieval | arXiv: 2508.03306
- render-of-thought rendering textual chain-of-thought as images for visual latent | arXiv: 2601.14750
- reposhapley shapley-enhanced context filtering for repository-level code complet | arXiv: 2601.03378
- representation-guided parameter-efficient llm unlearning | arXiv: 2604.17396
- reprompt recurrent prompt tuning for integrating structured ehr encoders with la | arXiv: 2604.17725
- rerec reasoning-augmented llm-based recommendation assistant via reinforcement f | arXiv: 2604.07851
- researchbench benchmarking llms in scientific discovery via inspiration-based ta | arXiv: 2503.21248
- rethinking jailbreak detection of large vision language models with representati | arXiv: 2512.12069
- rethinking llm watermark detection in black-box settings a non-intrusive third-p | arXiv: 2603.14968
- rethinking meeting effectiveness a benchmark and framework for temporal fine-gra | arXiv: 2604.17260
- retraceqa evaluating reasoning traces of small language models in commonsense qu | arXiv: 2510.09351
- retrievals can be detrimental unveiling the backdoor vulnerability of retrieval- | arXiv: 2501.13340
- retrieving to recover towards incomplete audio-visual question answering via sem | arXiv: 2604.10695
- reverse constitutional ai a framework for controllable toxic data generation via | arXiv: 2604.17769
- revisiting entropy in reinforcement learning for large reasoning models | arXiv: 2511.05993
- revisiting non-verbatim memorization in large language models the role of entity | arXiv: 2604.21882
- revisiting the uniform information density hypothesis in llm reasoning | arXiv: 2510.06953
- revitalizing black-box interpretability actionable interpretability for llms via | arXiv: 2505.12509
- reward modeling for scientific writing evaluation | arXiv: 2601.11374
- rhetorical questions in llm representations a linear probing study | arXiv: 2604.14128
- right at my level a unified multilingual framework for proficiency-aware text si | arXiv: 2604.05302
- risk a framework for gui agents in e-commerce risk management | arXiv: 2509.21982
- ritek a dataset for large language models complex reasoning over textual knowled | arXiv: 2410.13987
- river-llm large language model seamless exit based on kv share | arXiv: 2604.18396
- rl-plus countering capability boundary collapse of llms in reinforcement learnin | arXiv: 2508.00222
- robust tool use via fission-grpo learning to recover from execution errors | arXiv: 2601.15625
- robustness via referencing defending against prompt injection attacks by referen | arXiv: 2504.20472
- roleconflictbench a benchmark of role conflict scenarios for evaluating llms' c | arXiv: 2509.25897
- route to rome attack directing llm routers to expensive models via adversarial s | arXiv: 2604.15022
- s2h-dpo hardness-aware preference optimization for vision-language models | arXiv: 2604.18512
- saber an efficient sampling with adaptive acceleration and backtracking enhanced | arXiv: 2510.18165
- safemerge preserving safety alignment in fine-tuned large language models via se | arXiv: 2503.17239
- safetyalfred evaluating safety-conscious planning of multimodal large language m | arXiv: 2604.19638
- sage sign-adaptive gradient for memory-efficient llm optimization | arXiv: 2604.07663
- samora semantic-aware mixture of lora experts for task-adaptive learning | arXiv: 2604.19048
- savoir learning social savoir-faire via shapley-based reward attribution | arXiv: 2604.18982
- scaling behaviors of llm reinforcement learning post-training an empirical study | arXiv: 2509.25300
- scaling external knowledge input beyond context windows of llms via multi-agent | arXiv: 2505.21471
- scaling test-time compute to achieve ioi gold medal with open-weight models | arXiv: 2510.14232
- scicoqa quality assurance for scientific paper--code alignment | arXiv: 2601.12910
- sciimpact a multi-dimensional multi-field benchmark for scientific impact predic | arXiv: 2604.17141
- script a subcharacter compositional representation injection module for korean p | arXiv: 2604.12377
- scripts through time a survey of the evolving role of transliteration in nlp | arXiv: 2604.18722
- sculpting the vector space towards efficient multi-vector visual document retrie | arXiv: 2602.19549
- scurank ranking multiple candidate summaries with summary content units for enha | arXiv: 2604.19185
- securevibebench evaluating secure coding capabilities of code agents with realis | arXiv: 2509.22097
- seeing no evil blinding large vision-language models to safety instructions via | arXiv: 2604.10299
- selar selective latent reasoning in large language models | arXiv: 2604.08299
- self-awareness before action mitigating logical inertia via proactive cognitive | arXiv: 2604.20413
- self-consistency from only two samples cot-pot ensembling for efficient llm reas | arXiv: 2604.17433
- self-correcting text-to-video generation with misalignment detection and localiz | arXiv: 2411.15115
- self-reinforcing controllable synthesis of rare relational data via bayesian cal | arXiv: 2604.16817
- semantic-aware logical reasoning via a semiotic framework | arXiv: 2509.24765
- semantic-space exploration and exploitation in rlvr for llm reasoning | arXiv: 2509.23808
- semi-supervised diseased detection from speech dialogues with multi-level data m | arXiv: 2601.04744
- sense and sensitivity examining the influence of semantic recall on long context | arXiv: 2505.13353
- serm self-evolving relevance model with agent-driven learning from massive query | arXiv: 2601.09515
- sessionintentbench a multi-task inter-session intention-shift modeling benchmark | arXiv: 2507.20185
- sftmix elevating language model instruction tuning with mixup recipe | arXiv: 2410.05248
- silo-bench a scalable environment for evaluating distributed coordination in mul | arXiv: 2603.01045
- similarity-distance-magnitude activations | arXiv: 2509.12760
- slideagent hierarchical agentic framework for multi-page visual document underst | arXiv: 2510.26615
- socia-evo automated simulator construction via dual-anchored bi-level optimizati | arXiv: 2604.17351
- soft head selection for injecting icl-derived task embeddings | arXiv: 2507.20906
- solidcoder bridging the mental-reality gap in llm code generation through concre | arXiv: 2604.19825
- solver-independent automated problem formulation via llms for high-cost simulati | arXiv: 2512.18682
- spagbias uncovering and tracing structured spatial gender bias in large language | arXiv: 2604.14672
- spasm stable persona-driven agent simulation for multi-turn dialogue generation | arXiv: 2604.09212
- speakersleuth can large audio-language models judge speaker consistency across m | arXiv: 2601.04029
- spec-o3 a tool-augmented vision-language agent for rare celestial object candida | arXiv: 2601.06498
- specbound adaptive bounded self-speculation with layer-wise confidence calibrati | arXiv: 2604.12247
- speculative verification exploiting information gain to refine speculative decod | arXiv: 2509.24328
- spence a syntactic probe for detecting contamination in nl2sql benchmarks | arXiv: 2604.17771
- spiralthinker latent reasoning through an iterative process with text-latent int | arXiv: 2511.08983
- splits flexible sociocultural linguistic investigation at scale | arXiv: 2504.04640
- spotlight and shadow attention-guided dual-anchor introspective decoding for mll | arXiv: 2604.10071
- stable on-policy distillation through adaptive target reformulation | arXiv: 2601.07155
- stable-rag mitigating retrieval-permutation-induced hallucinations in retrieval- | arXiv: 2601.02993
- star-teaming a strategy-response multiplex network approach to automated llm red | arXiv: 2604.18976
- step-grpo internalizing dynamic early exit for efficient reasoning | arXiv: 2604.16890
- still between us evaluating and improving voice assistant robustness to third-pa | arXiv: 2604.17358
- stk-adapter incorporating evolving graph and event chain for temporal knowledge | arXiv: 2604.19042
- storycoder narrative reformulation for structured reasoning in llm code generati | arXiv: 2604.14631
- stresstest can your speech lm handle the stress | arXiv: 2505.22765
- stride-ed a strategy-grounded stepwise reasoning framework for empathetic dialog | arXiv: 2604.07100
- structkv preserving the structural skeleton for scalable long-context inference | arXiv: 2604.06746
- structmem structured memory for long-horizon behavior in llms | arXiv: 2604.21748
- style amnesia investigating speaking style degradation and mitigation in multi-t | arXiv: 2512.23578
- style over story measuring llm narrative preferences via structured selection | arXiv: 2510.02025
- subject-level inference for realistic text anonymization evaluation | arXiv: 2604.21211
- supplement generation training for enhancing agentic task performance | arXiv: 2604.20727
- syntax as a rosetta stone universal dependencies for in-context coptic translati | arXiv: 2604.18758
- synthagent adapting web agents with synthetic supervision | arXiv: 2511.06101
- synthia scalable grounded persona generation from social media data | arXiv: 2507.14922
- table question answering in the era of large language models a comprehensive sur | arXiv: 2510.09671
- tabrex tabular referenceless explainable evaluation | arXiv: 2512.15907
- taming actor-observer asymmetry in agents via dialectical alignment | arXiv: 2604.19548
- targeted exploration via unified entropy control for reinforcement learning | arXiv: 2604.14646
- task-aware llm routing with multi-level task-profile-guided data synthesis for c | arXiv: 2604.09377
- task-stratified knowledge scaling laws for post-training quantized large languag | arXiv: 2508.18609
- taxpraben a scalable benchmark for structured evaluation of llms in chinese real | arXiv: 2604.08948
- tellwhisper tell whisper who speaks when | arXiv: 2601.03712
- tema anchor the image follow the text for multi-modification composed image retr | arXiv: 2604.21806
- template-assisted contrastive learning of task-oriented dialogue sentence embedd | arXiv: 2305.14299
- temporal contrastive decoding a training-free method for large audio-language mo | arXiv: 2604.15383
- temporal flattening in llm-generated text comparing human and llm writing trajec | arXiv: 2604.12097
- temporal leakage in search-engine date-filtered web retrieval a retrospective fo | arXiv: 2602.00758
- temporalvlm video llms for temporal reasoning in long videos | arXiv: 2412.02930
- text-attributed knowledge graph enrichment with large language models for medica | arXiv: 2604.13331
- text-to-distribution prediction with quantile tokens and neighbor context | arXiv: 2604.20216
- the gaoyao benchmark a comprehensive framework for evaluating multilingual and m | arXiv: 2604.20225
- the model agreed but didn't learn diagnosing surface compliance in large langua | arXiv: 2604.05995
- the path not taken duality in reasoning about program execution | arXiv: 2604.20917
- the reasoning trap how enhancing llm reasoning amplifies tool hallucination | arXiv: 2510.22977
- the stackelberg speaker optimizing persuasive communication in social deduction | arXiv: 2510.09087
- think in latent thoughts a new paradigm for gloss-free sign language translation | arXiv: 2604.15301
- think in sentences explicit sentence boundaries enhance language model's capabi | arXiv: 2604.10135
- think outside the policy in-context steered policy optimization | arXiv: 2510.26519
- thinking like a botanist challenging multimodal language models with intent-driv | arXiv: 2604.20983
- threadsumm summarization of nested discourse threads using tree of thoughts | arXiv: 2604.17648
- through the magnifying glass adaptive perception magnification for hallucination | arXiv: 2503.10183
- time-ra towards time series reasoning for anomaly diagnosis with llm feedback | arXiv: 2507.15066
- tingis real-time risk event discovery from noisy customer incidents at enterpris | arXiv: 2604.21889
- to lie or not to lie investigating the biased spread of global lies by llms | arXiv: 2604.06552
- to trust or not to trust attention-based trust management for llm multi-agent sy | arXiv: 2506.02546
- toolomni enabling open-world tool use via agentic learning with proactive retrie | arXiv: 2604.13787
- topic-based watermarks for large language models | arXiv: 2404.02138
- topology-aware layer pruning for large vision-language models | arXiv: 2604.16502
- toward consistent world models with multi-token prediction and latent semantic e | arXiv: 2604.06155
- towards bridging the reward-generation gap in direct alignment algorithms | arXiv: 2506.09457
- towards effective in-context cross-domain knowledge transfer via domain-invarian | arXiv: 2604.05383
- towards fine-grained and multi-granular contrastive language-speech pre-training | arXiv: 2601.03065
- towards intrinsic interpretability of large language models a survey of design pr | arXiv: 2604.16042
- towards proactive information probing customer service chatbots harvesting value | arXiv: 2604.11077
- towards robust real-world spreadsheet understanding with multi-agent multi-forma | arXiv: 2604.12282
- towards scalable lightweight gui agents via multi-role orchestration | arXiv: 2604.13488
- towards self-improving error diagnosis in multi-agent systems | arXiv: 2604.17658
- toxitrace gradient-aligned training for explainable chinese toxicity detection | arXiv: 2604.12321
- toxreason a benchmark for mechanistic chemical toxicity reasoning via adverse ou | arXiv: 2604.06264
- tpa next token probability attribution for detecting hallucinations in rag | arXiv: 2512.07515
- tracing relational knowledge recall in large language models | arXiv: 2604.19934
- training-free test-time contrastive learning for large language models | arXiv: 2604.13552
- trajguard streaming hidden-state trajectory detection for decoding-time jailbrea | arXiv: 2604.07727
- tree-of-evidence efficient "system 2" search for faithful multimodal grounding | arXiv: 2604.07692
- trigreason trigger-based collaboration between small and large reasoning models | arXiv: 2604.14847
- trojail trajectory-level optimization for multi-turn large language model jailbr | arXiv: 2512.07761
- two pathways to truthfulness on the intrinsic encoding of llm hallucinations | arXiv: 2601.07422
- ucs estimating unseen coverage for improved in-context learning | arXiv: 2604.12015
- ukp psycontrol at semeval-2026 task 2 modeling valence and arousal dynamics from | arXiv: 2604.21534
- uncertainty quantification in llm agents foundations emerging challenges and opp | arXiv: 2602.05073
- understanding and mitigating spurious signal amplification in test-time reinforc | arXiv: 2604.21327
- understanding generalization in role-playing models via information theory | arXiv: 2512.17270
- understanding new-knowledge-induced factual hallucinations in llms analysis and | arXiv: 2511.02626
- understanding or memorizing a case study of german definite articles in language | arXiv: 2601.09313
- understanding structured financial data with llms a case study on fraud detectio | arXiv: 2512.13040
- unicreative unifying long-form logic and short-form sparkle via reference-free r | arXiv: 2604.05517
- unleashing spatial reasoning in multimodal large language models via textual rep | arXiv: 2603.23404
- unlocking the edge deployment and ondevice acceleration of multi-lora enabled on | arXiv: 2604.18655
- vc-inspector advancing reference-free evaluation of video captions with factual | arXiv: 2509.16538
- videostir understanding long videos via spatio-temporally structured and intent- | arXiv: 2604.05418
- vill-e video llm embeddings for retrieval | arXiv: 2604.12148
- visret visualization improves knowledge-intensive text-to-image retrieval | arXiv: 2505.20291
- vista verification in sequential turn-based assessment | arXiv: 2510.27052
- vla-forget vision-language-action unlearning for embodied foundation models | arXiv: 2604.03956
- vln-nf feasibility-aware vision-and-language navigation with false-premise instr | arXiv: 2604.10533
- vocab diet reshaping the vocabulary of llms via vector arithmetic | arXiv: 2510.17001
- voxmind an end-to-end agentic spoken dialogue system | arXiv: 2604.15710
- waking up blind cold-start optimization of supervision-free agentic trajectories | arXiv: 2604.17475
- what do vision-language models encode for personalized image aesthetics assessme | arXiv: 2604.11374
- what factors affect llms and rllms in financial question answering | arXiv: 2507.08339
- what if consensus lies selective-complementary reinforcement learning at test ti | arXiv: 2603.19880
- what makes an ideal quote recommending "unexpected yet rational" quotations vi | arXiv: 2602.22220
- what makes an llm a good optimizer a trajectory analysis of llm-guided evolution | arXiv: 2604.19440
- what makes llms effective sequential recommenders a study on preference intensit | arXiv: 2506.02261
- what's missing in screen-to-action towards a ui-in-the-loop paradigm for multim | arXiv: 2604.06995
- when agents look the same quantifying distillation-induced similarity in tool-us | arXiv: 2604.21255
- when bigger isn't better a comprehensive fairness evaluation of political bias | arXiv: 2604.21309
- when helpers become hazards a benchmark for analyzing multimodal llm-powered saf | arXiv: 2601.04043
- when is thinking enough early exit via sufficiency assessment for efficient reas | arXiv: 2604.06787
- when misinformation speaks and converses rethinking fact-checking in audio platf | arXiv: 2604.16767
- when personalization tricks detectors the feature-inversion trap in machine-gene | arXiv: 2510.12476
- when slower isn't truer inverse scaling law of truthfulness in multimodal reaso | arXiv: 2505.20214
- when vision-language models judge without seeing exposing informativeness bias | arXiv: 2604.17768
- where and what reasoning dynamic and implicit preferences in situated conversati | arXiv: 2604.20749
- which bird does not have wings negative-constrained kgqa with schema-guided sema | arXiv: 2604.14749
- which reasoning trajectories teach students to reason better a simple metric of | arXiv: 2601.14249
- who gets which message auditing demographic bias in llm-generated targeted text | arXiv: 2601.17172
- who wrote this line evaluating the detection of llm-generated classical chinese | arXiv: 2604.10101
- why agents compromise safety under pressure | arXiv: 2603.14975
- why did apple fall evaluating curiosity in large language models | arXiv: 2510.20635
- why do multilingual reasoning gaps emerge in reasoning language models | arXiv: 2510.27269
- why supervised fine-tuning fails to learn a systematic study of incomplete learn | arXiv: 2604.10079
- why these documents explainable generative retrieval with hierarchical category | arXiv: 2411.05572
- wikiseeker rethinking the role of vision-language models in knowledge-based visu | arXiv: 2604.05818
- wisca a lightweight model transition method to improve llm training via weight s | arXiv: 2508.16676
- working memory constraints scaffold learning in transformers under data scarcity | arXiv: 2604.20789
- xlsr-mambo scaling the hybrid mamba-attention backbone for audio deepfake detect | arXiv: 2601.02944
- xmark reliable multi-bit watermarking for llm-generated texts | arXiv: 2604.05242
- xoxo stealthy cross-origin context poisoning attacks against ai coding assistant | arXiv: 2503.14281
- xq-meval a dataset with cross-lingual parallel quality for benchmarking translat | arXiv: 2604.14934
- xtragpt context-aware and controllable academic paper revision via human-ai coll | arXiv: 2505.11336
- yield a large-scale dataset and evaluation framework for information elicitation | arXiv: 2604.10968
- your llm agents are temporally blind the misalignment between tool use decisions | arXiv: 2510.23853
- zara training-free motion time-series reasoning via evidence-grounded llm agents | arXiv: 2508.04038
- zipvoice-dialog non-autoregressive spoken dialogue generation with flow matching | arXiv: 2507.09318