ACL2025 Paper Notes TODO
Total: 2835 papers | Completed: 2358 | Pending: 477
- "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | arXiv: 2411.02355
- 500xcompressor generalized prompt compression for large language models | arXiv: 2408.03094
- a case study of cross-lingual zero-shot generalization for classical languages i | arXiv: 2505.13173
- a comprehensive graph framework for question answering with mode-seeking prefere | arXiv: 2506.17951
- a conformal risk control framework for granular word assessment and uncertainty | arXiv: 2504.01225
- a drop-in solution for on-the-fly adaptation of speculative decoding in large la
- a dual-mind framework for strategic and expressive negotiation agent
- a dual-perspective nlg meta-evaluation framework with automatic benchmark and be | arXiv: 2502.12052
- a general knowledge injection framework for icd coding | arXiv: 2505.18708
- a generative adaptive replay continual learning model for temporal knowledge gra
- a large and balanced corpus for fine-grained arabic readability assessment | arXiv: 2502.13520
- a large-scale real-world evaluation of llm-based virtual teaching assistant | arXiv: 2506.17363
- a little human data goes a long way | arXiv: 2410.13098
- a measure of the system dependence of automated metrics | arXiv: 2412.03152
- a mismatched benchmark for scientific natural language inference | arXiv: 2506.04603
- a modular approach for clinical slms driven by synthetic data with pre-instructi
- a modular dataset to demonstrate llm abstraction capability | arXiv: 2503.17645
- A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems | arXiv: 2506.02998
- a multi-persona framework for argument quality assessment
- a mutual information perspective on knowledge graph embedding
- a new formulation of zipfs meaning-frequency law through contextual diversity
- a parameter-efficient and fine-grained prompt learning for vision-language model
- a practical approach for building production-grade conversational agents with wo | arXiv: 2505.23006
- a reality check on context utilisation for retrieval-augmented generation | arXiv: 2412.17031
- a representation level analysis of nmt model robustness to grammatical errors | arXiv: 2505.21224
- a retrieval-based approach to medical procedure matching in romanian | arXiv: 2503.20556
- a rose by any other name llm-generated explanations are good proxies for human e | arXiv: 2412.13942
- a self-denoising model for robust few-shot relation extraction
- a semantic-aware layer-freezing approach to computation-efficient fine-tuning of | arXiv: 2406.11753
- a semi-supervised scalable unified framework for e-commerce query classification | arXiv: 2506.21049
- A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression | arXiv: 2412.17483
- a spatio-temporal point process for fine-grained modeling of reading behavior | arXiv: 2506.19999
- a statistical and multi-perspective revisiting of the membership inference attac
- a strategic coordination framework of small lms matches large lms in data synthe
- a survey of automatic prompt optimization with instruction-focused heuristic-bas | arXiv: 2502.18746
- a survey of large language models in psychotherapy current landscape and future | arXiv: 2502.11095
- a survey of llm-based agents in medicine how far are we from baymax | arXiv: 2502.11211
- a survey of post-training scaling in large language models
- A Survey on Efficient Large Language Model Training: From Data-centric Perspectives | arXiv: 2510.25817
- A Survey on Foundation Language Models for Single-cell Biology
- A Survey on Patent Analysis: From NLP to Multimodal AI | arXiv: 2404.08668
- a survey on proactive defense strategies against misinformation in large languag | arXiv: 2507.05288
- a systematic study of compositional syntactic transformer language models | arXiv: 2506.22978
- a text is worth several tokens text embedding from llms secretly aligns well wit | arXiv: 2406.17378
- A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive | arXiv: 2402.11005
- a training-free llm-based approach to general chinese character error correction | arXiv: 2502.15266
- a triple-view framework for fine-grained emotion classification with clustering-
- A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns | arXiv: 2410.16155
- a unified agentic framework for evaluating conditional image generation | arXiv: 2504.07046
- a variational approach for mitigating entity bias in relation extraction | arXiv: 2506.11381
- a-tasc asian ted-based automatic subtitling corpus
- aad-llm neural attention-driven auditory scene understanding | arXiv: 2502.16794
- AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research | arXiv: 2507.13300
- accelerating adaptive retrieval augmented generation via instruction-driven repr | arXiv: 2505.12731
- accelerating dense llms via l0-regularized mixture-of-experts
- access denied inc the first benchmark environment for sensitivity awareness | arXiv: 2506.00964
- accurate kv cache quantization with outlier tokens tracing | arXiv: 2505.10938
- AceCoder: Acing Coder RL via Automated Test-Case Synthesis | arXiv: 2502.01718
- acord an expert-annotated retrieval dataset for legal contract drafting | arXiv: 2501.06582
- acoustic individual identification of white-faced capuchin monkeys using joint m
- acquisition and application of novel knowledge in large language models
- act knowledgeable agents to design and perform complex tasks
- activating distributed visual region within llms for efficient and effective vis
- activation steering decoding mitigating hallucination in large vision-language m
- actiview evaluating active perception ability for multimodal large language mode
- ad-hoc concept forming in the game codenames as a means for evaluating large lan | arXiv: 2502.11707
- ad-llm benchmarking large language models for anomaly detection | arXiv: 2412.11142
- adadhp fine-grained fine-tuning via dual hadamard product and adaptive parameter
- adaedit advancing continuous knowledge editing for large language models
- adammeme adaptively probe the reasoning capacity of multimodal large language mo | arXiv: 2507.01702
- adaptagent adapting multimodal web agents with few-shot learning from human demo
- adapting psycholinguistic research for llms gender-inclusive language in a coref | arXiv: 2502.13120
- adaptive and robust translation from natural language to multi-model query langu
- adaptive detoxification safeguarding general capabilities of llms through toxici | arXiv: 2505.22298
- adaptive linguistic prompting alp enhances phishing webpage detection in multimo | arXiv: 2507.13357
- adaptive retrieval without self-knowledge bringing uncertainty back home | arXiv: 2501.12835
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger | arXiv: 2502.12961
- adaptive-vp a framework for llm-based virtual patients that adapts to trainees d | arXiv: 2506.00386
- addressing blind guessing calibration of selection bias in multiple-choice quest
- advancing collaborative debates with role differentiation through multi-agent re
- advancing sequential numerical prediction in autoregressive models | arXiv: 2505.13077
- advancing smoe for continuous domain adaptation of mllms adaptive router and dom
- advancing zero-shot text-to-speech intelligibility across diverse domains via pr | arXiv: 2505.04113
- adversarial alignment with anchor dragging drift a3d2 multimodal domain adaptati
- adversarial tokenization | arXiv: 2503.02174
- adverse event extraction from discharge summaries a new dataset annotation schem
- AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | arXiv: 2411.15640
- afrobench how good are large language models on african languages | arXiv: 2311.07978
- afrocs-xs creating a compact high-quality human-validated code-switched dataset
- agd adversarial game defense against jailbreak attacks in large language models
- Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents | arXiv: 2506.21252
- agentalign navigating safety alignment in the shift from informative to agentic | arXiv: 2505.23020
- agentdropout dynamic agent elimination for token-efficient and high-performance
- agentgym evaluating and training large language model-based agents across divers
- agentic knowledgeable self-awareness | arXiv: 2504.03553
- Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools | arXiv: 2502.04644
- agentic reward modeling integrating human preferences with verifiable correctnes
- agentrm enhancing agent generalization with reward modeling | arXiv: 2502.18407
- agents under siege breaking pragmatic multi-agent llm systems with optimized pro
- agrail a lifelong agent guardrail with effective and adaptive safety detection | arXiv: 2502.11448
- agri-cm3 a chinese massive multi-modal multi-level benchmark for agricultural un
- ai4reading chinese audiobook interpretation system based on multi-agent collabor | arXiv: 2512.23300
- aide attribute-guided multi-hop data expansion for data scarcity in task-specifi | arXiv: 2412.06136
- AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across Jurisdictions | arXiv: 2506.01671
- air-bench automated heterogeneous information retrieval benchmark | arXiv: 2412.13102
- akan cinematic emotions ace a multimodal multi-party dataset for emotion recogni | arXiv: 2502.10973
- algen few-shot inversion attacks on textual embeddings via cross-model alignment
- Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study | arXiv: 2412.13169
- align-slm textless spoken language models with reinforcement learning from ai fe | arXiv: 2411.01834
- AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | arXiv: 2503.02832
- Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race | arXiv: 2506.00253
- Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review | arXiv: 2412.18043
- aligning large language models to follow instructions and hallucinate less via e | arXiv: 2502.07340
- Aligning Large Language Models with Implicit Preferences from User-Generated Content | arXiv: 2506.04463
- Aligning VLM Assistants with Personalized Situated Cognition | arXiv: 2506.00930
- alignment drift in cefr-prompted llms for interactive spanish tutoring | arXiv: 2505.08351
- alignmmbench evaluating chinese multimodal alignment in large vision-language mo | arXiv: 2406.09295
- All That Glitters is Not Novel: Plagiarism in AI Generated Research | arXiv: 2502.16487
- alleviating distribution shift in synthetic data for machine translation quality | arXiv: 2502.19941
- alleviating hallucinations from knowledge misalignment in large language models
- ambik dataset of ambiguous tasks in kitchen environment | arXiv: 2506.04089
- amopo adaptive multi-objective preference optimization without reward models and | arXiv: 2506.07165
- amplifying trans and nonbinary voices a community-centred harm taxonomy for llms
- an analysis of datasets metrics and models in keyphrase generation | arXiv: 2506.10346
- An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling | arXiv: 2402.13534
- an efficient and precise training data construction framework for process-superv
- An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals | arXiv: 2506.03519
- an empirical study of iterative refinements for non-autoregressive translation
- An Empirical Study of Many-to-Many Summarization with Large Language Models | arXiv: 2505.12983
- an expanded massive multilingual dataset for high-performance language technolog | arXiv: 2503.10267
- analytickws towards exemplar-free analytic class incremental learning for small- | arXiv: 2505.11817
- analyzing and mitigating inconsistency in discrete speech tokens for neural code
- Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations | arXiv: 2504.13816
- analyzing political bias in llms via target-oriented sentiment classification | arXiv: 2505.19776
- analyzing the rapid generalization of sft via the perspective of attention head
- anchored answers unravelling positional bias in gpt-2s multiple-choice questions | arXiv: 2405.03205
- AndroidGen: Building an Android Language Agent under Data Scarcity | arXiv: 2504.19298
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | arXiv: 2410.24024
- anre analogical replay for temporal knowledge graph forecasting
- answer when needed forget when not language models pretend to forget via in-cont | arXiv: 2410.00382
- answering complex geographic questions by adaptive reasoning with visual context
- antileakbench preventing data contamination by automatically constructing benchm | arXiv: 2412.13670
- any information is just worth one single screenshot unifying search with visuali
- anything goes a crosslinguistic study of impossible language learning in lms | arXiv: 2502.18795
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs | arXiv: 2502.12085
- appl a prompt programming language for harmonious integration of programs and la
- are any-to-any models more consistent across modality transfers than specialists | arXiv: 2505.24211
- are bias evaluation methods biased | arXiv: 2506.17111
- are llms effective psychological assessors leveraging adaptive rag for interpret
- are optimal algorithms still optimal rethinking sorting in llm-based pairwise ra
- are rules meant to be broken understanding multilingual moral reasoning as a com | arXiv: 2502.14083
- are the hidden states hiding something testing the limits of factuality-encoding
- Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media | arXiv: 2412.18148
- are your llms capable of stable reasoning | arXiv: 2412.13147
- arghitz at archehr-qa 2025 a two-step divide and conquer approach to patient que | arXiv: 2506.12886
- aria-ui visual grounding for gui instructions | arXiv: 2412.16256
- ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search | arXiv: 2504.10893
- Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework | arXiv: 2412.16953
- arithmattack evaluating robustness of llms to noisy context in math problem solv | arXiv: 2501.08203
- around the world in 24 hours probing llm knowledge of time and place | arXiv: 2506.03984
- asclepius a spectrum evaluation benchmark for medical multi-modal large language
- ask-before-detection identifying and mitigating conformity bias in llm-powered e
- askqe question answering as automatic evaluation for machine translation | arXiv: 2504.11582
- aspera a simulated environment to evaluate planning for complex action execution | arXiv: 2507.15501
- aspo adaptive sentence-level preference optimization for fine-grained multimodal | arXiv: 2505.19100
- assessing agentic large language models in multilingual national bias | arXiv: 2502.17945
- Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | arXiv: 2410.11005
- assessing reliability and political bias in llms judgements of formal and materi
- assessment and manipulation of latent constructs in pre-trained language models
- assigning distinct roles to quantized and low-rank matrices toward optimal weigh | arXiv: 2506.02077
- Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | arXiv: 2410.07176
- atgen a framework for active text generation | arXiv: 2506.23342
- atlantis weak-to-strong learning via importance sampling
- atomic calibration of llms in long-form generations | arXiv: 2410.13246
- atri mitigating multilingual audio text retrieval inconsistencies by reducing da
- Attacking Vision-Language Computer Agents via Pop-ups | arXiv: 2411.02391
- Attention Entropy is a Key Factor for Parallel Context Encoding | arXiv: 2412.16545
- attention speaks volumes localizing and mitigating bias in language models | arXiv: 2410.22517
- atyaephyra at semeval-2025 task 4 low-rank negative preference optimization | arXiv: 2503.13690
- autalic a dataset for anti-autistic ableist language in context | arXiv: 2410.16520
- auto-arena automating llm evaluations with agent peer battles and committee disc
- auto-ta towards scalable automated thematic analysis ta via multi-agent large la | arXiv: 2506.23998
- AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs | arXiv: 2502.01977
- Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models | arXiv: 2505.19490
- automated structured radiology report generation | arXiv: 2505.24223
- automatic detection of dyslexia based on eye movements during reading in russian
- automatic evaluation for text-to-image generation task-decomposed framework dist
- automatic expert discovery in llm upcycling via sparse interpolated mixture-of-e
- automatic generation of inference making questions for reading comprehension ass | arXiv: 2506.08260
- automatic transmission for llm tiers optimizing cost and accuracy in large langu | arXiv: 2505.20921
- Automating Legal Interpretation with LLMs: Retrieval, Generation, and Evaluation | arXiv: 2501.01743
- automedeval harnessing language models for automatic medical capability evaluati
- AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs | arXiv: 2506.00569
- automixer checkpoint artifacts as automatic data mixers | arXiv: 2506.21910
- autonomous data selection with zero-shot generative classifiers for mathematical | arXiv: 2402.07625
- Autoregressive Speech Synthesis without Vector Quantization | arXiv: 2407.08551
- avg-llava an efficient large multimodal model with adaptive visual granularity | arXiv: 2410.02745
- awes laws and flaws from todays llm research | arXiv: 2408.15409
- axis efficient human-agent-computer interaction with api-first llm-based agents | arXiv: 2409.17140
- balancing diversity and risk in llm sampling how to select your method and param
- balancing the budget understanding trade-offs between supervised and preference- | arXiv: 2502.11284
- bandit-based prompt design strategy selection improves prompt optimizers | arXiv: 2503.01163
- banstereoset a dataset to measure stereotypical social biases in llms for bangla | arXiv: 2409.11638
- basic reading distillation | arXiv: 2507.19741
- batayan a filipino nlp benchmark for evaluating large language models | arXiv: 2502.14911
- battling against tough resister strategy planning with adversarial game for non-
- BeamLoRA: Beam-Constraint Low-Rank Adaptation | arXiv: 2502.13604
- behavioral analysis of information salience in large language models | arXiv: 2502.14613
- behaviorbox automated discovery of fine-grained performance differences between | arXiv: 2506.02204
- behavioural vs representational systematicity in end-to-end models an opinionate | arXiv: 2506.04461
- Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse | arXiv: 2412.17533
- bel esprit multi-agent framework for building ai model pipelines | arXiv: 2412.14684
- BelarusianGLUE: Towards a Natural Language Understanding Benchmark for Belarusian
- belle a bi-level multi-agent reasoning framework for multi-hop question answerin | arXiv: 2505.11811
- benchmarking and improving large vision-language models for fundamental visual g
- benchmarking llms and llm-based agents in practical vulnerability detection for | arXiv: 2503.03586
- benchmarking long-context language models on long code understanding | arXiv: 2503.04359
- Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | arXiv: 2412.05167
- benchmarking uncertainty quantification methods for large language models with l | arXiv: 2406.15627
- bert-like models for slavic morpheme segmentation
- besstie a benchmark for sentiment and sarcasm classification for varieties of en | arXiv: 2412.04726
- better embeddings with coupled adam | arXiv: 2502.08441
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases | arXiv: 2502.19249
- Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs | arXiv: 2502.20968
- beyond completion a foundation model for general knowledge graph reasoning | arXiv: 2505.21926
- beyond demographics fine-tuning large language models to predict individuals sub
- beyond dialogue a profile-dialogue alignment framework towards general role-play
- Beyond Facts: Evaluating Intent Hallucination in Large Language Models | arXiv: 2506.06539
- Beyond Frameworks: Unpacking Collaboration Strategies in Multi-Agent Systems | arXiv: 2505.12467
- beyond in-context learning aligning long-form generation of large language model | arXiv: 2506.01265
- beyond logits aligning feature dynamics for effective knowledge distillation
- beyond n-grams rethinking evaluation metrics and strategies for multilingual abs | arXiv: 2507.08342
- beyond negative stereotypes -- non-negative abusive utterances about identity gr
- beyond numeric rewards in-context dueling bandits with llm agents | arXiv: 2407.01887
- beyond one-size-fits-all tailored benchmarks for efficient evaluation | arXiv: 2502.13576
- beyond output matching bidirectional alignment for enhanced in-context learning | arXiv: 2312.17055
- beyond position the emergence of wavelet-like properties in transformers | arXiv: 2410.18067
- beyond profile from surface-level facts to deep persona simulation in llms | arXiv: 2502.12988
- beyond prompt engineering robust behavior control in llms via steering target at | arXiv: 2505.20322
- Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering | arXiv: 2503.01606
- beyond sequences two-dimensional representation and dependency encoding for code
- beyond similarity a gradient-based graph method for instruction tuning data sele
- beyond single labels improving conversational recommendation through llm-powered | arXiv: 2508.05657
- beyond surface simplicity revealing hidden reasoning attributes for precise comm
- beyond surface-level patterns an essence-driven defense framework against jailbr | arXiv: 2502.19041
- Beyond Text Compression: Evaluating Tokenizers Across Scales | arXiv: 2506.03101
- beyond the answer advancing multi-hop qa with fine-grained graph reasoning and e
- beyond the tip of efficiency uncovering the submerged threats of jailbreak attac | arXiv: 2502.19883
- beyond true or false retrieval-augmented hierarchical analysis of nuanced claims | arXiv: 2506.10728
- bfs-prover scalable best-first tree search for llm-based automatic theorem provi
- bi-tuning with collaborative information for controllable llm-based sequential r
- bias attribution in filipino language models extending a bias interpretability m | arXiv: 2506.07249
- Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation | arXiv: 2402.12649
- bias in the mirror are llms opinions robust to their own adversarial attacks
- biased llms can influence political decision-making
- biasguard a reasoning-enhanced bias detection tool for large language models | arXiv: 2504.21299
- big-bench extra hard | arXiv: 2502.19187
- big5-chat shaping llm personalities through training on human-grounded data | arXiv: 2410.16491
- bilingual zero-shot stance detection
- Binary Classifier Optimization for Large Language Model Alignment | arXiv: 2404.04656
- bipro zero-shot chinese poem generation via block inverse prompting constrained | arXiv: 2411.13237
- bitnetcpp efficient edge inference for ternary llms
- blessing of multilinguality a systematic analysis of multilingual in-context lea | arXiv: 2502.11364
- blockpruner fine-grained pruning for large language models | arXiv: 2406.10594
- bmike-53 investigating cross-lingual knowledge editing with in-context learning | arXiv: 2406.17764
- Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation | arXiv: 2502.10762
- BookCoref: Coreference Resolution at Book Scale | arXiv: 2507.12075
- bookworld from novels to interactive agent societies for story creation
- boosting llms molecular structure elucidation with knowledge enhanced tree searc | arXiv: 2506.23056
- boosting long-context information seeking via query-guided activation refilling
- boosting vulnerability detection of llms via curriculum preference optimization | arXiv: 2506.07390
- bpp-search enhancing tree of thought reasoning for mathematical modeling problem | arXiv: 2411.17404
- bqa body language question answering dataset for video large language models | arXiv: 2410.13206
- brainecho semantic brain signal decoding through vector-quantized spectrogram re | arXiv: 2410.14971
- breaking the ceiling exploring the potential of jailbreak attacks through expand | arXiv: 2505.21277
- bregman conditional random fields sequence labeling with parallelizable inferenc | arXiv: 2506.00732
- brevity is the soul of sustainability characterizing llm response lengths | arXiv: 2506.08686
- bridging the language gaps in large language models with inference-time cross-li
- brighter bridging the gap in human-annotated textual emotion recognition dataset
- browsing like human a multimodal web agent with experiential fast-and-slow think
- browsing lost unformed recollections a benchmark for tip-of-the-tongue search an | arXiv: 2503.19193
- building a long text privacy policy corpus with multi-class labels
- building better avoiding pitfalls in developing language resources when data is | arXiv: 2410.12691
- burn after reading do multimodal large language models truly capture order of ev | arXiv: 2506.10415
- bypass back-propagation optimization-based structural pruning for large language
- Byte Latent Transformer: Patches Scale Better Than Tokens | arXiv: 2412.09871
- c2leva toward comprehensive and contamination-free language model evaluation | arXiv: 2412.04947
- cadreview automatically reviewing cad programs with error detection and correcti | arXiv: 2505.22304
- calibraeval calibrating prediction distribution to mitigate selection bias in ll
- call for rigor in reporting quality of instruction tuning data | arXiv: 2503.04807
- CaLMQA: Exploring Culturally Specific Long-Form Question Answering across 23 Languages | arXiv: 2406.17761
- cami a counselor agent supporting motivational interviewing through state infere
- can a single model master both multi-turn conversations and tool use coalm a uni
- can community notes replace professional fact-checkers | arXiv: 2502.14132
- can external validation tools improve annotation quality for llm-as-a-judge | arXiv: 2507.17015
- Can Graph Descriptive Order Affect Solving Graph Problems with LLMs? | arXiv: 2402.07140
- Can Indirect Prompt Injection Attacks Be Detected and Removed? | arXiv: 2502.16580
- can input attributions explain inductive reasoning in in-context learning | arXiv: 2412.15628
- Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering | arXiv: 2410.08085
- can language models reason about individualistic human values and preferences | arXiv: 2410.03868
- can language models replace programmers for coding repocod says not yet | arXiv: 2410.21647
- can large language models accurately generate answer keys for health-related que
- can large language models address open-target stance detection | arXiv: 2409.00222
- can large language models detect errors in long chain-of-thought reasoning | arXiv: 2502.19361
- Can Large Language Models Understand Internet Buzzwords Through User-Generated Content | arXiv: 2505.15071
- Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | arXiv: 2502.11598
- Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | arXiv: 2505.22943
- Can LLMs Evaluate Complex Attribution in QA? Automatic Benchmarking using Knowledge Graphs | arXiv: 2401.14640
- Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval | arXiv: 2506.12278
- can llms ground when they dont know a study on direct and loaded political quest
- can llms help uncover insights about llms a large-scale evolving literature anal | arXiv: 2502.18791
- can llms identify critical limitations within scientific research a systematic e
- can llms interpret and leverage structured linguistic representations a case stu | arXiv: 2504.04745
- can llms reason about program semantics a comprehensive evaluation of llms on fo | arXiv: 2503.04779
- can llms reliably simulate real students abilities in mathematics and reading co | arXiv: 2507.08232
- can llms simulate l2-english dialogue an information-theoretic analysis of l1-de
- can llms understand unvoiced speech exploring emg-to-text conversion with llms | arXiv: 2506.00304
- can mllms understand the deep implication behind chinese images | arXiv: 2410.13854
- can multimodal foundation models understand schematic diagrams an empirical stud | arXiv: 2507.10787
- Can Multimodal Large Language Models Understand Spatial Relations? | arXiv: 2505.19015
- can third parties read our emotions
- can uniform meaning representation help gpt-4 translate from indigenous language | arXiv: 2502.08900
- can vision language models understand mimed actions | arXiv: 2506.21586
- can vision-language models evaluate handwritten math | arXiv: 2501.07244
- can we further elicit reasoning in llms critic-guided planning with retrieval-au
- can we retrieve everything all at once arm an alignment-oriented llm-based retri
- can you really trust code copilot evaluating large language models from a code s
- can you share your story modeling clients metacognition and openness for llm the | arXiv: 2507.19643
- capability salience vector fine-grained alignment of loss and capabilities for d
- capacity matters a proof-of-concept for transformer memorization on real-world d | arXiv: 2506.14704
- Capture the Key in Reasoning to Enhance CoT Distillation Generalization | arXiv: 2405.19737
- capturing author self beliefs in social media language
- cart a generative cross-modal retrieval framework with coarse-to-fine semantic m | arXiv: 2406.17507
- Causal Estimation of Tokenisation Bias | arXiv: 2506.03149
- causal graph based event reasoning using semantic relation experts | arXiv: 2506.06910
- causalrag integrating causal graphs into retrieval-augmented generation | arXiv: 2503.19878
- Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions | arXiv: 2408.02544
- cautious next token prediction | arXiv: 2507.03038
- cavgan unifying jailbreak and defense of llms via generative adversarial attacks | arXiv: 2507.06043
- cc-tuning a cross-lingual connection mechanism for improving joint multilingual | arXiv: 2506.00875
- cchall a novel benchmark for joint cross-lingual and cross-modal hallucinations | arXiv: 2505.19108
- ceaes bidirectional reinforcement learning optimization for consistent and expla
- CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference | arXiv: 2412.10652
- Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model | arXiv: 2501.05122
- CER: Confidence Enhanced Reasoning in LLMs | arXiv: 2502.14634
- cfbench a comprehensive constraints-following benchmark for llms | arXiv: 2408.01122
- chain-of-jailbreak attack for image generation models via editing step by step | arXiv: 2410.03869
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | arXiv: 2501.11110
- chain-talker chain understanding and rendering for empathetic conversational spe | arXiv: 2505.12597
- ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains | arXiv: 2507.08427
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | arXiv: 2501.06598
- chartlens fine-grained visual attribution in charts | arXiv: 2505.19360
- chatbench from static benchmarks to human-ai evaluation | arXiv: 2504.07114
- chatsop an sop-guided mcts planning framework for controllable llm dialogue agen
- Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch | arXiv: 2502.17173
- cheer-ekman fine-grained embodied emotion classification | arXiv: 2506.01047
- chemactor enhancing automated extraction of chemical synthesis actions with llm- | arXiv: 2506.23520
- CheXalign: Preference Fine-tuning in Chest X-ray Interpretation Models without Human Feedback | arXiv: 2410.07025
- childmandarin a comprehensive mandarin speech dataset for young children aged 3- | arXiv: 2409.18584
- chinese inertial gan for handwriting signal generation and recognition
- chinese safetyqa a safety short-form factuality benchmark for large language mod
- chinese simpleqa a chinese factuality evaluation for large language models | arXiv: 2411.07140
- chronosense exploring temporal understanding in large language models with time | arXiv: 2501.03040
- chulo chunk-level key information representation for long document understanding | arXiv: 2410.11119
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models | arXiv: 2410.01434
- circuit stability characterizes language model generalization | arXiv: 2505.24731
- citeeval principle-driven citation evaluation for source attribution | arXiv: 2506.01829
- citynavagent aerial vision-and-language navigation with hierarchical semantic pl
- CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | arXiv: 2409.05806
- clac at semeval-2025 task 6 a multi-architecture approach for corporate environm | arXiv: 2505.23538
- claim mitigating multilingual object hallucination in large vision-language mode
- claimpkg enhancing claim verification via pseudo-subgraph generation with lightw | arXiv: 2505.22552
- clamp 3 universal music information retrieval across unaligned modalities and un | arXiv: 2502.10362
- CLaSp: In-Context Layer Skip for Self-Speculative Decoding | arXiv: 2505.24196
- class distillation with mahalanobis contrast an efficient training paradigm for
- Classifying Unreliable Narrators with Large Language Models | arXiv: 2506.10231
- CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction | arXiv: 2407.00934
- clinidial a naturally occurring multimodal dialogue dataset for team reflection | arXiv: 2506.12936
- cliperase efficient unlearning of visual-textual associations in clip | arXiv: 2410.23330
- clix cross-lingual explanations of idiomatic expressions | arXiv: 2501.03191
- clozemath improving mathematical reasoning in language models by learning to fil | arXiv: 2506.03763
- clusterattn kv cache compression under intrinsic attention clustering
- cmhkf cross-modality heterogeneous knowledge fusion for weakly supervised video
- cnnsum exploring long-context summarization with large language models in chines | arXiv: 2412.02819
- CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model | arXiv: 2509.11698
- coam corpus of all-type multiword expressions | arXiv: 2412.18151
- coco-bench a comprehensive code benchmark for multi-task large language model ev | arXiv: 2504.20673
- CoCoLex: Confidence-guided Copy-based Decoding for Grounded Legal Text Generation | arXiv: 2508.05534
- code-switching and syntax a large-scale experiment | arXiv: 2506.01846
- code-switching curriculum learning for multilingual transfer in llms | arXiv: 2411.02460
- code-switching red-teaming llm evaluation for safety and multilingual understand | arXiv: 2406.15481
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code | arXiv: 2410.05605
- codeif benchmarking the instruction-following capabilities of large language mod | arXiv: 2502.19166
- codemenv benchmarking large language models on code migration | arXiv: 2506.00894
- codereviewqa the code review comprehension assessment for large language models | arXiv: 2503.16167
- codetool enhancing programmatic tool invocation of llms via process supervision | arXiv: 2503.20840
- coe a clue of emotion framework for emotion recognition in conversations
- CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | arXiv: 2505.20767
- cogsteer cognition-inspired selective layer intervention for efficiently steerin | arXiv: 2410.17714
- coir a comprehensive benchmark for code information retrieval models | arXiv: 2407.02883
- cola collaborative low-rank adaptation | arXiv: 2505.15471
- coling-unia at scivqa 2025 few-shot example retrieval and confidence-informed en | arXiv: 2507.02357
- Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence | arXiv: 2503.05037
- colloquial singaporean english style transfer with fine-grained explainable cont
- Com2: A Causal-Guided Benchmark for Complex Commonsense Reasoning | arXiv: 2506.07064
- combining domain and alignment vectors provides better knowledge-safety trade-of
- combining the best of both worlds a method for hybrid nmt and llm translation | arXiv: 2505.13554
- comet metaphor-driven covert communication for multi-agent language games | arXiv: 2505.18218
- comfyui-copilot an intelligent assistant for automated workflow development | arXiv: 2506.05010
- Commonsense Reasoning in Arab Culture | arXiv: 2502.12788
- Comparing LLM-generated and human-authored news text using formal syntactic theory | arXiv: 2506.01407
- Comparing Moral Values in Western English-speaking Societies and LLMs with Word Associations | arXiv: 2505.19674
- comparison-based active preference learning for multi-dimensional personalizatio
- comparisonqa evaluating factuality robustness of llms through knowledge frequenc | arXiv: 2412.20251
- compileagent automated real-world repo-level compilation with tool-integrated ll | arXiv: 2505.04254
- compke complex question answering under knowledge editing | arXiv: 2506.00829
- Completing A Systematic Review in Hours instead of Months with Interactive AI Agents | arXiv: 2504.14822
- computation mechanism behind llm position generalization | arXiv: 2503.13305
- comrag retrieval-augmented generation with dynamic vector stores for real-time c | arXiv: 2506.21098
- con instruction universal jailbreaking of multimodal large language models via n | arXiv: 2506.00548
- conceptcarve dynamic realization of evidence | arXiv: 2504.07228
- conditional dichotomy quantification via geometric embedding
- condor enhance llm alignment with knowledge-driven data synthesis and refinement | arXiv: 2501.12273
- conect dataset overcoming data scarcity in context-aware e-commerce mt | arXiv: 2506.04929
- confetti conversational function-calling evaluation through turn-level interacti | arXiv: 2506.01859
- confidence vs critique a decomposition of self-correction capability for llms
- conformity in large language models | arXiv: 2410.12428
- conloan a contrastive multilingual dataset for evaluating loanwords
- consim measuring concept-based explanations effectiveness with automated simulat | arXiv: 2501.05855
- ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities | arXiv: 2506.12376
- consistent client simulation for motivational interviewing-based counseling | arXiv: 2502.02802
- conspiracy theories and where to find them on tiktok | arXiv: 2407.12545
- consultant decoding yet another synergistic mechanism | arXiv: 2506.02391
- context-aware hierarchical merging for long document summarization | arXiv: 2502.00977
- Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents | arXiv: 2505.24331
- context-robust knowledge editing for language models | arXiv: 2505.23026
- contextual experience replay for self-improvement of language agents | arXiv: 2506.06698
- Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing | arXiv: 2505.20976
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models | arXiv: 2401.08491
- Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time Steering | arXiv: 2505.12831
- Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie Worksheets | arXiv: 2407.05674
- controllable style arithmetic with language models
- controlled low-rank adaptation with subspace regularization for continued traini
- ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | arXiv: 2406.01205
- convert language model into a value-based strategic planner | arXiv: 2505.06987
- cool-fusion fuse large language models without training | arXiv: 2407.19807
- cooperative or competitive understanding the interaction between attention heads
- coordinating chaos a structured review of linguistic coordination methodologies
- CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | arXiv: 2502.16880
- cordial can multimodal large language models effectively understand coherence re
- CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG | arXiv: 2506.02544
- coreeval automatically building contamination-resilient datasets with real-world
- coreference as an indicator of context scope in multimodal narrative | arXiv: 2503.05298
- coret improved retriever for code editing | arXiv: 2505.24715
- correcting hallucinations in news summaries exploration of self-correcting llm m | arXiv: 2506.19607
- cortexdebate debating sparsely and equally for multi-agent debate | arXiv: 2507.03928
- cosmic generalized refusal direction identification in llm activations | arXiv: 2506.00085
- COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation | arXiv: 2506.15372
- cosyn code guided synthetic data
- cot-based synthesizer enhancing llm performance through answer synthesis | arXiv: 2501.01668
- cot-icl lab a synthetic framework for studying chain-of-thought learning from in | arXiv: 2502.15132
- cot-uq improving response-wise uncertainty quantification in llms with chain-of- | arXiv: 2502.17214
- cot-valve length-compressible chain-of-thought tuning | arXiv: 2502.09601
- counterfactual-consistency prompting for relative temporal understanding in larg | arXiv: 2502.11425
- Counterspeech the Ultimate Shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix Learning | arXiv: 2505.11958
- cove compressed vocabulary expansion makes better llm-based recommender systems | arXiv: 2506.19993
- crab a novel configurable role-playing llm with assessing benchmark
- cracking factual knowledge a comprehensive analysis of degenerate knowledge neur
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence | arXiv: 2412.13949
- craftext benchmark advancing instruction following in complex multimodal open-en
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity | arXiv: 2502.13063
- crisists coupling social media textual data and meteorological time series for u
- criskeval a chinese multi-level risk evaluation benchmark dataset for large lang
- critic-cot boosting the reasoning abilities of large language model via chain-of | arXiv: 2408.16326
- critiq mining data quality criteria from human preferences | arXiv: 2502.19279
- Croppable Knowledge Graph Embedding | arXiv: 2407.02779
- cross-document contextual coreference resolution in knowledge graphs | arXiv: 2504.05767
- cross-lingual auto evaluation for assessing multilingual llms | arXiv: 2410.13394
- Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons | arXiv: 2506.01629
- cross-lingual optimization for language transfer in large language models | arXiv: 2505.14297
- Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models | arXiv: 2505.18673
- cross-lingual representation alignment through contrastive image-caption tuning | arXiv: 2505.13628
- cross-lingual transfer of cultural knowledge an asymmetric phenomenon | arXiv: 2506.01675
- cross-lingual transfer of debiasing and detoxification in multilingual llms an e | arXiv: 2412.14050
- Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts | arXiv: 2501.02009
- crowd comparative reasoning unlocking comprehensive evaluations for llm-as-a-jud
- crowdsource crawl or generate creating sea-vl a multicultural vision-language da
- cruxeval-x a benchmark for multilingual code reasoning understanding and executi | arXiv: 2408.13001
- cstree-sri introspection-driven cognitive semantic tree for multi-turn question
- cstrl context-driven sequential transfer learning for abstractive radiology repo | arXiv: 2503.05750
- ctpd cross-modal temporal pattern discovery for enhanced multimodal electronic h | arXiv: 2411.00696
- cu-mam coherence-driven unified macro-structures for argument mining
- cuckoo an ie free rider hatched by massive nutrition in llms nest | arXiv: 2502.11275
- culemo cultural lenses on emotion - benchmarking llms for cross-cultural emotion | arXiv: 2503.10688
- culfit a fine-grained cultural-aware llm training paradigm via multilingual crit | arXiv: 2505.19484
- cultivating gaming sense for yourself making vlms gaming experts | arXiv: 2503.21263
- cultural learning-based culture adaptation of language models | arXiv: 2504.02953
- culturalbench a robust diverse and challenging cultural benchmark by human-ai cu | arXiv: 2410.02677
- culture is not trivia sociocultural theory for cultural nlp | arXiv: 2502.12057
- culture matters in toxic language detection in persian | arXiv: 2506.03458
- Curiosity-Driven Reinforcement Learning from Human Feedback | arXiv: 2501.11463
- curriculum debiasing toward robust parameter-efficient fine-tuning against datas
- cxggec construction-guided grammatical error correction
- cypherbench towards precise retrieval over full-scale modern knowledge graphs in
- d-gen automatic distractor generation and evaluation for reliable assessment of | arXiv: 2504.13439
- DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | arXiv: 2507.11942
- dalr dual-level alignment learning for multimodal sentence representation learni | arXiv: 2506.21096
- dape v2 process attention score as feature map for length extrapolation | arXiv: 2410.04798
- dars dynamic action re-sampling to enhance coding agent performance by adaptive | arXiv: 2503.14269
- data caricatures on the representation of african american language in pretraini | arXiv: 2503.10789
- data laundering artificially boosting benchmark results through knowledge distil | arXiv: 2412.15255
- Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning | arXiv: 2506.17525
- Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning | arXiv: 2505.12212
- data-constrained synthesis of training data for de-identification | arXiv: 2502.14677
- davir data selection via implicit reward for large language models | arXiv: 2310.13008
- dcg-sql enhancing in-context learning for text-to-sql with deep contextual schem
- ddxtutor clinical reasoning tutoring system with differential diagnosis-based st
- DeAL: Decoding-time Alignment for Large Language Models | arXiv: 2402.06147
- debate reflect and distill multi-agent feedback with tree-structured preference | arXiv: 2506.03541
- debatecoder towards collective intelligence of llms via test case driven llm deb
- debiasing the fine-grained classification task in llms with bias-aware peft
- decoder-only llms can be masked auto-encoders
- decoding by contrasting knowledge enhancing large language model confidence on e
- decoding knowledge attribution in mixture-of-experts a framework of basic-refine | arXiv: 2505.24593
- decoding on graphs faithful and sound reasoning on knowledge graphs through gene
- decoding reading goals from eye movements | arXiv: 2410.20779
- decomposed opinion summarization with verified aspect-aware modules | arXiv: 2501.17191
- deep temporal reasoning in video language models a cross-linguistic evaluation o
- deeper insight into your user directed persona refinement for dynamic persona mo
- deepreview improving llm-based paper review with human-like deep thinking proces
- deeprtl2 a versatile model for rtl-related tasks | arXiv: 2506.15697
- deepsolution boosting complex engineering solution design via tree-based explora | arXiv: 2502.20730
- def-dts deductive reasoning for open-domain dialogue topic segmentation | arXiv: 2505.21033
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques | arXiv: 2411.00459
- define decision-making with analogical reasoning over factor profiles | arXiv: 2410.01772
- defining and evaluating visual language models basic spatial abilities a perspec
- Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems | arXiv: 2502.14019
- deja vu decoding repeated reading from eye movements
- deliberate reasoning in language models as structure-aware planning with an accu
- delta-knn improving demonstration selection in in-context learning for alzheimer
- Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models | arXiv: 2505.19121
- demo reframing dialogue interaction with fine-grained element modeling | arXiv: 2412.04905
- demons in the detail on implementing load balancing loss for training specialize
- demystifying small language models for edge deployment
- denselora dense low-rank adaptation of large language models | arXiv: 2505.23808
- Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models | arXiv: 2506.11068
- design choices for extending the context length of visual language models | arXiv: 2412.12735
- detecting referring expressions in visually grounded dialogue with autoregressiv | arXiv: 2506.21294
- detecting sockpuppetry on wikipedia using meta-learning | arXiv: 2506.10314
- detection of human and machine-authored fake news in urdu | arXiv: 2410.19517
- developmentally-plausible working memory shapes a critical period for language a | arXiv: 2502.04795
- dialectal coverage and generalization in arabic speech recognition | arXiv: 2411.05872
- dialogue systems for emotional support via value reinforcement | arXiv: 2501.17182
- dialogue-rag enhancing retrieval for llms via node-linking utterance rewriting
- dialup modeling the language continuum by adapting models to dialects and dialec
- dice-bench evaluating the tool-use capabilities of large language models in mult | arXiv: 2506.22853
- dictionaries to the rescue cross-lingual vocabulary transfer for low-resource la | arXiv: 2506.01535
- Did Translation Models Get More Robust Without Anyone Even Noticing? | arXiv: 2403.03923
- different speech translation models encode and translate speaker gender differen | arXiv: 2506.02172
- difflm controllable synthetic data generation via diffusion language models | arXiv: 2411.03250
- DiffPO: Diffusion Alignment with Direct Preference Optimization | arXiv: 2503.04240
- DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising | arXiv: 2407.00248
- diffusion directed acyclic transformer for non-autoregressive machine translatio
- diffusion models through a global lens are they culturally inclusive
- digest the knowledge large language models empowered message passing for knowled
- digital gatekeepers googles role in curating hashtags and subreddits | arXiv: 2506.14370
- dior adaptive cognitive detection and contextual retrieval optimization for dyna
- direct behavior optimization unlocking the potential of lightweight llms | arXiv: 2506.06401
- direct confidence alignment aligning verbalized confidence with internal confide | arXiv: 2512.11998
- direct prompt optimization with continuous representations
- disambiguate first parse later generating interpretations for ambiguity resoluti | arXiv: 2502.18448
- disambiguating reference in visually grounded dialogues through joint modeling o
- disc plug-and-play decoding intervention with similarity of characters for chine
- disco device-server collaborative llm-based text streaming services | arXiv: 2502.11417
- discourse relation-enhanced neural coherence modeling
- disentangled multi-span evolutionary network against temporal knowledge graph re | arXiv: 2505.14020
- disentangling biased knowledge from reasoning in large language models via machi
- Disentangling Language and Culture for Evaluating Multilingual Large Language Models | arXiv: 2505.24635
- Disentangling Memory and Reasoning Ability in Large Language Models | arXiv: 2411.13504
- disentangling the roles of representation and selection in data pruning | arXiv: 2507.03648
- distance between relevant information pieces causes bias in long-context llms | arXiv: 2410.14641
- distilling an end-to-end voice assistant without instruction training data | arXiv: 2410.02678
- DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts | arXiv: 2506.09351
- diversity explains inference scaling laws through a case study of minimum bayes | arXiv: 2410.15021
- Diversity-oriented Data Augmentation with Large Language Models | arXiv: 2502.11671
- Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation | arXiv: 2501.12432
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG | arXiv: 2505.20871
- dnaspeech a contextualized and situated text-to-speech dataset with dialogues na
- dncasr end-to-end training for speaker-attributed asr | arXiv: 2506.01916
- do emotions really affect argument convincingness a dynamic approach with llm-ba | arXiv: 2503.00024
- do language models have semantics on the five standard positions
- do language models mirror human confidence exploring psychological insights to a | arXiv: 2506.00582
- do language models understand honorific systems in javanese | arXiv: 2502.20864
- do language models understand the cognitive tasks given to them investigations w | arXiv: 2412.18120
- Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs | arXiv: 2410.15956
- do large language models perform latent multi-hop reasoning without exploiting s | arXiv: 2411.16679
- do llms give psychometrically plausible responses in educational assessments | arXiv: 2506.09796
- do llms understand dialogues a case study on dialogue acts
- do multimodal large language models truly see what we point at investigating ind
- do not abstain identify and solve the uncertainty | arXiv: 2506.00780
- do vision-language models have internal world models towards an atomic evaluatio | arXiv: 2506.21876
- doc-react multi-page heterogeneous document question-answering
- docagent a multi-agent system for automated code documentation generation | arXiv: 2504.08725
- docmedit towards document-level model editing | arXiv: 2505.19572
- document-level event-argument data augmentation for challenging role types
- Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport | arXiv: 2505.23078
- does context matter contextualjudgebench for evaluating llm-based judges in cont
- does the emotional understanding of lvlms vary under high-stress environments an
- does time have its place temporal heads where language models recall time-specif | arXiv: 2502.14258
- does your voice assistant remember analyzing conversational context recall and u | arXiv: 2502.19759
- dolphin document image parsing via heterogeneous anchor prompting | arXiv: 2505.14059
- dolphin moving towards closed-loop auto-research through thinking practice and f | arXiv: 2501.03916
- DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning | arXiv: 2507.02302
- Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts | arXiv: 2505.24427
- dont erase inform detecting and contextualizing harmful language in cultural her
- dont get lost in the trees streamlining llm reasoning by overcoming tree search
- dont half-listen capturing key-part information in continual instruction tuning
- dont miss the forest for the trees attentional vision calibration for large visi | arXiv: 2405.17820
- dont reinvent the wheel efficient instruction-following text embedding based on | arXiv: 2505.24754
- dont say no jailbreaking llm by suppressing refusal | arXiv: 2404.16369
- double entendre robust audio-based ai-generated lyrics detection via multi-view | arXiv: 2506.15981
- drae dynamic retrieval-augmented expert networks for lifelong learning and task | arXiv: 2507.04661
- DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination | arXiv: 2506.01954
- drama diverse augmentation from large language models to smaller dense retriever | arXiv: 2502.18460
- DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing | arXiv: 2402.16733
- drift enhancing llm faithfulness in rationale generation via dual-reward probabi
- DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization | arXiv: 2411.14055
- drs deep question reformulation with structured output | arXiv: 2411.17993
- drt deep reasoning translation via long chain-of-thought | arXiv: 2412.17498
- ds2-absa dual-stream data synthesis with label refinement for few-shot aspect-ba
- dtcrs dynamic tree construction for recursive summarization
- dualguard a parameter space transformation approach for bidirectional defense in
- dually self-improved counterfactual data augmentation using large language model
- dualrag a dual-process approach to integrate reasoning and retrieval for multi-h
- dva validate your demonstration first before you use it
- dynacode a dynamic complexity-aware code benchmark for evaluating large language | arXiv: 2503.10452
- Dynamic and Generalizable Process Reward Modeling | arXiv: 2507.17849
- dynamic chunking and selection for reading comprehension of ultra-long context i | arXiv: 2506.00773
- dynamic evaluation with cognitive reasoning for multi-turn safety of large langu
- dynamic head selection for neural lexicalized constituency parsing
- dynamic knowledge integration for evidence-driven counter-argument generation wi | arXiv: 2503.05328
- dynamic label name refinement for few-shot dialogue intent classification | arXiv: 2412.15603
- Dynamic Order Template Prediction for Generative Aspect-Based Sentiment Analysis | arXiv: 2406.11130
- dynamic parallel tree search for efficient llm reasoning | arXiv: 2502.16235
- dynamic scaling of unit tests for code reward modeling | arXiv: 2501.01054
- EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models | arXiv: 2508.01625
- eagle expert-guided self-enhancement for preference alignment in pathology large
- ecerc evidence-cause attention network for multi-modal emotion recognition in co
- ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent | arXiv: 2403.04481
- EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association | arXiv: 2505.15196
- edit once update everywhere a simple framework for cross-lingual knowledge synch | arXiv: 2502.14645
- EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models | arXiv: 2502.19765
- editinspector a benchmark for evaluation of text-guided image edits | arXiv: 2506.09988
- educationq evaluating llms teaching capabilities through multi-agent dialogue fr
- educators perceptions of large language models as tutors comparing human and ai | arXiv: 2506.08702
- Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection | arXiv: 2411.07446
- efficient domain continual pretraining by mitigating the stability gap
- efficient ensemble for fine-tuning language models on multiple datasets | arXiv: 2505.21930
- Efficient Knowledge Editing via Minimal Precomputation | arXiv: 2506.04226
- efficient long context language model retrieval with compression | arXiv: 2412.18232
- efficient many-shot in-context learning with dynamic block-sparse attention | arXiv: 2503.08640
- efficient opamp adaptation for zoom attention to golden contexts | arXiv: 2502.12502
- efficient pretraining data selection for language models via multi-actor collabo
- efficient safety alignment of large language models via preference re-ranking an
- Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization | arXiv: 2405.14189
- Efficiently Identifying Watermarked Segments in Mixed-Source Texts | arXiv: 2410.03600
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | arXiv: 2407.11062
- EffiVLM-Bench: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models | arXiv: 2506.00479
- ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming | arXiv: 2505.16667
- elba-bench an efficient learning backdoor attacks benchmark for large language m | arXiv: 2502.18511
- eli-why evaluating the pedagogical utility of language model explanations | arXiv: 2506.14200
- embedding-converter a unified framework for cross-model embedding transformation
- Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents | arXiv: 2505.19997
- embracing large language models in traffic flow forecasting | arXiv: 2412.12201
- Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation | arXiv: 2506.00288
- emma-x an embodied multimodal action model with grounded chain of thought and lo
- empaths at semeval-2025 task 11 retrieval-augmented approach to perceived emotio | arXiv: 2506.04409
- empathy prediction from diverse perspectives
- employing discourse coherence enhancement to improve cross-document event and en
- emulate a multi-agent framework for determining the veracity of atomic claims by | arXiv: 2505.16576
- enabling chatbots with eyes and ears an immersive multimodal conversation system | arXiv: 2506.00421
- enabling llm knowledge analysis via extensive materialization | arXiv: 2411.04920
- end-to-end dialog neural coreference resolution balancing efficiency and accurac | arXiv: 2504.05824
- energy considerations of large language model inference and efficiency optimizat
- english-based acoustic models perform well in the forced alignment of two englis
- enhance multimodal consistency and coherence for text-image plan generation | arXiv: 2506.11380
- Enhancing Automated Interpretability with Output-Centric Feature Descriptions | arXiv: 2501.08319
- enhancing chain-of-thought reasoning with critical representation fine-tuning | arXiv: 2507.10085
- Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | arXiv: 2411.17679
- enhancing conversational agents with theory of mind aligning beliefs desires and | arXiv: 2502.14171
- enhancing cross-lingual transfer through reversible transliteration a huffman-ba
- enhancing event-centric news cluster summarization via data sharpening and local
- enhancing goal-oriented proactive dialogue systems via consistency reflection an | arXiv: 2506.13366
- enhancing human evaluation in machine translation with comparative judgement
- Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge | arXiv: 2506.15504
- enhancing input-label mapping in in-context learning with contrastive decoding | arXiv: 2502.13738
- Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models | arXiv: 2506.01334
- enhancing lexicon-based text embeddings with large language models | arXiv: 2501.09749
- enhancing llm agent safety via causal influence prompting | arXiv: 2507.00979
- enhancing machine translation with self-supervised preference data
- enhancing marker scoring accuracy through ordinal confidence modelling in educat | arXiv: 2505.23315
- enhancing mathematical reasoning in llms by stepwise correction | arXiv: 2410.12934
- enhancing medical dialogue generation through knowledge refinement and dynamic p | arXiv: 2506.10877
- Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | arXiv: 2506.02041
- enhancing multimodal retrieval via complementary information extraction and alig
- enhancing ner by harnessing multiple datasets with conditional variational autoe
- enhancing neural machine translation through target language data a knn-lm appro
- Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub | arXiv: 2312.17294
- enhancing retrieval systems with inference-time logical reasoning | arXiv: 2503.17860
- enhancing retrieval-augmented generation via evidence tree search
- Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization | arXiv: 2507.10923
- enhancing spoken discourse modeling in language models using gestural cues | arXiv: 2503.03474
- enhancing text editing for grammatical error correction arabic as a case study | arXiv: 2503.00985
- enhancing the comprehensibility of text explanations via unsupervised concept di | arXiv: 2505.20293
- enhancing transformation from natural language to signal temporal logic using ll | arXiv: 2505.20658
- Enhancing Transformers for Generalizable First-Order Logical Entailment | arXiv: 2501.00759
- enhancing unsupervised sentence embeddings via knowledge-driven data augmentatio
- enigmatom improve llms theory-of-mind reasoning capabilities with neural knowled | arXiv: 2503.03340
- Enough Coin Flips Can Make LLMs Act Bayesian | arXiv: 2503.04722
- Ensemble Watermarks for Large Language Models | arXiv: 2411.19563
- enstom enhancing dialogue systems with entropy-scaled steering vectors for topic | arXiv: 2505.16526
- entailed between the lines incorporating implication into nli | arXiv: 2501.07719
- entailment-preserving first-order logic representations in natural language enta | arXiv: 2502.16757
- entity framing and role portrayal in the news | arXiv: 2502.14718
- entropy-based exploration conduction for multi-step reasoning | arXiv: 2503.15848
- entropy-uid a method for optimizing information density | arXiv: 2502.14366
- epicode boosting model performance beyond training with extrapolation and contra | arXiv: 2506.03489
- epman episodic memory attention for generalizing to longer contexts | arXiv: 2502.14280
- epo explicit policy optimization for strategic reasoning in llms via reinforceme
- error comparison optimization for large language models on aspect-based sentimen
- error-driven data-efficient large multimodal model tuning | arXiv: 2412.15652
- eru-kg efficient reference-aligned unsupervised keyphrase generation | arXiv: 2505.24219
- EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents | arXiv: 2412.13549
- Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis | arXiv: 2506.04142
- estimating privacy leakage of augmented contextual knowledge in language models | arXiv: 2410.03026
- eta-wavlm efficient speaker identity removal in self-supervised speech represent | arXiv: 2505.19273
- etf an entity tracing framework for hallucination detection in code summaries | arXiv: 2410.14748
- evaluating design decisions for dual encoder-based entity disambiguation | arXiv: 2505.11683
- evaluating implicit bias in large language models by attacking from a psychometr | arXiv: 2406.14023
- Evaluating Language Models as Synthetic Data Generators | arXiv: 2412.03679
- evaluating lexical proficiency in neural language models
- evaluating llms for portuguese sentence simplification with linguistic insights
- evaluating multimodal language models as visual assistants for visually impaired | arXiv: 2503.22610
- Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search | arXiv: 2506.11155
- evaluating personalized tool-augmented llms from the perspectives of personaliza
- evaluating sequence labeling on the basis of information theory
- evaluating the evaluation of diversity in commonsense generation | arXiv: 2506.00514
- evaluating theory of an uncertain mind predicting the uncertain beliefs of other
- evaluating visual and cultural interpretation the k-viscuit benchmark with human | arXiv: 2406.16469
- evaluation agent efficient and promptable evaluation framework for visual genera
- evaluation of attribution bias in generator-aware retrieval-augmented large lang | arXiv: 2410.12380
- Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation | arXiv: 2412.13666
- evaluation of llms in medical text summarization the role of vocabulary adaptati | arXiv: 2505.21242
- eventrag enhancing llm generation with event knowledge graphs
- evolvebench a comprehensive benchmark for assessing temporal awareness in llms o
- evowiki evaluating llms on evolving knowledge | arXiv: 2412.13582
- Ewe: Improving Factuality with Explicit Working Memory | arXiv: 2412.18069
- exclusion of thought mitigating cognitive load in large language models for enha
- execute a multilingual benchmark for llm token understanding | arXiv: 2505.17784
- exit context-aware extractive compression for enhancing retrieval-augmented gene | arXiv: 2412.12559
- expectation confirmation preference optimization for multi-turn conversational r | arXiv: 2506.14302
- expert an explainable image captioning evaluation metric with structured explana | arXiv: 2506.24016
- expetrans llms are experiential transfer learners | arXiv: 2505.23191
- explain-then-process using grammar prompting to enhance grammatical acceptabilit | arXiv: 2506.02302
- explaining matters leveraging definitions and semantic expansion for sexism dete | arXiv: 2506.06238
- explaining puzzle solutions in natural language an exploratory study on 6x6 sudo | arXiv: 2505.15993
- explica evaluating explicit causal reasoning in large language models | arXiv: 2502.15487
- explicit and implicit data augmentation for social event detection | arXiv: 2509.04202
- explicit vs implicit investigating social bias in large language models through | arXiv: 2501.02295
- exploiting contextual knowledge in llms through 𝒱-usable information base
- exploiting the shadows unveiling privacy leaks through lower-ranked tokens in la
- exploracoder advancing code generation for multiple unseen apis via planning and | arXiv: 2412.05366
- explorer scaling exploration-driven web trajectory synthesis for multimodal web | arXiv: 2502.11357
- exploring compositional generalization of multimodal llms for medical imaging | arXiv: 2412.20070
- exploring explanations improves the robustness of in-context learning | arXiv: 2506.02378
- exploring forgetting in large language model pre-training | arXiv: 2410.17018
- exploring gender bias in large language models an in-depth dive into the german | arXiv: 2507.16557
- exploring graph representations of logical forms for language modeling | arXiv: 2505.14523
- Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder | arXiv: 2411.05195
- exploring in-context example generation for machine translation | arXiv: 2506.00507
- exploring in-image machine translation with real-world background | arXiv: 2505.15282
- exploring llms ability to spontaneously and conditionally modify moral expressio
- exploring multimodal challenges in toxic chinese detection taxonomy benchmark an | arXiv: 2505.24341
- exploring multimodal relation extraction of hierarchical tabular data with multi
- Exploring Persona Sentiment Sensitivity in Personalized Dialogue Generation | arXiv: 2502.11423
- exploring the impact of instruction-tuning on llms susceptibility to misinformat | arXiv: 2507.18203
- exposing numeracy gaps a benchmark to evaluate fundamental numerical abilities i | arXiv: 2502.11075
- exposing the achilles heel evaluating llms ability to handle mistakes in mathema
- Extending Complex Logical Queries on Uncertain Knowledge Graphs | arXiv: 2403.01508
- Extending LLM Context Window with Adaptive Grouped Positional Encoding: A Training-Free Method
- f5-tts a fairytaler that fakes fluent and faithful speech with flow matching
- FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | arXiv: 2502.17924
- factbench a dynamic benchmark for in-the-wild language model factuality evaluati
- factual knowledge in language models robustness and anomalies under simple tempo | arXiv: 2502.01220
- fairi tales evaluation of fairness in indian contexts with a focus on bias and s | arXiv: 2506.23111
- fairness beyond performance revealing reliability disparities across groups in l
- Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | arXiv: 2502.01926
- fairsteer inference time debiasing for llms with dynamic activation steering | arXiv: 2504.14492
- faithful and robust llm-driven theorem proving for nli explanations | arXiv: 2505.24264
- FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation | arXiv: 2506.08938
- fast or slow integrating fast intuition and deliberate thinking for enhancing vi
- fast-and-frugal text-graph transformers are effective link predictors | arXiv: 2408.06778
- fastdraft how to train your draft | arXiv: 2411.11055
- faster speculative decoding via effective draft decoder with pruned candidate tr
- fastmcts a simple sampling strategy for data synthesis | arXiv: 2502.11476
- fcmr robust evaluation of financial cross-modal multi-hop reasoning | arXiv: 2412.12567
- FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation | arXiv: 2503.06680
- feat a preference feedback dataset through a cost-effective auto-generation and | arXiv: 2506.19325
- federated data-efficient instruction tuning for large language models | arXiv: 2410.10926
- FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Large Language Models | arXiv: 2410.09432
- fidelis faithful reasoning in large language model for knowledge graph question | arXiv: 2405.13873
- fiha autonomous hallucination evaluation in vision-language models with davidson | arXiv: 2409.13612
- filter-and-refine a mllm based cascade system for industrial-scale video content | arXiv: 2507.17204
- FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | arXiv: 2506.05828
- Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots | arXiv: 2501.03441
- finding needles in images can multi-modal llms locate fine details | arXiv: 2508.05053
- finding the sweet spot preference data construction for scaling preference optim | arXiv: 2502.16825
- fine-grained video dubbing duration alignment with segment supervised preference | arXiv: 2508.08550
- Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs | arXiv: 2407.03181
- finereason evaluating and improving llms deliberate reasoning through reflective | arXiv: 2502.20238
- finite state automata inside transformers with chain-of-thought a mechanistic st
- finmme benchmark dataset for financial multi-modal reasoning evaluation | arXiv: 2505.24714
- fitcf a framework for automatic feature importance-guided counterfactual example | arXiv: 2501.00777
- fixing distribution shifts of llm self-critique via on-policy self-play training
- flagevalmm a flexible framework for comprehensive multimodal model evaluation | arXiv: 2506.09081
- FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | arXiv: 2410.12266
- flashback efficient retrieval-augmented language modeling for long context infere | arXiv: 2405.04065
- flexora flexible low-rank adaptation for large language models
- flexrag a flexible and comprehensive framework for retrieval-augmented generatio | arXiv: 2506.12494
- Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching | arXiv: 2507.05617
- floorplan-llama aligning architects feedback and domain knowledge in architectur
- focalpo enhancing preference optimizing by focusing on correct preference rankin | arXiv: 2501.06645
- focus evaluating pre-trained vision-language models on underspecification reason
- focus on what matters enhancing medical vision-language models with automatic at
- focused-dpo enhancing code generation through focused preference optimization on | arXiv: 2502.11475
- focusllm precise understanding of long context by dynamic condensing | arXiv: 2408.11745
- foldmoe efficient long sequence moe training via attention-moe pipelining
- follow-up question generation for enhanced patient-provider conversations | arXiv: 2503.17509
- foodtaxo generating food taxonomies with large language models | arXiv: 2505.19838
- forward knows efficient backward path saliency-guided memory-efficient fine-tuni
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | arXiv: 2502.14856
- fractal fine-grained scoring from aggregate text labels | arXiv: 2404.04817
- Frictional Agent Alignment Framework: Slow Down and Don't Break Things | arXiv: 2505.19428
- from ambiguity to accuracy the transformative effect of coreference resolution o | arXiv: 2507.07847
- from benign import toxic jailbreaking the language model via adversarial metapho
- from citations to criticality predicting legal decision influence in the multili
- from data to knowledge evaluating how efficiently language models learn facts | arXiv: 2506.16912
- from english to second language mastery enhancing llms with cross-lingual contin
- from human reading to nlm understanding evaluating the role of eye-tracking data
- from informal to formal -- incorporating and evaluating llms on natural language
- from information to insight leveraging llms for open aspect-based educational su
- from isolates to families using neural networks for automated language affiliati
- from lists to emojis how format bias affects model alignment | arXiv: 2409.11704
- from misleading queries to accurate answers a three-stage fine-tuning method for | arXiv: 2504.11277
- from neurons to semantics evaluating cross-linguistic alignment capabilities of
- from objectives to questions a planning-based framework for educational mathemat
- from outcomes to processes guiding prm learning from orm for inference-time alig
- from perceptions to decisions wildfire evacuation decision prediction with behav
- from real to synthetic synthesizing millions of diversified and complicated user | arXiv: 2506.03968
- from selection to generation a survey of llm-based active learning | arXiv: 2502.11767
- from sub-ability diagnosis to human-aligned generation bridging the gap for text
- from teacher to student tracking memorization through model distillation | arXiv: 2506.16170
- from tools to teammates evaluating llms in multi-session coding interactions | arXiv: 2502.13791
- From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models | arXiv: 2505.09924
- fusing highly specialized language models for comprehensive expertise
- g-safeguard a topology-guided security lens and treatment on llm-based multi-age
- g2s a general-to-specific learning framework for temporal knowledge graph foreca | arXiv: 2506.00445
- ga-s3 comprehensive social network simulation with group agents | arXiv: 2506.03532
- GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis | arXiv: 2505.18710
- GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | arXiv: 2409.04183
- game development as human-llm interaction | arXiv: 2408.09386
- gamebot transparent assessment of llm reasoning in games | arXiv: 2412.13602
- GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization | arXiv: 2503.20194
- garage a benchmark with grounding annotations for rag evaluation | arXiv: 2506.07671
- gear generation augmented retrieval | arXiv: 2501.02772
- gear graph-enhanced agent for retrieval-augmented generation | arXiv: 2412.18431
- gec-metrics a unified library for grammatical error correction evaluation | arXiv: 2505.19388
- gellm³o generalizing large language models for multi-property mo
- Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework | arXiv: 2506.15568
- genderalign an alignment dataset for mitigating gender bias in large language mo
- generalized attention flow feature attribution for transformer models via maximu
- generate first then sample enhancing fake news detection with llm-augmented rein
- generating diverse training samples for relation extraction with large language | arXiv: 2505.23108
- generating pedagogically meaningful visuals for math word problems a new benchma | arXiv: 2506.03735
- Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction | arXiv: 2501.13125
- generating synthetic relational tabular data via structural causal models | arXiv: 2507.03528
- generative frame sampler for long video understanding | arXiv: 2503.09146
- Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | arXiv: 2502.02444
- generative reward modeling via synthetic criteria preference learning
- genetic instruct scaling up synthetic generation of coding instructions for larg | arXiv: 2407.21077
- genius a generalizable and purely unsupervised self-training framework for advan
- genknowsub improving modularity and reusability of llms through general knowledg | arXiv: 2505.10939
- genre a french gender-neutral rewriting system using collective nouns | arXiv: 2505.23630
- Geometric Signatures of Compositionality Across a Language Model's Lifetime | arXiv: 2410.01444
- getreason enhancing image context extraction through hierarchical multi-agent re
- gg-bbq german gender bias benchmark for question answering | arXiv: 2507.16410
- gift-sw gaussian noise injected fine-tuning of salient weights for llms | arXiv: 2408.15300
- GiFT: Gibbs Fine-Tuning for Code Generation | arXiv: 2502.11466
- gigachat family efficient russian language modeling through mixture of experts a | arXiv: 2506.09440
- GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages | arXiv: 2406.11546
- global eye breaking the fixed thinking pattern during the instruction expansion
- global mmlu understanding and addressing cultural and linguistic biases in multi
- godbench a benchmark for multimodal large language models in video comment art | arXiv: 2505.11436
- GORP: Continual Gradient Low-Rank Projection Fine-Tuning for LLMs | arXiv: 2507.02503
- gpt-4 as a homework tutor can improve student engagement and learning outcomes | arXiv: 2409.15981
- grace a granular benchmark for evaluating model calibration against human calibr | arXiv: 2502.19684
- Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | arXiv: 2507.01915
- GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models | arXiv: 2507.04455
- graf graph retrieval augmented by facts for romanian legal multi-choice question | arXiv: 2412.04119
- GRAM: Generative Recommendation via Semantic-aware Multi-granular Late Fusion | arXiv: 2506.01673
- grammamt improving machine translation with grammar-informed in-context learning | arXiv: 2410.18702
- grampa subword regularisation by skewing uniform segmentation distributions with
- Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | arXiv: 2506.03939
- Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs | arXiv: 2410.11001
- graph-guided cross-composition feature disentanglement for compositional zero-sh | arXiv: 2408.09786
- graph-structured trajectory extraction from travelogues | arXiv: 2410.16633
- graphcheck breaking long-term text barriers with extracted knowledge graph-power | arXiv: 2502.16514
- graphically speaking unmasking abuse in social media with conversation insights | arXiv: 2504.01902
- graphinsight unlocking insights in large language models for graph structure und
- GraphNarrator: Generating Textual Explanations for Graph Neural Networks | arXiv: 2410.15268
- grat guiding retrieval-augmented reasoning through process rewards tree search
- grounded or a good guesser a per-question balanced dataset to separate blind fro
- group then scale dynamic mixture-of-experts multilingual language model | arXiv: 2506.12388
- Growing Through Experience: Scaling Episodic Grounding in Language Models | arXiv: 2506.01312
- gsq-tuning group-shared exponents integer in fully quantized training for llms o | arXiv: 2502.12913
- GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning | arXiv: 2505.22661
- gui agents a survey | arXiv: 2412.13501
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent | arXiv: 2505.16827
- guicourse from general vision language model to versatile gui agent | arXiv: 2406.11317
- GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | arXiv: 2505.11368
- guidelines for fine-grained sentence-level arabic readability annotation | arXiv: 2410.08674
- guiding not forcing enhancing the transferability of jailbreaking attacks on llm
- Gumbel Reranking: Differentiable End-to-End Reranker Optimization | arXiv: 2502.11116
- gödel agent a self-referential agent framework for recursive self-improvement | arXiv: 2410.04444
- haco-det a study towards fine-grained machine-generated text detection under hum
- haf-rm a hybrid alignment framework for reward model training | arXiv: 2407.04185
- haic improving human action understanding and generation with better captions fo
- Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training | arXiv: 2410.15460
- hallulens llm hallucination benchmark | arXiv: 2504.17550
- HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | arXiv: 2501.08292
- hanging in the balance pivotal moments in crisis counseling conversations | arXiv: 2506.03941
- hard negative mining for domain-specific retrieval in enterprise systems | arXiv: 2505.18366
- harnessing pdf data for improving japanese large multimodal models | arXiv: 2502.14778
- Has Machine Translation Evaluation Achieved Human Parity? | arXiv: 2506.19571
- hash-rag bridging deep hashing with retriever for efficient fine retrieval and a | arXiv: 2505.16133
- hata trainable and hardware-efficient hash-aware top-k attention for scalable la | arXiv: 2506.02572
- HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter | arXiv: 2411.15462
- have we designed generalizable structural knowledge promptings systematic evalua
- hd-ndes neural differential equations for hallucination detection in llms | arXiv: 2506.00088
- health-llm personalized retrieval-augmented disease prediction system | arXiv: 2402.00746
- helios harmonizing early fusion late fusion and llm reasoning for multi-granular | arXiv: 2503.02248
- hellaswag-pro a large-scale bilingual benchmark for evaluating the robustness of | arXiv: 2502.11393
- Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback | arXiv: 2507.16007
- helpsteer3 human-annotated feedback and edit data to empower inference-time scal | arXiv: 2503.04378
- hft half fine-tuning for large language models | arXiv: 2404.18466
- hiagent hierarchical working memory management for solving long-horizon agent ta
- HiCUPID: Exploring the Potential of LLMs as Personalized Assistants | arXiv: 2506.01262
- hidden in plain sight evaluation of the deception detection capabilities of llms | arXiv: 2506.09424
- hiddendetect detecting jailbreak attacks against multimodal large language model | arXiv: 2502.14744
- HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | arXiv: 2503.12941
- hierarchical attention generates better proofs | arXiv: 2504.19188
- Hierarchical Bracketing Encodings for Dependency Parsing as Tagging | arXiv: 2505.11693
- hierarchical document refinement for long-context retrieval-augmented generation | arXiv: 2505.10413
- Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings | arXiv: 2506.00277
- Hierarchical Memory Organization for Wikipedia Generation | arXiv: 2506.23393
- hierarchical retrieval with evidence curation for open-domain financial question | arXiv: 2505.20368
- hierarchical safety realignment lightweight restoration of safety in pruned larg | arXiv: 2505.16104
- hierarchical-task-aware multi-modal mixture of incremental lora experts for embo | arXiv: 2506.04595
- hintsoftruth a multimodal checkworthiness detection dataset with real and synthe
- hoh a dynamic benchmark for evaluating the impact of outdated information on ret | arXiv: 2503.04800
- homebench evaluating llms in smart homes with valid and invalid instructions acr | arXiv: 2505.19628
- hope a novel positional encoding without long-term decay for enhanced context aw
- HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval | arXiv: 2506.07296
- how do llms acquire new knowledge a knowledge circuits perspective on continual | arXiv: 2502.11196
- How does Misinformation Affect Large Language Model Behaviors and Preferences? | arXiv: 2505.21608
- how does response length affect long-form factuality | arXiv: 2505.23295
- how far are llms from being our digital twins a benchmark for persona-based beha | arXiv: 2502.14642
- How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian | arXiv: 2505.21301
- how llms comprehend temporal meaning in narratives a case study in cognitive eva
- how much do encoder models know about word senses
- How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs | arXiv: 2410.13857
- how to compare things properly a study of argument relevance in comparative ques
- How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond | arXiv: 2501.05714
- how to mitigate overfitting in weak-to-strong generalization | arXiv: 2503.04249
- How to Train Long-Context Language Models (Effectively) | arXiv: 2410.02660
- hpss heuristic prompting strategy search for llm evaluators | arXiv: 2502.13031
- hscr hierarchical self-contrastive rewarding for aligning medical vision languag | arXiv: 2506.00805
- human alignment how much do we adapt to llms
- humt dumt measuring and controlling human-like language in llms | arXiv: 2502.13259
- HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases | arXiv: 2412.16311
- hybrid preferences learning to route instances for human vs ai feedback | arXiv: 2410.19133
- hygenar an llm-driven hybrid genetic algorithm for few-shot grammar generation | arXiv: 2505.16978
- hykge a hypothesis knowledge graph enhanced rag framework for accurate and relia
- hyperfm fact-centric multimodal fusion for link prediction over hyper-relational
- hypothetical documents or knowledge leakage rethinking llm-based query expansion | arXiv: 2504.14175
- i see what you mean co-speech gestures for reference resolution in multimodal di | arXiv: 2503.00071
- i0t embedding standardization method towards zero modality gap | arXiv: 2412.14384
- iagent llm agent as a shield between user and recommender systems | arXiv: 2502.14662
- iam efficient inference through attention mapping between different-scale llms | arXiv: 2507.11953
- icr probe tracking hidden state dynamics for reliable hallucination detection in | arXiv: 2507.16488
- idea enhancing the rule learning ability of large language model agent through i | arXiv: 2408.10455
- identifying cellular niches in spatial transcriptomics an investigation into the
- identifying open challenges in language identification
- Identifying Reliable Evaluation Metrics for Scientific Text Revision | arXiv: 2506.04772
- if attention serves as a cognitive model of human memory retrieval what is the p | arXiv: 2502.11469
- if eleanor rigby had met chatgpt a study on loneliness in a post-llm world | arXiv: 2412.01617
- imol incomplete-modality-tolerant learning for multi-domain fake news video dete
- impara-ged grammatical error detection is boosting reference-free grammatical er | arXiv: 2506.02899
- impart importance-aware delta-sparsification for improved model compression and
- impartial multi-task representation learning via variance-invariant probabilisti
- implicit cross-lingual rewarding for efficient multilingual preference alignment | arXiv: 2503.04647
- implicit reasoning in transformers is reasoning through shortcuts | arXiv: 2503.07604
- ImpliHateVid: Implicit Hate Speech Detection in Videos | arXiv: 2508.06570
- improve language model and brain alignment via associative memory | arXiv: 2505.13844
- improve rule retrieval and reasoning with self-induction and relevance reestimat | arXiv: 2505.10870
- improve safety training of large language models with safety-critical singular v
- Improve Vision Language Model Chain-of-thought Reasoning | arXiv: 2410.16198
- Improved Unbiased Watermark for Large Language Models | arXiv: 2502.11268
- Improving Automatic Evaluation of LLMs in Biomedical Relation Extraction via LLMs-as-the-Judge | arXiv: 2506.00777
- improving chain-of-thought reasoning via quasi-symbolic abstractions | arXiv: 2502.12616
- improving contextual faithfulness of large language models via retrieval heads-i
- improving continual pre-training through seamless data packing | arXiv: 2505.22018
- improving dialogue discourse parsing through discourse-aware utterance clarifica
- improving dialogue state tracking through combinatorial search for in-context ex | arXiv: 2506.00622
- improving fairness of large language models in multi-document summarization | arXiv: 2506.07479
- Improving Language and Modality Transfer in Translation by Character-level Modeling | arXiv: 2505.24561
- improving low-resource morphological inflection via self-supervised objectives | arXiv: 2506.05227
- improving medical large vision-language models with abnormal-aware feedback | arXiv: 2501.01377
- improving mllms document image machine translation via synchronously self-review | arXiv: 2507.08309
- improving model factuality with fine-grained critique-based evaluator | arXiv: 2410.18359
- improving parallel sentence mining for low-resource and endangered languages
- improving preference extraction in llms by identifying latent knowledge through | arXiv: 2503.17755
- Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics | arXiv: 2506.00637
- in prospect and retrospect reflective memory management for long-term personaliz | arXiv: 2503.08026
- in the llm era word sense induction remains unsolved | arXiv: 2603.11686
- In-the-wild Audio Spatialization with Flexible Text-guided Localization | arXiv: 2506.00927
- incongruity-aware tension field network for multi-modal sarcasm detection
- inconsistent tokenizations cause language models to be perplexed by japanese gra
- incorporating domain knowledge into materials tokenization | arXiv: 2506.11115
- indicsynth a large-scale multilingual synthetic speech dataset for low-resource
- inducing lexicons of in-group language with socio-temporal context | arXiv: 2409.19257
- inductionbench llms fail in the simplest complexity class | arXiv: 2502.15823
- inews a multimodal dataset for modeling personalized affective responses to news | arXiv: 2503.03335
- Inference Compute-Optimal Video Vision Language Models | arXiv: 2505.18855
- inferring from logits exploring best practices for decoding-free generative cand
- inferring functionality of attention heads from their parameters | arXiv: 2412.11965
- infinisst simultaneous translation of unbounded speech with large language model | arXiv: 2503.02969
- influences on llm calibration a study of response agreement loss functions and p | arXiv: 2501.03991
- infogen generating complex statistical infographics from documents | arXiv: 2507.20046
- information extraction from visually rich documents using llm-based organization
- information locality as an inductive bias for neural language models | arXiv: 2506.05136
- injongo a multicultural intent detection and slot-filling dataset for 16 african | arXiv: 2502.09814
- inner thinking transformer leveraging dynamic depth scaling to foster adaptive i | arXiv: 2502.13842
- innovative image fraud detection with cross-sample anomaly analysis the power of
- InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training | arXiv: 2503.02769
- Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | arXiv: 2410.08145
- InspireDebate: Multi-Dimensional Evaluation-Guided Reasoning for Debating | arXiv: 2506.18102
- instance-selection-inspired undersampling strategies for bias reduction in small
- instruction tuning on public government and cultural data for low-resource langu
- instruction-tuning data synthesis from scratch via web reconstruction | arXiv: 2504.15573
- instructpart task-oriented part segmentation with instruction reasoning | arXiv: 2505.18291
- integrating audio visual and semantic information for enhanced multimodal speake
- inter-passage verification for multi-evidence multi-answer qa | arXiv: 2506.00425
- interact enabling interactive question-driven learning in large language models | arXiv: 2412.11388
- interactive and expressive code-augmented planning with large language models | arXiv: 2411.13826
- interactive evolution a neural-symbolic self-training framework for large langua
- interlocking-free selective rationalization through genetic-based learning | arXiv: 2412.10312
- internal and external impacts of natural language processing papers | arXiv: 2505.16061
- internal value alignment in large language models through controlled value vecto | arXiv: 2507.11316
- internlm-xcomposer25-reward a simple yet effective multi-modal reward model | arXiv: 2501.12368
- InterpoLL: Mitigating Shortcut Learning with InterpoLated Learning | arXiv: 2507.05527
- interpret and improve in-context learning via the lens of input-label mappings
- introducing graph context into language models through parameter-efficient fine-
- introducing verification task of set consistency with set-consistency energy net
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process | arXiv: 2405.11870
- investalign overcoming data scarcity in aligning large language models with inve
- investigating and enhancing the robustness of large multimodal models against te
- investigating and enhancing vision-audio capability in omnimodal large language | arXiv: 2503.00059
- investigating and extending homans social exchange theory with large language mo
- investigating context-faithfulness in large language models the roles of memory | arXiv: 2409.10955
- investigating language preference of multilingual rag systems | arXiv: 2502.11175
- investigating the robustness of retrieval-augmented generation at the query leve | arXiv: 2507.06956
- investorbench a benchmark for financial decision-making tasks with llm-based age
- IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization | arXiv: 2411.06208
- ipo your language model is secretly a preference classifier | arXiv: 2502.16182
- iquest an iterative question-guided framework for knowledge base question answer | arXiv: 2506.01784
- iris interactive research ideation system for accelerating scientific discovery | arXiv: 2504.16728
- iris interpretable retrieval-augmented classification for long interspersed docu
- IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery | arXiv: 2510.09217
- Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training | arXiv: 2502.12734
- IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory | arXiv: 2506.01048
- is it just semantics a case study of discourse particle understanding in llms | arXiv: 2506.04534
- is linguistically-motivated data augmentation worth it | arXiv: 2506.03593
- is llm an overconfident judge unveiling the capabilities of llms in detecting of | arXiv: 2502.06207
- Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering | arXiv: 2502.13962
- isr self-refining referring expressions for entity grounding
- its not a walk in the park challenges of idiom translation in speech-to-text sys | arXiv: 2506.02995
- its not bragging if you can back it up can llms understand braggings
- jailbreak large vision-language models through multi-modal linkage | arXiv: 2412.00473
- jailbreaking one step is enough | arXiv: 2412.12621
- JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs | arXiv: 2402.05668
- jarvis-vla post-training large-scale vision language models to play visual games | arXiv: 2503.16365
- jopa explaining large language models generation via joint prompt attribution | arXiv: 2405.20404
- jsontuning towards generalizable robust and controllable instruction tuning | arXiv: 2310.02953
- judging the judges can large vision-language models fairly evaluate chart compre | arXiv: 2505.08468
- just a scratch enhancing llm capabilities for self-harm detection through intent | arXiv: 2506.05073
- just go parallel improving the multilingual capabilities of large language model | arXiv: 2506.13044
- JuStRank: Benchmarking LLM Judges for System Ranking | arXiv: 2412.09569
- katfishnet detecting llm-generated korean text through linguistic feature analys | arXiv: 2503.00032
- kazmmlu evaluating language models on kazakh russian and regional knowledge of k | arXiv: 2502.12829
- kda automated data generation pipeline for detoxifying implicitly offensive lang | arXiv: 2506.13513
- kerl knowledge-enhanced personalized recipe recommendation using large language | arXiv: 2505.14629
- kg-agent an efficient autonomous agent framework for complex reasoning over know
- kirag knowledge-driven iterative retriever for enhancing retrieval-augmented gen
- kitab-bench a comprehensive multi-domain benchmark for arabic ocr and document u | arXiv: 2502.14949
- knockout llm assessment using large language models for evaluations through iter | arXiv: 2506.03785
- know you first and be you better modeling human-like user simulators via implici | arXiv: 2502.18968
- know your mistakes towards preventing overreliance on task-oriented conversation | arXiv: 2501.10316
- knowcoder-x boosting multilingual information extraction via code | arXiv: 2411.04794
- Knowledge Boundary of Large Language Models: A Survey | arXiv: 2412.12472
- knowledge decoupling via orthogonal projection for lifelong editing of large lan
- Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation | arXiv: 2501.02226
- knowledge image matters improving knowledge-based visual reasoning with multi-im
- knowledge tracing in programming education integrating students questions | arXiv: 2502.10408
- knowledge-augmented multimodal clinical rationale generation for disease diagnos
- KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education? | arXiv: 2412.08985
- kodcode a diverse challenging and verifiable synthetic dataset for coding | arXiv: 2503.02951
- KoGEM: Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean | arXiv: 2506.01237
- KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors | arXiv: 2506.01357
- kristeva close reading as a novel task for benchmarking interpretive reasoning | arXiv: 2505.09825
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding | arXiv: 2507.11273
- L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models | arXiv: 2402.04902
- La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America | arXiv: 2507.00999
- LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation | arXiv: 2508.09515
- lacuna inc at semeval-2025 task 4 lora-enhanced influence-based unlearning for l | arXiv: 2506.04044
- ladder language-driven slice discovery and error rectification in vision classif | arXiv: 2408.07832
- LADM: Long-context Training Data Selection with Attention-based Dependency Measurement | arXiv: 2503.02502
- lamb a training-free method to enhance the long-context understanding of ssms vi
- langmark a multilingual dataset for automatic post-editing | arXiv: 2511.17153
- LangSAMP: Language-Script Aware Multilingual Pretraining | arXiv: 2409.18199
- language complexity measurement as a noisy zero-shot proxy for evaluating llm pe | arXiv: 2502.11578
- language constrained multimodal hyper adapter for many-to-many multimodal summar
- Language Fusion for Parameter-Efficient Cross-lingual Transfer (FLARE) | arXiv: 2501.06892
- language model fine-tuning on scaled survey data for predicting distributions of | arXiv: 2502.16761
- language model probabilities are not calibrated in numeric contexts | arXiv: 2410.16007
- language models can subtly deceive without lying a case study on strategic phras | arXiv: 2405.04325
- language models grow less humanlike beyond phase transition | arXiv: 2502.18802
- Language Models Resist Alignment: Evidence From Data Compression | arXiv: 2406.06144
- Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More | arXiv: 2503.10542
- Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | arXiv: 2402.12208
- LAQuer: Localized Attribution Queries in Content-grounded Generation | arXiv: 2506.01187
- large language and protein assistant for protein-protein interactions prediction
- large language and reasoning models are shallow disjunctive reasoners | arXiv: 2503.23487
- large language models are good relational learners | arXiv: 2506.05725
- large language models for predictive analysis how far are they | arXiv: 2505.17149
- large language models in bioinformatics a survey | arXiv: 2503.04490
- large language models struggle to describe the haystack without human help a soc
- large margin representation learning for robust cross-lingual named entity recog
- large vocabulary size improves large language models | arXiv: 2406.16508
- latim measuring latent token-to-token interactions in mamba models | arXiv: 2502.15612
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews | arXiv: 2504.11042
- ldir low-dimensional dense and interpretable text embeddings with relative repre | arXiv: 2505.10354
- leancode understanding models better for code simplification of pre-trained larg | arXiv: 2505.14759
- learn to memorize scalable continual learning in semiparametric models with mixt
- learning auxiliary tasks improves reference-free hallucination detection in open
- learning first-order logic rules for argumentation mining
- learning from litigation graphs and llms for retrieval and reasoning in ediscove | arXiv: 2405.19164
- learning from negative samples in biomedical generative entity linking | arXiv: 2408.16493
- learning to align multi-faceted evaluation a unified and robust framework | arXiv: 2502.18874
- learning to generate structured output with schema reinforcement learning | arXiv: 2502.18878
- learning to look at the other side a semantic probing study of word embeddings i
- learning to reason from feedback at test-time | arXiv: 2502.15771
- Learning to Reason Over Time: Timeline Self-Reflection for Temporal Reasoning | arXiv: 2504.05258
- learning to rewrite generalized llm-generated text detection | arXiv: 2408.04237
- learning together to perform better teaching small-scale llms to collaborate via
- led-merging mitigating safety-utility conflicts in model merging with location-e
- legalagentbench evaluating llm agents in legal domain | arXiv: 2412.17259
- legalreasoner step-wised verification-correction for legal judgment reasoning | arXiv: 2506.07443
- lemonade a large multilingual expert-annotated abstractive event dataset for the | arXiv: 2506.00980
- length controlled generation for black-box llms | arXiv: 2412.14656
- length-induced embedding collapse in plm-based models | arXiv: 2410.24200
- lesa learnable llm layer scaling-up | arXiv: 2502.13794
- less for more enhanced feedback-aligned mixed llms for molecule caption generati
- less is more explainable and efficient icd code prediction with clinical entitie
- less mature is more adaptable for sentence-level language modeling
- Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts | arXiv: 2505.22582
- lets-c leveraging text embedding for time series classification | arXiv: 2407.06533
- Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | arXiv: 2502.11882
- leveraging human production-interpretation asymmetries to test llm cognitive pla | arXiv: 2503.17579
- leveraging in-context learning for political bias testing of llms | arXiv: 2506.22232
- leveraging large language models to measure gender representation bias in gender | arXiv: 2406.13677
- Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs | arXiv: 2506.05629
- leveraging unit language guidance to advance speech modeling in textless speech- | arXiv: 2505.15333
- leveraging variation theory in counterfactual data augmentation for optimized ac | arXiv: 2408.03819
- lexclipr cross-lingual paragraph retrieval from legal judgments
- lexgen domain-aware multilingual lexicon generation | arXiv: 2405.11200
- lexical diversity-aware relevance assessment for retrieval-augmented generation
- lexical recall or logical reasoning probing the limits of reasoning abilities in
- lexkeyplan planning with keyphrases and retrieval augmentation for legal text ge
- lextempus enhancing temporal generalizability of legal language models through d
- library-like behavior in language models is enhanced by self-referencing causal | arXiv: 2501.13491
- lifbench evaluating the instruction following performance and stability of large
- limited generalizability in argument mining state-of-the-art models learn datase | arXiv: 2505.22137
- limited-resource adapters are regularizers not linguists | arXiv: 2505.24525
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | arXiv: 2502.17407
- literary evidence retrieval via long-context language models | arXiv: 2506.03090
- Literature Meets Data: A Synergistic Approach to Hypothesis Generation | arXiv: 2410.17309
- Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs | arXiv: 2505.09338
- llama-omni 2 llm-based real-time spoken chatbot with autoregressive streaming sp
- llamaduo llmops pipeline for seamless migration from service llms to small-scale | arXiv: 2408.13467
- llamas have feelings too unveiling sentiment and emotion representations in llam
- llase-g1 incentivizing generalization capability for llama-based speech enhancem
- llava steering visual instruction tuning with 500x fewer parameters through moda
- llm agents making agent tools | arXiv: 2502.11705
- LLM as a Broken Telephone: Iterative Generation Distorts Information | arXiv: 2502.20258
- llm as effective streaming processor bridging streaming-batch mismatches with gr | arXiv: 2505.16983
- llm as entity disambiguator for biomedical entity-linking
- LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates | arXiv: 2503.16334
- llm meets scene graph can large language models understand and generate scene gr | arXiv: 2505.19510
- llm-based rumor detection via influence guided sample selection and game-based p
- llm-enhanced self-evolving reinforcement learning for multi-step e-commerce paym | arXiv: 2509.18719
- llm-guided semantic-aware clustering for topic modeling
- LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs | arXiv: 2404.10304
- llms can achieve high-quality simultaneous machine translation as efficiently as | arXiv: 2504.09570
- llms can be easily confused by instructional distractions | arXiv: 2502.04362
- LLMs can Perform Multi-Dimensional Analytic Writing Assessments | arXiv: 2502.11368
- LLMs Can Simulate Standardized Patients via Agent Coevolution | arXiv: 2412.11716
- llms caught in the crossfire malware requests and jailbreak challenges | arXiv: 2506.10022
- LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks | arXiv: 2406.18403
- llms know their vulnerabilities uncover safety gaps through natural distribution | arXiv: 2410.10700
- llms persona-plug personalized llms | arXiv: 2409.11901
- llms syntactically adapt their language use to their conversational partner
- llms trust humans more thats a problem unveiling and mitigating the authority bi
- llmsrxllm25 less is more enhancing structured multi-agent reasoning via quality- | arXiv: 2504.16408
- llm×mapreduce simplified long-sequence processing using large language model
- locagent graph-guided llm agents for code localization | arXiv: 2503.09089
- local look-ahead guidance via verifier-in-the-loop for automated theorem proving | arXiv: 2503.09730
- localizing and mitigating errors in long-form question answering | arXiv: 2407.11930
- Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models | arXiv: 2507.18263
- logic-regularized verifier elicits reasoning from llms
- logical consistency is vital neural-symbolic information retrieval for negative- | arXiv: 2505.22299
- logical forms complement probability in understanding language model and human p | arXiv: 2502.09589
- LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | arXiv: 2409.12929
- logicqa logical anomaly detection with vision language model generated questions | arXiv: 2503.20252
- LoGU: Long-form Generation with Uncertainty Expressions | arXiv: 2410.14309
- longbench v2 towards deeper understanding and reasoning on realistic long-contex | arXiv: 2412.15204
- LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | arXiv: 2412.18424
- longdpo unlock better long-form generation abilities for llms via critique-augme | arXiv: 2502.02095
- longrecipe recipe for efficient long context generalization in large language mo
- longred mitigating short-text degradation of long-context large language models | arXiv: 2502.07365
- longreward improving long-context large language models with ai feedback | arXiv: 2410.21252
- longsafety evaluating long-context safety of large language models | arXiv: 2502.16971
- look both ways and no sink converting llms into text encoders without training
- lost in literalism how supervised training shapes translationese in llms | arXiv: 2503.04369
- lost in multilinguality dissecting cross-lingual factual inconsistency in transf | arXiv: 2504.04264
- lost in the context insufficient and distracted attention to contexts in prefere
- lotus a leaderboard for detailed image captioning from quality to societal bias | arXiv: 2507.19362
- low-bit quantization favors undertrained llms
- low-perplexity llm-generated sequences and where to find them | arXiv: 2507.01844
- low-rank interconnected adaptation across layers | arXiv: 2407.09946
- lpoi listwise preference optimization for vision language models | arXiv: 2505.21061
- lr2bench evaluating long-chain reflective reasoning capabilities of large langua | arXiv: 2502.17848
- LSSF: Safety Alignment via Low-Rank Safety Subspace Fusion | arXiv: 2602.00038
- m-mad multidimensional multi-agent debate for advanced machine translation evalu | arXiv: 2412.20127
- M-RewardBench: Evaluating Reward Models in Multilingual Settings | arXiv: 2410.15522
- m2rc-eval massively multilingual repository-level code completion evaluation | arXiv: 2410.21157
- M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs | arXiv: 2503.04856
- m3finmeeting a multilingual multi-sector and multi-task financial meeting unders | arXiv: 2506.02510
- m3hg multimodal multi-scale and multi-type node heterogeneous graph for emotion | arXiv: 2508.18740
- machine translation models are zero-shot detectors of translation direction | arXiv: 2401.06769
- macp minimal yet mighty adaptation via hierarchical cosine projection | arXiv: 2410.09103
- madakv adaptive modality-perception kv cache eviction for efficient multimodal l | arXiv: 2506.15724
- magic-vqa multimodal and grounded inference with commonsense knowledge for visua | arXiv: 2503.18491
- magnet augmenting generative decoders with representation learning and infilling | arXiv: 2501.08648
- magnet multi-turn tool-use data synthesis and distillation via graph translation | arXiv: 2503.07826
- main-rag multi-agent filtering retrieval-augmented generation | arXiv: 2501.00332
- make imagination clearer stable diffusion-based visual imagination for multimoda
- making fetch happen finding emergent dog whistles through common habitats | arXiv: 2412.12072
- making llms better many-to-many speech-to-text translators with curriculum learn | arXiv: 2409.19510
- mam modular multi-agent framework for multi-modal medical diagnosis via role-spe | arXiv: 2506.19835
- mamba knockout for unraveling factual information flow | arXiv: 2505.24244
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | arXiv: 2412.05237
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | arXiv: 2410.09403
- maple enhancing review generation with multi-aspect prompt learning in explainab
- mapmake schema guided text to table generation | arXiv: 2505.23174
- mapnav a novel memory representation via annotated semantic maps for vlm-based v
- maporl multi-agent post-co-training for collaborative large language models with | arXiv: 2502.18439
- Mapping 1,000+ Language Models via the Log-Likelihood Vector | arXiv: 2502.16173
- mapping the podcast ecosystem with the structured podcast research corpus | arXiv: 2411.07892
- mapqator an extensible framework for efficient annotation of map-based qa datase | arXiv: 2412.21015
- MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment | arXiv: 2503.01711
- Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models | arXiv: 2507.11882
- marco-o1 v2 towards widening the distillation bottleneck for reasoning models | arXiv: 2503.01461
- mars benchmarking the metaphysical reasoning abilities of language models with a | arXiv: 2406.02106
- masking in multi-hop qa an analysis of how language models perform with context | arXiv: 2505.11754
- masks can be learned as an alternative to experts
- masrouter learning to route llms for multi-agent systems | arXiv: 2502.11133
- Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | arXiv: 2410.16930
- a³ automatic alignment framework for attributed text generation
- mathcoder-vl bridging vision and code for enhanced multimodal mathematical reaso | arXiv: 2505.10557
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion | arXiv: 2503.16212
- maxife multilingual and cross-lingual instruction following evaluation | arXiv: 2506.01776
- maximal matching matters preventing representation collapse for robust cross-mod | arXiv: 2506.21538
- maximizing the effectiveness of larger bert models for compression
- mcbe a multi-task chinese bias evaluation benchmark for large language models | arXiv: 2507.02088
- mcs-bench a comprehensive benchmark for evaluating multimodal large language mod
- mdbench a synthetic multi-document reasoning benchmark generated with knowledge | arXiv: 2506.14927
- mdcure a scalable pipeline for multi-document instruction-following | arXiv: 2410.23463
- mdit-bench evaluating the dual-implicit toxicity in large multimodal models | arXiv: 2505.17144
- meaning beyond truth conditions evaluating discourse level understanding via ana
- meaning variation and data quality in the corpus of founding era american englis
- measuring data diversity for instruction tuning a systematic analysis and a reli | arXiv: 2502.17184
- measuring social biases in masked language models by proxy of prediction quality | arXiv: 2402.13954
- measuring the effect of transcription noise on downstream language understanding | arXiv: 2502.13645
- mechanistic interpretability of emotion inference in large language models | arXiv: 2502.05489
- medbiorag semantic search and retrieval-augmented generation with large language | arXiv: 2512.10996
- meddxagent a unified modular agent framework for explainable automatic different | arXiv: 2502.19175
- medical graph rag evidence-based medical large language model via graph retrieva
- megapairs massive data synthesis for universal multimodal retrieval | arXiv: 2412.14475
- megen generative backdoor into large language models via model editing | arXiv: 2408.10722
- meit multimodal electrocardiogram instruction tuning on large language models fo | arXiv: 2403.04945
- membench towards more comprehensive evaluation on the memory of llm-based agents | arXiv: 2506.21605
- memeqa holistic evaluation for meme understanding
- memerag a multilingual end-to-end meta-evaluation benchmark for retrieval augmen | arXiv: 2502.17163
- memorization a close look at books | arXiv: 2504.12549
- Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation | arXiv: 2502.01491
- memorizing is not enough deep knowledge injection through reasoning | arXiv: 2504.00472
- MEraser: An Effective Fingerprint Erasure Approach for Large Language Models | arXiv: 2506.12551
- merge hijacking backdoor attacks to model merging of large language models | arXiv: 2505.23561
- MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models | arXiv: 2410.08604
- meta-learning neural mechanisms rather than bayesian priors | arXiv: 2503.16048
- Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models | arXiv: 2504.14194
- meta-reflection a feedback-free reflection learning framework | arXiv: 2412.13781
- meta-tool unleash open-world function calling capabilities of general-purpose la
- metal a multi-agent framework for chart generation with test-time scaling | arXiv: 2502.17651
- metasynth meta-prompting-driven agentic scaffolds for diverse synthetic data gen | arXiv: 2504.12563
- mexma token-level objectives improve sentence representations | arXiv: 2409.12737
- MHA2MLA: Towards Economical Inference by Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs | arXiv: 2502.14837
- Micro-Act: Mitigate Knowledge Conflict in QA via Actionable Self-Reasoning | arXiv: 2506.05278
- Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | arXiv: 2502.14830
- milic-eval benchmarking multilingual llms for chinas minority languages | arXiv: 2503.01150
- mimicking the familiar dynamic command generation for information theft attacks
- mind a multi-agent framework for zero-shot harmful meme detection | arXiv: 2507.06908
- mind the belief gap group identity in the world of llms | arXiv: 2503.02016
- mind the gap static and interactive evaluations of large audio models | arXiv: 2502.15919
- mind the gesture evaluating ai sensitivity to culturally offensive non-verbal ge
- mind your tone investigating how prompt politeness affects llm accuracy short pa | arXiv: 2510.04950
- MindRef: Mimicking Human Memory for Hierarchical Reference Retrieval with Fine-Grained Location Awareness | arXiv: 2402.17010
- minilongbench the low-cost long context understanding benchmark for large langua
- minimal pair-based evaluation of code-switching | arXiv: 2506.01840
- mining complex patterns of argumentative reasoning in natural language dialogue
- mining the uncertainty patterns of humans and models in the annotation of moral
- mir methodology inspiration retrieval for scientific research problems | arXiv: 2506.00249
- mira empowering one-touch ai services on smartphones with mllm-based instruction | arXiv: 2509.13773
- mirage exploring how large language models perform in complex social interactive | arXiv: 2501.01652
- mire enhancing multimodal queries representation via fusion-free modality intera | arXiv: 2411.08334
- mis-prompt benchmarking large language models for proactive error handling | arXiv: 2506.00064
- misp-meeting a real-world dataset with multimodal cues for long-form meeting tra
- mitigate position bias in large language models via scaling a single dimension | arXiv: 2406.02536
- mitigating confounding in speech-based dementia detection through weight masking | arXiv: 2506.05610
- mitigating lost-in-retrieval problems in retrieval augmented multi-hop question | arXiv: 2502.14245
- mitigating negative interference in multilingual sequential knowledge editing th | arXiv: 2506.10800
- mitigating non-representative prototypes and representation bias in few-shot con
- mitigating posterior salience attenuation in long-context llms with positional c | arXiv: 2506.08371
- Mitigating Selection Bias with Node Pruning and Auxiliary Options | arXiv: 2409.18857
- Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | arXiv: 2503.13360
- mixture of decoding an attention-inspired adaptive decoding strategy to mitigate | arXiv: 2505.17061
- mixture of insightful experts mote the synergy of reasoning chains and expert mi
- mixture of ordered scoring experts for cross-prompt essay trait scoring
- mixture of small and large models for chinese spelling check | arXiv: 2506.06887
- mixtures of in-context learners | arXiv: 2411.02830
- mlas-lora language-aware parameters detection and lora-based knowledge transfer
- mldebugging towards benchmarking code debugging across multi-library scenarios | arXiv: 2506.13824
- mm-verify enhancing multimodal reasoning with chain-of-thought verification | arXiv: 2502.13383
- MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | arXiv: 2505.23224
- mmdend dendrite-inspired multi-branch multi-compartment parallel spiking neuron
- mmina benchmarking multihop multimodal internet agents | arXiv: 2404.09992
- mmlu-cf a contamination-free multi-task language understanding benchmark | arXiv: 2412.15194
- MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | arXiv: 2409.02813
- mmrc a large-scale benchmark for understanding multimodal large language model i
- mms-llama efficient llm-based audio-visual speech recognition with minimal multi | arXiv: 2503.11315
- MMSafeAware: Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | arXiv: 2502.11184
- mmscibench benchmarking language models on chinese multimodal scientific problem | arXiv: 2503.01891
- mmunlearner reformulating multimodal machine unlearning in the era of multimodal | arXiv: 2502.11051
- mobilora accelerating lora-based llm inference on mobile devices via context-awa
- moc mixtures of text chunking learners for retrieval-augmented generation system | arXiv: 2503.09600
- mockconf a student interpretation dataset analysis word- and span-level alignmen | arXiv: 2506.04848
- Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models | arXiv: 2502.15910
- Model Extrapolation Expedites Alignment | arXiv: 2404.16792
- model performance-guided evaluation data selection for effective prompt optimiza | arXiv: 2505.10736
- modeling complex semantics relation with contrastively fine-tuned relational enc
- modeling the evolution of english noun compounds with feature-rich diachronic co
- modeling uncertainty in composed image retrieval via probabilistic embeddings
- Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | arXiv: 2407.14878
- molrag unlocking the power of large language models for molecular property predi
- monitoring decoding mitigating hallucination via evaluating the factuality of pa | arXiv: 2503.03106
- MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | arXiv: 2506.07533
- more a mixture of low-rank experts for adaptive multi-task learning | arXiv: 2505.22694
- more is not always better enhancing many-shot in-context learning with different
- Morpher: Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | arXiv: 2412.08174
- MorphMark: Flexible Adaptive Watermarking for Large Language Models | arXiv: 2505.11541
- mosaic multiple observers spotting ai content | arXiv: 2409.07615
- moscar a large-scale multilingual and multimodal document-level corpus | arXiv: 2406.08707
- movie101v2 improved movie narration benchmark | arXiv: 2404.13370
- mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | arXiv: 2409.03420
- MPO: Multilingual Safety Alignment via Reward Gap Optimization | arXiv: 2505.16869
- mpvstance mitigating hallucinations in stance detection with multi-perspective v
- mrakl multilingual retrieval-augmented knowledge graph construction for low-reso | arXiv: 2507.16011
- mt-raig novel benchmark and evaluation framework for retrieval-augmented insight | arXiv: 2502.11735
- m³gqa a multi-entity multi-hop multi-setting graph question answ
- mtsa multi-turn safety alignment for llms through multi-round red-teaming | arXiv: 2505.17147
- mtvqa benchmarking multilingual text-centric visual question answering | arXiv: 2405.11985
- multi-agent collaboration via cross-team orchestration | arXiv: 2406.08979
- Multi-Attribute Steering of Language Models via Targeted Intervention | arXiv: 2502.12446
- Multi-document Summarization through Event Relation Graph Reasoning for Framing Bias Mitigation | arXiv: 2506.12978
- multi-facet blending for faceted query-by-example retrieval | arXiv: 2412.01443
- multi-hop question generation via dual-perspective keyword guidance | arXiv: 2505.15299
- multi-hop reasoning for question answering with hyperbolic representations | arXiv: 2507.03612
- multi-level association refinement network for dialogue aspect-based sentiment q
- Multi-Level Explanations for Generative Language Models | arXiv: 2403.14459
- multi-level relevance document identifier learning for generative retrieval
- multi-modality expansion and retention for llms through parameter merging and de
- multi-perspective alignment for increasing naturalness in neural machine transla | arXiv: 2412.08473
- multi-prompting decoder helps better language understanding | arXiv: 2406.06279
- multi-task adversarial attacks against black-box model with few-shot queries | arXiv: 2508.10039
- multiagentbench evaluating the collaboration and competition of llm agents | arXiv: 2503.01935
- multilingual arbitration optimizing data pools to accelerate multilingual progre
- multilingual encoder knows more than you realize shared weights pretraining for | arXiv: 2502.10852
- multilingual gloss-free sign language translation towards building a sign langua
- multilingual retrieval augmented generation for culturally-sensitive tasks a ben | arXiv: 2410.01171
- multilingual text-to-image generation magnifies gender stereotypes
- multimed multilingual medical speech recognition via attention encoder decoder | arXiv: 2409.14074
- MultiMM: Cultural Bias Matters — Cross-Cultural Benchmark for Multimodal Metaphors | arXiv: 2506.06987
- multimodal coreference resolution for chinese social media dialogues dataset and | arXiv: 2504.14321
- multimodal pragmatic jailbreak on text-to-image models | arXiv: 2409.19149
- multimodal transformers are hierarchical modal-wise heterogeneous graphs | arXiv: 2505.01068
- Multiple LLM Agents Debate for Equitable Cultural Alignment | arXiv: 2505.24671
- MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts | arXiv: 2406.12549
- musc improving complex instruction following with multi-granularity self-contras
- musts multilingual semantic textual similarity benchmark
- Mutual-Taught for Co-adapting Policy and Reward Models | arXiv: 2506.06292
- my life is miserable have to sign 500 autographs everyday exposing humblebraggin | arXiv: 2412.20057
- my words imply your opinion reader agent-based propagation enhancement for perso
- nametag 3 a tool and a service for multilingual multitagset ner | arXiv: 2506.05949
- narrative media framing in political discourse | arXiv: 2506.00737
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | arXiv: 2502.11089
- natural language processing in support of evidence-based medicine a scoping revi | arXiv: 2505.22280
- navigating rifts in human-llm grounding study and benchmark | arXiv: 2503.13975
- negative matters multi-granularity hard-negative synthesis and anchor-token-awar
- negvqa can vision language models understand negation | arXiv: 2505.22946
- neko cross-modality post-recognition error correction with tasks-guided mixture- | arXiv: 2411.05945
- Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset | arXiv: 2412.02595
- neural incompatibility the unbridgeable gap of cross-scale parametric knowledge
- neural parameter search for slimmer fine-tuned models and better transfer | arXiv: 2505.18713
- neural topic modeling with large language models in the loop | arXiv: 2411.08534
- neuron empirical gradient discovering and quantifying neurons global linear cont | arXiv: 2412.18053
- neuron-level sequential editing for large language models | arXiv: 2410.04045
- NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | arXiv: 2505.19754
- newsinterview a dataset and a playground to evaluate llms grounding gap via info | arXiv: 2411.13779
- NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization | arXiv: 2505.24575
- ngqa a nutritional graph question answering benchmark for personalized health-aw
- no questions are stupid but some are poorly posed understanding poorly-posed inf
- noreval a norwegian language understanding and generation evaluation benchmark | arXiv: 2504.07749
- Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability | arXiv: 2408.08137
- not all terms matter recall-oriented adaptive learning for plm-aided query expan
- not quite sherlock holmes language model predictions do not reliably differentia | arXiv: 2506.06808
- Nudging: Inference-time Alignment of LLMs via Guided Decoding | arXiv: 2410.09300
- nusaaksara a multimodal and multilingual benchmark for preserving indonesian ind
- nvagent automated data visualization from natural language via collaborative age
- oasis order-augmented strategy for improved code search | arXiv: 2503.08161
- obfuslm privacy-preserving language model service against embedding inversion at
- Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation | arXiv: 2503.08057
- olmotrace tracing language model outputs back to trillions of training tokens | arXiv: 2504.07096
- omgm orchestrate multiple granularities and modalities for efficient multimodal | arXiv: 2505.07879
- omnialign-v towards enhanced alignment of mllms with human preference | arXiv: 2502.18411
- omnicharacter towards immersive role-playing agents with seamless speech-languag
- omniflatten an end-to-end gpt model for seamless voice conversation | arXiv: 2410.17799
- on entity identification in language models | arXiv: 2506.02701
- on generalization across measurement systems llms entail more test-time compute | arXiv: 2506.02591
- on many-shot in-context learning for long-context evaluation | arXiv: 2411.07130
- on support samples of next word prediction | arXiv: 2506.04047
- on synthesizing data for context attribution in question answering | arXiv: 2504.05317
- on synthetic data strategies for domain-specific generative retrieval | arXiv: 2502.17957
- on the acquisition of shared grammatical representations in bilingual language m | arXiv: 2503.03962
- On the Limit of Language Models as Planning Formalizers | arXiv: 2412.09879
- on the mutual influence of gender and occupation in llm representations | arXiv: 2503.06792
- on the relation between fine-tuning topological properties and task performance
- On the Reliability of Large Language Models for Causal Discovery | arXiv: 2407.19638
- on the risk of evidence pollution for malicious social text detection in the era | arXiv: 2410.12600
- on the robust approximation of asr metrics | arXiv: 2502.12408
- on-policy self-alignment with fine-grained knowledge feedback for hallucination | arXiv: 2406.12221
- one for all update parameterized knowledge across multiple models with once edit | arXiv: 2506.00817
- one missing piece for open-source reasoning models a dataset to mitigate cold-st | arXiv: 2506.02338
- one quantllm for all fine-tuning quantized llms once for efficient deployments | arXiv: 2405.20202
- one size fits none rethinking fairness in medical ai | arXiv: 2506.14400
- onebench to test them all sample-level benchmarking over open-ended capabilities | arXiv: 2412.06745
- Online Iterative Self-Alignment for Radiology Report Generation | arXiv: 2505.11983
- Only a Little to the Left: A Theory-grounded Measure of Political Bias in LLMs | arXiv: 2503.16148
- ontology-guided reverse thinking makes large language models stronger on knowled
- open-set living need prediction with large language models | arXiv: 2506.02713
- open-world attribute mining for e-commerce products with multimodal self-correct
- open-world planning via lifted regression with llm-inferred affordances for embo
- opencoder the open cookbook for top-tier code large language models | arXiv: 2411.04905
- openwebvoyager building multimodal web agents via iterative real-world explorati
- opt-out investigating entity-level unlearning for large language models via opti | arXiv: 2406.12329
- Optimal Transport-Based Token Weighting for Enhanced Preference Optimization | arXiv: 2505.18720
- optimized text embedding models and benchmarks for amharic passage retrieval | arXiv: 2505.19356
- optimizing decomposition for optimal claim verification | arXiv: 2503.15354
- optimizing pre-training data mixtures with mixtures of data expert models | arXiv: 2502.15950
- optimizing question semantic space for dynamic retrieval-augmented multi-hop que
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use | arXiv: 2508.04482
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | arXiv: 2412.19723
- os-kairos adaptive interaction for mllm-powered gui agents | arXiv: 2503.16465
- outlier-safe pre-training for robust 4-bit quantization of large language models | arXiv: 2506.19697
- ozspeech one-step zero-shot speech synthesis with learned-prior-conditioned flow | arXiv: 2505.12800
- p2 law scaling law for post-training after model pruning
- p3 prompts promote prompting | arXiv: 2507.15675
- palm a culturally inclusive and linguistically diverse dataset for arabic llms | arXiv: 2503.00151
- Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models | arXiv: 2408.13533
- pap2pat benchmarking outline-guided long-text patent generation with patent-pape | arXiv: 2410.07009
- papersplease a benchmark for evaluating motivational values of large language mo | arXiv: 2506.21961
- parameter-aware contrastive knowledge editing tracing and rectifying based on cr
- parameter-efficient fine-tuning via circular convolution | arXiv: 2407.19342
- Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning | arXiv: 2410.10360
- parme parallel corpora for low-resourced middle eastern languages
- partial colexifications improve concept embeddings | arXiv: 2502.09743
- pasa an llm agent for comprehensive academic paper search | arXiv: 2501.10120
- past meets present creating historical analogy with large language models | arXiv: 2409.14820
- patch psychometrics-assisted benchmarking of large language models against human | arXiv: 2404.01799
- pattern recognition or medical knowledge the problem with multiple-choice questi | arXiv: 2406.02394
- PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation | arXiv: 2506.06842
- People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text | arXiv: 2501.15654
- performance gap in entity knowledge extraction across modalities in vision langu | arXiv: 2412.14133
- persistent homology of topic networks for the prediction of reader curiosity | arXiv: 2506.11095
- persona dynamics unveiling the impact of persona traits on agents in text-based | arXiv: 2504.06868
- personabench evaluating ai models on understanding personal information through | arXiv: 2502.20616
- personal travel solver a preference-driven llm-solver system for travel planning
- personalens a benchmark for personalization evaluation in conversational ai assi | arXiv: 2506.09902
- Personality-Guided Code Generation Using Large Language Models | arXiv: 2411.00006
- personalized generation in large model era a survey | arXiv: 2503.02614
- personalized text generation with contrastive activation steering | arXiv: 2503.05213
- perspective transition of large language models for solving subjective tasks | arXiv: 2501.09265
- persphere a comprehensive framework for multi-faceted perspective retrieval and | arXiv: 2412.12588
- phi-decoding adaptive foresight sampling for balanced inference-time exploration
- phonotomizer a compact unsupervised online training approach to real-time multil
- physreason a comprehensive benchmark towards physics-based reasoning | arXiv: 2502.12054
- pic unlocking long-form text generation capabilities of large language models vi
- PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative Prompts | arXiv: 2505.09921
- piguard prompt injection guardrail via mitigating overdefense for free
- piper benchmarking and prompting event reasoning boundary of llms via debiasing-
- pitfalls of scale investigating the inverse task of redefinition in large langua | arXiv: 2502.12821
- pixel-level reasoning segmentation via multi-turn conversations | arXiv: 2502.09447
- pkag-ddi pairwise knowledge-augmented language model for drug-drug interaction e
- pku-saferlhf towards multi-level safety alignment for llms with human preference | arXiv: 2406.15513
- PlanGenLLMs: A Modern Survey of LLM Planning Capabilities | arXiv: 2502.11221
- Planning with Diffusion Models for Target-Oriented Dialogue Systems | arXiv: 2504.16858
- planning-driven programming a large language model programming workflow | arXiv: 2411.14503
- planningarena a modular benchmark for multidimensional evaluation of planning an
- play2prompt zero-shot tool instruction optimization for llm agents via tool play | arXiv: 2503.14432
- Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models | arXiv: 2506.07424
- polynarrative a multilingual multilabel multi-domain dataset for narrative extra
- popalign diversifying contrasting patterns for a more comprehensive alignment | arXiv: 2410.13785
- position-aware automatic circuit discovery | arXiv: 2502.04577
- positional overload positional debiasing and context window extension for large
- powerformer efficient and high-accuracy privacy-preserving language model with h
- ppt a minor language news recommendation model via cross-lingual preference patt
- pqr improving dense retrieval via potential query modeling
- praetor a fine-grained generative llm evaluator with instance-level customizable
- Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges | arXiv: 2502.12378
- praise enhancing product descriptions with llm-driven structured insights | arXiv: 2506.17314
- pre-training curriculum for multi-token prediction in language models | arXiv: 2505.22757
- pre-training distillation for large language models a design space exploration | arXiv: 2410.16215
- predicate-conditional conformalized answer sets for knowledge graph embeddings | arXiv: 2505.16877
- Predicting Implicit Arguments in Procedural Video Instructions | arXiv: 2505.21068
- predicting through generation why generation is better for prediction | arXiv: 2502.17817
- predicting turn-taking and backchannel in human-machine conversations using ling | arXiv: 2505.12654
- prediction hubs are context-informed frequent tokens in llms | arXiv: 2502.10201
- prep-ocr a complete pipeline for document image restoration and enhanced ocr acc | arXiv: 2505.20429
- pretraining context compressor for large language models with embedding-based me
- preventing rogue agents improves multi-agent collaboration | arXiv: 2502.05986
- Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | arXiv: 2506.03887
- Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries | arXiv: 2505.21859
- Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks | arXiv: 2407.17963
- PRISM: A Framework for Producing Interpretable Political Bias Embeddings | arXiv: 2505.24646
- PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance | arXiv: 2502.17041
- privacyrestore privacy-preserving inference in large language models via privacy
- private memorization editing turning memorization into a defense to strengthen d | arXiv: 2506.10024
- prmbench a fine-grained and challenging benchmark for process-level reward model | arXiv: 2501.03124
- probabilistic aggregation and targeted embedding optimization for collective mor | arXiv: 2506.14625
- probability-consistent preference optimization for enhanced llm reasoning | arXiv: 2505.23540
- probing llms for multilingual discourse generalization through a unified label s | arXiv: 2503.10515
- probing relative interaction and dynamic calibration in multi-modal entity align
- probing subphonemes in morphology models | arXiv: 2505.11297
- probing the geometry of truth consistency and generalization of truth directions | arXiv: 2506.00823
- problem-solving logic guided curriculum in-context learning for llms complex rea | arXiv: 2502.15401
- proceedings of the 63rd annual meeting of the association for computational ling
- processbench identifying process errors in mathematical reasoning | arXiv: 2412.06559
- progco program helps self-correction of large language models | arXiv: 2501.01264
- program synthesis benchmark for visual programming in xlogoonline environment | arXiv: 2406.11334
- programming by example meets historical linguistics a large language model based
- progressive multimodal reasoning via active retrieval | arXiv: 2412.14835
- promalex progressive modular adapters for multi-jurisdictional legal language mo
- Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation | arXiv: 2506.03857
- prompt-based personality profiling reinforcement learning for relevance filterin | arXiv: 2409.04122
- prompt-guided internal states for hallucination detection of large language mode
- proper a progressive learning framework for personalized large language models w
- protolens advancing prototype learning for fine-grained interpretability in text
- provbench a benchmark of legal provision recommendation for contract auto-review
- ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering | arXiv: 2507.00828
- proxy-driven robust multimodal sentiment analysis with incomplete data
- psyadvisor a plug-and-play strategy advice planner with proactive questioning in
- psycholinguistic word features a new approach for the evaluation of llms alignme | arXiv: 2506.22439
- psydial a large-scale long-term conversational dataset for mental health support
- psydt using llms to construct the digital twin of psychological counselor with p
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models | arXiv: 2502.13179
- PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | arXiv: 2412.11906
- PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings | arXiv: 2506.00481
- pwngpt automatic exploit generation based on large language models
- q2e query-to-event decomposition for zero-shot multilingual text-to-video retrie | arXiv: 2506.10202
- QAEncoder: Towards Aligned Representation Learning in Question Answering Systems | arXiv: 2409.20434
- qaeval mixture of evaluators for question-answering task evaluation
- qdtsynth quality-driven formal theorem synthesis for enhancing proving performan
- qg-sms enhancing test item analysis via student modeling and simulation | arXiv: 2503.05888
- qqsum a novel task and model of quantitative query-focused summarization for rev | arXiv: 2506.04020
- Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis | arXiv: 2505.14742
- qualispeech a speech quality assessment dataset with natural language reasoning | arXiv: 2503.20290
- quantification of large language model distillation | arXiv: 2501.12619
- quantifying lexical semantic shift via unbalanced optimal transport | arXiv: 2412.12569
- quantifying misattribution unfairness in authorship attribution | arXiv: 2506.02321
- quantifying semantic emergence in language models | arXiv: 2405.12617
- quantized can still be calibrated a unified framework to calibration in quantize
- quasar a question-driven structure-aware approach for table-to-text generation
- Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies | arXiv: 2505.06186
- queryattack jailbreaking aligned large language models using structured non-natu | arXiv: 2502.09723
- qwen25-xcoder multi-agent collaboration for multilingual code instruction tuning
- r-fairness assessing fairness of ranking in subjective data
- R-VC: Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching | arXiv: 2506.01014
- r-vlm region-aware vision language model for precise gui grounding | arXiv: 2507.05673
- r2-multiomnia leading multilingual multimodal reasoning via self-training
- R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory | arXiv: 2501.12485
- RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection | arXiv: 2505.14318
- raemollm retrieval augmented llms for cross-domain misinformation detection usin | arXiv: 2406.11093
- rag-critic leveraging automated critic-guided agentic workflow for retrieval aug
- rageval scenario specific rag evaluation dataset generation framework | arXiv: 2408.01262
- rank chunk and expand lineage-oriented reasoning for taxonomy expansion | arXiv: 2505.13282
- rankcot refining knowledge for retrieval-augmented generation through ranking ch
- ranked voting based self-consistency of large language models | arXiv: 2505.10772
- ranking unraveled recipes for llm rankings in head-to-head ai combat | arXiv: 2411.14483
- RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models | arXiv: 2412.02830
- rate-nav region-aware termination enhancement for zero-shot object navigation wi | arXiv: 2506.02354
- rationales are not silver bullets measuring the impact of rationales on model pe | arXiv: 2505.24147
- rationalyst pre-training process-supervision for improving reasoning
- raven robust advertisement video violation temporal grounding via reinforcement | arXiv: 2510.16455
- Re-identification of De-identified Documents with Autoregressive Infilling | arXiv: 2505.12859
- Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms | arXiv: 2501.13977
- re-task revisiting llm tasks from capability skill and knowledge perspectives | arXiv: 2408.06904
- re3syn a dependency-based data synthesis framework for long-context post-trainin
- Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books | arXiv: 2506.01796
- readoc a unified benchmark for realistic document structured extraction | arXiv: 2409.05137
- real-mm-rag a real-world multi-modal retrieval benchmark | arXiv: 2502.12342
- real-time factuality assessment from adversarial feedback | arXiv: 2410.14651
- realhitbench a comprehensive realistic hierarchical table benchmark for evaluati | arXiv: 2506.13405
- reason from future reverse thought chain enhances llm reasoning | arXiv: 2506.03673
- reasoning circuits in language models a mechanistic interpretation of syllogisti | arXiv: 2408.08590
- reasoning is all you need for video generalization a counterfactual benchmark wi | arXiv: 2503.10691
- recent advances in speech language models a survey | arXiv: 2410.03751
- reclm recommendation instruction tuning | arXiv: 2412.19302
- reconsidering llm uncertainty estimation methods in the wild | arXiv: 2506.01114
- Recurrent Knowledge Identification and Fusion for Language Model Continual Learning | arXiv: 2502.17510
- recursive question understanding for complex question answering over heterogeneo | arXiv: 2505.11900
- red queen safeguarding large language models against concealed multi-turn jailbr | arXiv: 2409.17458
- red-teaming llm multi-agent systems via communication attacks | arXiv: 2502.14847
- redactor an llm-powered framework for automatic clinical data de-identification | arXiv: 2505.18380
- redundancy isotropy and intrinsic dimensionality of prompt-based text embeddings | arXiv: 2506.01435
- redundancy principles for mllms benchmarks | arXiv: 2501.13953
- redundancylens revealing and exploiting visual token processing redundancy for e | arXiv: 2501.19036
- reefknot a comprehensive benchmark for relation hallucination evaluation analysi | arXiv: 2408.09429
- ref-long benchmarking the long-context referencing capability of long-context la | arXiv: 2507.09506
- refind at semeval-2025 task 3 retrieval-augmented factuality hallucination detec | arXiv: 2502.13622
- refining salience-aware sparse fine-tuning strategies for language models | arXiv: 2412.13488
- ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | arXiv: 2409.10289
- reflectioncoder learning from reflection sequence for enhanced one-off code gene | arXiv: 2405.17057
- ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | arXiv: 2410.17657
- refreshkv updating small kv cache during long-form generation | arXiv: 2411.05787
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | arXiv: 2407.09121
- registering source tokens to target language spaces in multilingual neural machi | arXiv: 2501.02979
- reinforced ir a self-boosting framework for domain-adapted information retrieval
- relationalcoder rethinking complex tables via programmatic relational transforma
- relearn unlearning via learning for large language models | arXiv: 2502.11190
- reliably bounding false positives a zero-shot machine-generated text detection f
- removal of hallucination on hallucination debate-augmented rag | arXiv: 2505.18581
- REP: Keys to Robust Edits — From Theoretical Insights to Practical Advances | arXiv: 2410.09338
- repanda pandas-powered tabular verification and reasoning | arXiv: 2503.11921
- Representation Bending for Large Language Model Safety | arXiv: 2504.01550
- representations of fact fiction and forecast in large language models epistemics | arXiv: 2506.01512
- repro-bench can agentic ai systems assess the reproducibility of social science | arXiv: 2507.18901
- reranking-based generation for unbiased perspective summarization | arXiv: 2506.15925
- ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision | arXiv: 2505.21250
- research borderlands analysing writing across research cultures | arXiv: 2506.00784
- response wide shut surprising observations in basic vision language model capabi | arXiv: 2507.10442
- rethinking evaluation metrics for grammatical error correction why use a differe | arXiv: 2502.09416
- rethinking kenlm good and bad model ensembles for efficient text quality filteri
- rethinking repetition problems of llms in code generation | arXiv: 2505.10402
- rethinking reward model evaluation through the lens of reward overoptimization | arXiv: 2505.12763
- rethinking semantic parsing for large language models enhancing llm performance | arXiv: 2409.14469
- rethinking table instruction tuning | arXiv: 2501.14693
- rethinking the role of prompting strategies in llm test-time scaling a perspecti | arXiv: 2505.10981
- retrieval models arent tool-savvy benchmarking tool retrieval for large language | arXiv: 2503.01763
- retrieval visual contrastive decoding to mitigate object hallucinations in large | arXiv: 2505.20569
- retrieval-augmented fine-tuning with preference optimization for visual program | arXiv: 2502.16529
- Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification | arXiv: 2402.04068
- retrofitting large language models with dynamic tokenization | arXiv: 2411.18553
- retrollm empowering large language models to retrieve fine-grained evidence with | arXiv: 2412.11919
- retrospective learning from interactions | arXiv: 2410.13852
- revealing the deceptiveness of knowledge editing a mechanistic analysis of super | arXiv: 2505.12636
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up | arXiv: 2410.12323
- reverse preference optimization for complex instruction following | arXiv: 2505.22172
- revisit self-debugging with self-generated tests for code generation | arXiv: 2501.12793
- revisiting 3d llm benchmarks are we really testing 3d capabilities | arXiv: 2502.08503
- revisiting classical chinese event extraction with ancient literature informatio
- Revisiting Common Assumptions about Arabic Dialects in NLP | arXiv: 2505.21816
- Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability | arXiv: 2506.15629
- revisiting epistemic markers in confidence estimation can markers accurately ref
- revisiting llms as zero-shot time series forecasters small noise can break large | arXiv: 2506.00457
- revisiting lora through the lens of parameter redundancy spectral encoding helps | arXiv: 2506.16787
- revisiting scaling laws for language models the role of data quality and trainin
- revisiting self-consistency from dynamic distributional alignment perspective on | arXiv: 2502.19830
- revisiting the test-time scaling of o1-like models do they truly possess test-ti | arXiv: 2502.12215
- revisiting uncertainty quantification evaluation in language models spurious int | arXiv: 2504.13677
- Revisiting Weak-to-Strong Generalization: Reverse KL vs. Forward KL | arXiv: 2502.11107
- reviving cultural heritage a novel approach for comprehensive historical documen
- revs unlearning sensitive information in language models via rank editing in the | arXiv: 2406.09325
- Reward Generalization in RLHF: A Topological Perspective | arXiv: 2402.10184
- rewrite to jailbreak discover learnable and transferable implicit harmfulness in | arXiv: 2502.11084
- right answer wrong score uncovering the inconsistencies of llm evaluation in mul | arXiv: 2503.14996
- riot efficient prompt refinement with residual optimization tree | arXiv: 2506.16389
- rise reasoning enhancement via iterative self-exploration in multi-hop question | arXiv: 2505.21940
- RISE: Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing | arXiv: 2410.06638
- rmoa optimizing mixture-of-agents through diversity maximization and residual co | arXiv: 2505.24442
- robust and minimally invasive watermarking for eaas | arXiv: 2410.17552
- robust data watermarking in language models by injecting fictitious knowledge | arXiv: 2503.04036
- robust estimation of population-level effects in repeated-measures nlp experimen
- robust preference optimization via dynamic target margins | arXiv: 2506.03690
- robust utility-preserving text anonymization based on large language models | arXiv: 2407.11770
- rocoft efficient finetuning of large language models with row-column updates | arXiv: 2410.10075
- roleplot a systematic framework for evaluating and enhancing the plot-progressio
- Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context | arXiv: 2410.16069
- root defense strategies ensuring safety of llm at the decoding level
- rotor towards more reliable responses for order-invariant inputs | arXiv: 2502.08662
- rpo retrieval preference optimization for robust retrieval-augmented generation | arXiv: 2501.13726
- rsa² a rhetorical-strategy-aware rational speech act framework for | arXiv: 2506.09301
- RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph | arXiv: 2505.20813
- rsvp reasoning segmentation via visual prompting and multi-modal chain-of-though | arXiv: 2506.04277
- rubriks cube testing a new rubric for evaluating explanations on the cube datase | arXiv: 2503.23899
- ruby an effective framework for multi-constraint multi-hop question generation
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios | arXiv: 2412.08972
- s-rag a novel audit framework for detecting unauthorized use of personal data in
- s2r teaching llms to self-verify and self-correct via reinforcement learning
- s2wtm spherical sliced-wasserstein autoencoder for topic modeling | arXiv: 2507.12451
- s3 - semantic signal separation
- Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | arXiv: 2506.04592
- safer or luckier llms as safety evaluators are not robust to artifacts | arXiv: 2503.09347
- saferag benchmarking security in retrieval-augmented generation of large languag | arXiv: 2501.18636
- saferoute adaptive model selection for efficient and accurate safety guardrails | arXiv: 2502.12464
- safety alignment via constrained knowledge unlearning | arXiv: 2505.18588
- safety is not only about refusal reasoning-enhanced fine-tuning for interpretabl | arXiv: 2503.05021
- sake steering activations for knowledge editing | arXiv: 2503.01751
- salience sparse fine tuning
- sam decoding speculative decoding via suffix automaton | arXiv: 2411.10666
- sample-efficient human evaluation of large language models via maximum discrepan | arXiv: 2404.08008
- Sandcastles in the Storm: Revisiting Watermarking Impossibility | arXiv: 2505.06827
- sanskriti a comprehensive benchmark for evaluating language models knowledge of | arXiv: 2506.15355
- sara salience-aware reinforced adaptive decoding for large language models in ab
- scalable vision language model training via high quality data curation | arXiv: 2501.05952
- scale towards collaborative content analysis in social science with large langua
- ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | arXiv: 2406.19976
- ScaleQuest: Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch | arXiv: 2410.18693
- scaling context not parameters training a compact 7b language model for efficien | arXiv: 2505.08651
- scaling laws and efficient inference for ternary language models | arXiv: 2506.23025
- Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | arXiv: 2502.14846
- scaling up the state size of rnn llms for long-context scenarios
- scanez integrating cognitive models with self-supervised learning for spatiotemp
- SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning | arXiv: 2406.10882
- scedit script-based assessment of knowledge editing | arXiv: 2505.23291
- scenegenagent precise industrial scene generation with coding agent | arXiv: 2410.21909
- sci-lora mixture of scientific loras for cross-domain lay paraphrasing | arXiv: 2505.18867
- sciver evaluating foundation models for multimodal scientific claim verification | arXiv: 2506.15569
- sconu selective conformal uncertainty in large language models | arXiv: 2504.14154
- scop evaluating the comprehension process of large language models from a cognit | arXiv: 2506.05000
- scope optimizing key-value cache compression in long-context generation | arXiv: 2412.13649
- sculpt systematic tuning of long prompts | arXiv: 2410.20788
- sdbench a survey-based domain-specific llm benchmarking and optimization framewo
- sdd self-degraded defense against malicious fine-tuning | arXiv: 2507.21182
- sdpo segment-level direct preference optimization for social agents | arXiv: 2501.01821
- SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | arXiv: 2502.12562
- seakr self-aware knowledge retrieval for adaptive retrieval augmented generation | arXiv: 2406.19215
- seal scaling to emphasize attention for long-context retrieval | arXiv: 2501.15225
- second language arabic acquisition of llms via progressive vocabulary expansion | arXiv: 2412.12310
- secret semi-supervised clinical trial document similarity search | arXiv: 2505.10780
- SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization | arXiv: 2402.11347
- seedbench a multi-task benchmark for evaluating large language models in seed sc | arXiv: 2505.13220
- seeking rational demonstrations for large language models a domain generalizatio
- segment first or comprehend first explore the limit of unsupervised word segment
- segment-based attention masking for gpts | arXiv: 2412.18487
- Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models | arXiv: 2412.11333
- select read and write a multi-agent framework of full-text-based related work ge | arXiv: 2505.19647
- selecting and merging towards adaptable and scalable named entity recognition wi
- self-correction is more than refinement a learning framework for visual and lang | arXiv: 2410.04055
- self-critique guided iterative reasoning for multi-hop question answering | arXiv: 2505.19112
- self-error-instruct generalizing from errors for llms mathematical reasoning | arXiv: 2505.22591
- self-foveate enhancing diversity and difficulty of synthesized instructions from | arXiv: 2507.23440
- self-instructed derived prompt generation meets in-context learning unlocking ne | arXiv: 2409.01552
- SELF-PERCEPT: Introspection Improves LLMs' Detection of Multi-Person Mental Manipulation in Conversations | arXiv: 2505.20679
- self-supervised quantized representation for seamlessly integrating knowledge gr
- Self-Taught Agentic Long-Context Understanding | arXiv: 2502.15920
- self-training elicits concise reasoning in large language models | arXiv: 2502.20122
- self-tuning instructing llms to effectively acquire new knowledge through self-t | arXiv: 2406.06326
- SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence | arXiv: 2502.08767
- semantic aware linear transfer by recycling pre-trained language models for cros | arXiv: 2505.10945
- Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | arXiv: 2501.05752
- semantic outlier removal with embedding models and llms | arXiv: 2506.16644
- semantic-eval a semantic comprehension evaluation framework for large language m
- semeval-2025 task 1 admire -- advancing multimodal idiomaticity representation | arXiv: 2503.15358
- sentiment reasoning for healthcare | arXiv: 2407.21054
- SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection | arXiv: 2503.03303
- separating tongue from thought activation patching reveals language-agnostic con
- seqpo-simt sequential policy optimization for simultaneous machine translation | arXiv: 2505.20622
- serial lifelong editing via mixture of knowledge experts
- seuf is unlearning one expert enough for mixture-of-experts llms | arXiv: 2411.18797
- sgic a self-guided iterative calibration framework for rag | arXiv: 2506.16172
- shaping the safety boundaries understanding and defending against jailbreaks in
- share an slm-based hierarchical action correction assistant for text-to-sql | arXiv: 2506.00391
- share shared memory-aware open-domain long-term dialogue dataset constructed fro
- Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
- sheeps skin wolfs deeds are llms ready for metaphorical implicit hate speech
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework | arXiv: 2410.19453
- Shifting from Ranking to Set Selection for Retrieval Augmented Generation | arXiv: 2507.06838
- should i believe in what medical ai says a chinese benchmark for medication base
- shubert self-supervised sign language representation learning via multi-stream c | arXiv: 2411.16765
- sift-50m a large-scale multilingual dataset for speech instruction fine-tuning | arXiv: 2504.09081
- sightation counts leveraging sighted user feedback in building a blv-aligned dat
- silencing empowerment allowing bigotry auditing the moderation of hate speech on | arXiv: 2506.07667
- simgrag leveraging similar subgraphs for knowledge graphs driven retrieval-augme | arXiv: 2412.15272
- simuls2s-llm unlocking simultaneous inference of speech llms for speech-to-speec
- sincon mitigate llm-generated malicious message injection attack for rumor detec
- singakids a multilingual multimodal dialogic tutor for language learning | arXiv: 2506.02412
- single- vs dual-prompt dialogue generation with llms for job interviews in human | arXiv: 2502.18650
- single-to-mix modality alignment with multimodal large language model for docume | arXiv: 2507.07572
- sinhala encoder-only language models and evaluation
- skillaggregation reference-free llm-dependent aggregation | arXiv: 2410.10215
- SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation | arXiv: 2506.00319
- sklep a slovak general language understanding benchmark | arXiv: 2506.21508
- slamming training a speech language model on one gpu in a day | arXiv: 2502.15814
- sleepless nights sugary days creating synthetic users with health conditions for | arXiv: 2502.13135
- Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models | arXiv: 2412.14574
- small changes big impact how manipulating a few neurons can drastically alter ll
- smart self-aware agent for tool overuse mitigation | arXiv: 2502.11435
- smarter better faster longer a modern bidirectional encoder for fast memory effi | arXiv: 2412.13663
- socialcc interactive evaluation for cultural competence in language agents
- socialeval evaluating social intelligence of large language models | arXiv: 2506.00900
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | arXiv: 2502.12134
- somethings fishy in the data lake a critical re-evaluation of table union search | arXiv: 2505.21329
- SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | arXiv: 2402.17645
- sorft issue resolving with subtask-oriented reinforced fine-tuning | arXiv: 2502.20127
- sotopia-Ω dynamic strategy injection learning and social instructi | arXiv: 2502.15538
- soundwave less is more for speech-text alignment in llms | arXiv: 2502.12900
- spare enhancing spatial reasoning in vision-language models with synthetic data | arXiv: 2504.20648
- spark-tts an efficient llm-based text-to-speech model with single-stream decoupl | arXiv: 2503.01710
- sparse latents steer retrieval-augmented generation
- sparse logit sampling accelerating knowledge distillation in llms | arXiv: 2503.16870
- sparse rewards can self-train dialogue agents | arXiv: 2409.04617
- sparse-to-dense a free lunch for lossless acceleration of video understanding in | arXiv: 2505.19155
- Sparsify: Learning Sparsity for Effective and Efficient Music Performance Question Answering | arXiv: 2506.01319
- Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues | arXiv: 2506.00958
- spectra faster large language model inference with optimized internal and extern
- SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods | arXiv: 2507.21463
- SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models | arXiv: 2507.19361
- speechweave diverse multilingual synthetic text audio data generation pipeline f | arXiv: 2509.14270
- speed up your code progressive code acceleration through bidirectional tree edit
- SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation | arXiv: 2412.12693
- SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers | arXiv: 2507.06517
- splintering nonconcatenative languages for better tokenization | arXiv: 2503.14433
- spot bridging natural language and geospatial search for investigative journalis | arXiv: 2506.13188
- spotting out-of-character behavior atomic-level evaluation of persona fidelity i | arXiv: 2506.19352
- spurious correlations and beyond understanding and mitigating shortcut learning
- sql injection jailbreak a structural disaster of large language models | arXiv: 2411.01565
- sqlong enhanced nl2sql for longer contexts with llms | arXiv: 2502.16747
- squeezed attention accelerating long context length llm inference | arXiv: 2411.09688
- sr-llm rethinking the structured representation in large language model | arXiv: 2502.14352
- star-sql self-taught reasoner for text-to-sql | arXiv: 2502.13550
- state toxicn a benchmark for span-level target-aware toxicity extraction in chin | arXiv: 2501.15451
- State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models | arXiv: 2503.03499
- statement-tuning enables efficient cross-lingual generalization in encoder-only | arXiv: 2506.01592
- Statistical Deficiency for Task Inclusion Estimation | arXiv: 2503.05491
- stealing training data from large language models in decentralized training thro
- steering into new embedding spaces analyzing cross-lingual alignment induced by
- steering off course reliability challenges in steering language models | arXiv: 2504.04635
- stem-pom evaluating language models math-symbol reasoning in document parsing | arXiv: 2411.00387
- Stepwise Reasoning Disruption Attack of LLMs | arXiv: 2412.11934
- sticking to the mean detecting sticky tokens in text embedding models | arXiv: 2507.18171
- stitchllm serving llms one block at a time
- stochastic chameleons irrelevant context hallucinations reveal class-based misge | arXiv: 2505.22630
- stress-testing machine generated text detection shifting language models writing | arXiv: 2505.24523
- STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond | arXiv: 2409.05367
- StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text | arXiv: 2406.10621
- structflowbench a structured flow benchmark for multi-turn instruction following | arXiv: 2502.14494
- structural reasoning improves molecular understanding of llm | arXiv: 2410.05610
- structure-aware domain knowledge injection for large language models | arXiv: 2407.16724
- STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | arXiv: 2409.06211
- sublime subset selection via rank correlation prediction for data-efficient llm
- Substance over Style: Evaluating Proactive Conversational Coaching Agents | arXiv: 2503.19328
- subword models struggle with word learning but surprisal hides it | arXiv: 2502.12835
- sudo rm -rf agentic security | arXiv: 2503.20279
- SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment | arXiv: 2410.14676
- surveyforge on the outline heuristics memory-driven generation and multi-dimensi
- surveypilot an agentic framework for automated human opinion collection from soc
- swiltra-bench the swiss legal translation benchmark | arXiv: 2503.01372
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images | arXiv: 2502.13928
- synapticrag enhancing temporal memory retrieval in large language models through | arXiv: 2410.13553
- synergistic weak-strong collaboration by aligning preferences | arXiv: 2504.15188
- Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection | arXiv: 2506.00488
- synergizing unsupervised episode detection with llms for large-scale news events | arXiv: 2408.04873
- syngraph a dynamic graph-llm synthesis framework for sparse streaming user senti | arXiv: 2503.04619
- SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs | arXiv: 2506.05598
- synthesizing post-training data for llms through multi-agent simulation | arXiv: 2410.14251
- synthia novel concept design with affordance composition | arXiv: 2502.17793
- SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | arXiv: 2504.03561
- systematic generalization in language models scales with information entropy | arXiv: 2505.13089
- t-reg preference optimization with token-level reward regularization | arXiv: 2412.02685
- T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback | arXiv: 2505.10561
- t2i-factualbench benchmarking the factuality of text-to-image models with knowle
- t5score a methodology for automatically assessing the quality of llm generated m | arXiv: 2407.17390
- table understanding and multimodal llms a cross-domain case study on scientific | arXiv: 2507.00152
- Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | arXiv: 2502.11799
- tabledreamer progressive and weakness-guided data synthesis from scratch for tab | arXiv: 2506.08646
- TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models | arXiv: 2503.04396
- tabxeval why this is a bad table an exhaustive rubric for table evaluation | arXiv: 2505.22176
- taclr a scalable and efficient retrieval-based method for industrial product att | arXiv: 2501.03835
- tada training-free recipe for decoding with adaptive kv cache compression and me | arXiv: 2506.04642
- tag-evol achieving efficient instruction evolving via tag injection | arXiv: 2505.24165
- tagrouter learning route to llms through tags for open-domain text generation ta | arXiv: 2506.12473
- takin-vc expressive zero-shot voice conversion via adaptive hybrid content encod
- taming language models for text-attributed graph learning with decoupled aggrega
- taming llms with gradient grouping
- targa targeted synthetic data generation for practical reasoning over structured | arXiv: 2412.19544
- targeted syntactic evaluation for grammatical error correction
- task-informed anti-curriculum by masking improves downstream performance on text | arXiv: 2502.12953
- task-specific information decomposition for end-to-end dense video captioning
- taxoadapt aligning llm-based multidimensional taxonomy construction to evolving | arXiv: 2506.10737
- taz2024full analysing german newspapers for gender bias and discrimination acros | arXiv: 2506.05388
- tc-rag turing-complete rags case study on medical llm systems
- tcsinger 2 customizable multilingual zero-shot singing voice synthesis | arXiv: 2505.14910
- teach a contrastive knowledge adaptive distillation framework for classical chin
- Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences | arXiv: 2506.00419
- teaching text agents to learn sequential decision making from failure
- Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions | arXiv: 2507.13773
- team ack at semeval-2025 task 2 beyond word-for-word machine translation for eng | arXiv: 2504.20451
- team anotheroption at semeval-2025 task 8 bridging the gap between open-source a | arXiv: 2506.09657
- teamlora boosting low-rank adaptation with expert collaboration and competition | arXiv: 2408.09856
- tell dont show leveraging language models abstractive retellings to model litera | arXiv: 2505.23166
- tempest autonomous multi-turn jailbreaking of large language models with tree se | arXiv: 2503.10619
- temporal reasoning for timeline summarisation in social media | arXiv: 2501.00152
- temporal relation extraction in clinical texts a span-based graph transformer ap
- terdy temporal relation dynamics through frequency decomposition for temporal kn
- tess 2 a large-scale generalist diffusion language model | arXiv: 2502.13917
- testnuc enhancing test-time computing approaches and scaling through neighboring | arXiv: 2502.19163
- tetris optimal draft token selection for batch speculative decoding | arXiv: 2502.15197
- texpert a multi-level benchmark for evaluating latex code generation by llms | arXiv: 2506.16990
- text is all you need llm-enhanced incremental social event detection
- text-to-es bench a comprehensive benchmark for converting natural language to el
- l-citeeval a suite for evaluating fidelity of long-context models
- that doesnt sound right evaluating speech transcription quality in field linguis
- that is unacceptable the moral foundations of canceling
- the ai gap how socioeconomic status affects language technology interactions | arXiv: 2505.12158
- the alternative annotator test for llm-as-a-judge how to statistically justify r
- the anatomy of evidence an investigation into explainable icd coding | arXiv: 2507.01802
- the behavior gap evaluating zero-shot llm agents in complex task-oriented dialog | arXiv: 2506.12266
- the cross-linguistic role of animacy in grammar structures
- the distracting effect understanding irrelevant passages in rag | arXiv: 2505.06914
- the efficiency vs accuracy trade-off optimizing rag-enhanced llm recommender sys
- the esethu framework reimagining sustainable dataset governance and curation for | arXiv: 2502.15916
- the essence of contextual understanding in theory of mind a study on question an
- the harmonic structure of information contours | arXiv: 2506.03902
- the hidden attention of mamba models | arXiv: 2403.01590
- the hidden space of safety understanding preference-tuned llms in multilingual c | arXiv: 2504.02708
- The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It | arXiv: 2406.13181
- The Impact of Token Granularity on the Predictive Power of Language Model Surprisal | arXiv: 2412.11940
- The Impossibility of Fair LLMs | arXiv: 2406.03198
- the invisible hand unveiling provider bias in large language models for code gen
- the knowledge microscope features as better analytical lenses than neurons | arXiv: 2502.12483
- the lawyer that never thinks consistency and fairness as keys to reliable ai
- the male ceo and the female assistant evaluation and mitigation of gender biases
- the mirage of model editing revisiting evaluation in the wild | arXiv: 2502.11177
- the nature of nlp analyzing contributions in nlp papers | arXiv: 2409.19505
- the noisy path from source to citation measuring how scholars engage with past r | arXiv: 2502.20581
- the role of abstract representations and observed preferences in the ordering of
- the role of deductive and inductive reasoning in large language models | arXiv: 2410.02892
- the role of exploration modules in small language models for knowledge graph que | arXiv: 2509.07399
- the role of visual modality in multimodal mathematical reasoning challenges and | arXiv: 2503.04167
- the task shield enforcing task alignment to defend against indirect prompt injec
- the time scale of redundancy between prosody and linguistic context | arXiv: 2503.11630
- The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models | arXiv: 2410.16672
- the ud-newscrawl treebank reflections and challenges from a large-scale tagalog
- theme-explanation structure for table summarization using large language models | arXiv: 2501.10487
- theorem prover as a judge for synthetic data generation | arXiv: 2502.13137
- theorem-of-thought a multi-agent framework for abductive deductive and inductive | arXiv: 2506.07106
- TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding | arXiv: 2502.19400
- theoretical analysis of hierarchical language recognition and generation by tran
- theoretical guarantees for minimum bayes risk decoding | arXiv: 2502.12685
- Theory of Mind in Large Language Models: Assessment and Enhancement | arXiv: 2505.00026
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling | arXiv: 2412.14860
- thinkguard deliberative slow thinking leads to cautious guardrails | arXiv: 2502.13458
- thor-moe hierarchical task-guided and context-responsive routing for neural mach | arXiv: 2505.14173
- tic-lm a web-scale benchmark for time-continual llm pretraining | arXiv: 2504.02107
- tigerllm - a family of bangla large language models | arXiv: 2503.10995
- time-mqa time series multi-task question answering with context enhancement | arXiv: 2503.01875
- TIP of the Iceberg: Task-in-Prompt Adversarial Attacks on LLMs | arXiv: 2501.18626
- to code or not to code adaptive tool integration for math language models via ex | arXiv: 2502.00691
- TokAlign: Efficient Vocabulary Adaptation via Token Alignment | arXiv: 2506.03523
- Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs | arXiv: 2412.11556
- token pruning in multimodal large language models are we solving the right probl | arXiv: 2502.11501
- tokenisation is np-complete | arXiv: 2412.15210
- tokenization is sensitive to language variation | arXiv: 2502.15343
- ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models | arXiv: 2502.11404
- ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use | arXiv: 2501.02506
- toolspectrum towards personalized tool utilization for large language models | arXiv: 2505.13176
- top-nσ eliminating noise in logit space for robust token sampling of llm
- toward automatic discovery of a canine phonetic alphabet
- toward structured knowledge reasoning contrastive retrieval-augmented generation | arXiv: 2506.00842
- Towards a More Generalized Approach in Open Relation Extraction | arXiv: 2505.22801
- towards a principled evaluation of knowledge editors | arXiv: 2507.05937
- towards adaptive memory-based optimization for enhanced retrieval-augmented gene | arXiv: 2504.05312
- towards better chain-of-thought a reflection on effectiveness and faithfulness | arXiv: 2405.18915
- Towards Better Evaluation for Generated Patent Claims | arXiv: 2505.11095
- towards better open-ended text generation a multicriteria evaluation framework | arXiv: 2410.18653
- towards better value principles for large language model alignment a systematic
- towards building large scale datasets and state-of-the-art automatic speech tran
- towards comprehensive argument analysis in education dataset tasks and method | arXiv: 2505.12028
- towards context-robust llms a gated representation fine-tuning approach | arXiv: 2502.14100
- towards dynamic theory of mind evaluating llm adaptation to temporal evolution o | arXiv: 2505.17663
- towards effective and efficient continual pre-training of large language models | arXiv: 2407.18743
- towards effective extraction and evaluation of factual claims | arXiv: 2502.10855
- towards enhanced immersion and agency for llm-based interactive drama | arXiv: 2502.17878
- towards explainable temporal reasoning in large language models a structure-awar | arXiv: 2505.15245
- towards fairness assessment of dutch hate speech detection | arXiv: 2506.12502
- towards fully exploiting llm internal states to enhance knowledge boundary perce
- Towards Geo-Culturally Grounded LLM Generations | arXiv: 2502.13497
- towards global ai inclusivity a large-scale multilingual terminology dataset gis | arXiv: 2412.18367
- towards harmonized uncertainty estimation for large language models | arXiv: 2505.19073
- towards llm-powered attentive listener a pragmatic approach through quantity sel
- towards multi-dimensional evaluation of llm summarization across domains and lan
- towards objective fine-tuning how llms prior knowledge causes potential poor cal
- Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications | arXiv: 2501.02460
- towards reliable large audio language model | arXiv: 2505.19294
- Towards Reward Fairness in RLHF: From a Resource Allocation Perspective | arXiv: 2505.23349
- Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients | arXiv: 2410.22815
- Towards Robust ESG Analysis Against Greenwashing Risks: A3CG | arXiv: 2502.15821
- towards robust universal information extraction dataset evaluation and solution
- towards safety reasoning in llms ai-agentic deliberation for policy-embedded cot | arXiv: 2505.21784
- towards storage-efficient visual document retrieval an empirical study on reduci | arXiv: 2506.04997
- towards style alignment in cross-cultural translation | arXiv: 2507.00216
- towards text-image interleaved retrieval | arXiv: 2502.12799
- Towards the Law of Capacity Gap in Distilling Language Models | arXiv: 2311.07052
- tracing and dissecting how llms recall factual knowledge for real world question
- tracking lifes ups and downs mining life events from social media posts for ment
- TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning | arXiv: 2503.04381
- training dynamics underlying language model scaling laws loss deceleration and z | arXiv: 2506.05447
- training language model to critique for better refinement | arXiv: 2506.22157
- training turn-by-turn verifiers for dialogue tutoring agents the curious case of | arXiv: 2502.13311
- training-free llm merging for multi-task learning | arXiv: 2506.12379
- Trans-PEFT: Transferable Parameter-Efficient Fine-Tuning on Evolving Base Models | arXiv: 2506.06844
- trans-zero self-play incentivizes large language models for multilingual transla | arXiv: 2504.14669
- transbench breaking barriers for transferable graphical user interface agents in | arXiv: 2505.17629
- transferring textual preferences to vision-language understanding through model | arXiv: 2502.13487
- transforming podcast preview generation from expert models to llm-based systems | arXiv: 2505.23908
- translate with care addressing gender bias neutrality and reasoning in large lan | arXiv: 2506.00748
- translation and fusion improves cross-lingual information extraction | arXiv: 2305.13582
- trates trait-specific rubric-assisted cross-prompt essay scoring | arXiv: 2505.14577
- tree-kg an expandable knowledge graph construction framework for knowledge-inten
- tree-of-code a tree-structured exploring framework for end-to-end code generatio | arXiv: 2412.15305
- tree-of-debate multi-persona debate trees elicit critical thinking for scientifi | arXiv: 2502.14767
- Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models
- treecut a synthetic unanswerable math word problem dataset for llm hallucination | arXiv: 2502.13442
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | arXiv: 2506.11902
- tremu towards neuro-symbolic temporal reasoning for llm-agents with memory in mu | arXiv: 2502.01630
- trident enhancing large language model safety with tri-dimensional diversified r
- TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs | arXiv: 2412.11242
- tripcraft a benchmark for spatio-temporally fine grained travel planning | arXiv: 2502.20508
- triplefact defending data contamination in the evaluation of llm-driven fake new
- triptailor a real-world benchmark for personalized travel planning | arXiv: 2508.01432
- TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification | arXiv: 2503.15289
- truth knows no language evaluating truthfulness beyond english | arXiv: 2502.09387
- tst a schema-based top-down and dynamic-aware agent of text-to-table tasks
- tumlu a unified and native language understanding benchmark for turkic languages | arXiv: 2502.11020
- Tuna: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | arXiv: 2505.20124
- tunable llm-based proactive recommendation agent
- Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | arXiv: 2504.15754
- twist text-encoder weight-editing for inserting secret trojans in text-to-image
- two intermediate translations are better than one fine-tuning llms for document-
- typed-rag type-aware decomposition of non-factoid questions for retrieval-augmen | arXiv: 2503.15879
- typology-guided adaptation in multilingual models
- ualign leveraging uncertainty estimations for factuality alignment on large lang | arXiv: 2412.11803
- uaqfact evaluating factual knowledge utilization of llms on unanswerable questio | arXiv: 2505.23461
- umedsum a unified framework for clinical abstractive summarization
- un-considering contextual information assessing llms understanding of indexical | arXiv: 2506.01089
- unanswerability evaluation for retrieval augmented generation | arXiv: 2412.12300
- uncertainty in causality a new frontier
- uncertainty propagation on llm agent
- uncertainty unveiled can exposure to more in-context examples mitigate uncertain | arXiv: 2505.21003
- uncertainty-aware iterative preference optimization for enhanced llm reasoning
- uncovering the impact of chain-of-thought reasoning for direct preference optimi
- Uncovering Visual-Semantic Psycholinguistic Properties from the Distributional Structure of Text Embedding Space | arXiv: 2505.23029
- understanding and meeting practitioner needs when measuring representational har | arXiv: 2506.04482
- understanding common ground misalignment in goal-oriented dialog a case-study wi | arXiv: 2503.12370
- understanding cross-domain adaptation in low-resource topic modeling | arXiv: 2506.07453
- Understanding Impact of Human Feedback via Influence Functions | arXiv: 2501.05790
- understanding in-context machine translation for low-resource languages a case s | arXiv: 2502.11862
- understanding large language model vulnerabilities to social bias attacks
- understanding silent data corruption in llm training | arXiv: 2502.12340
- understanding the dark side of llms intrinsic self-correction | arXiv: 2412.14959
- understanding the repeat curse in large language models from a feature perspecti | arXiv: 2504.14218
- uni-retrieval a multi-style retrieval framework for stems education | arXiv: 2502.05863
- unicodec unified audio codec with single domain-adaptive codebook | arXiv: 2502.20067
- UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations | arXiv: 2507.07030
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes | arXiv: 2505.22165
- unifying language agent algorithms with graph-based orchestration engine for rep | arXiv: 2505.24354
- UniICL: An Efficient ICL Framework Unifying Compression, Selection, and Generation | arXiv: 2405.17062
- unilr unleashing the power of llms on multiple legal tasks with a unified legal
- unintended harms of value-aligned llms psychological and empirical insights | arXiv: 2506.06404
- UniQuanF: Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models | arXiv: 2506.03781
- unique hard attention a tale of two sides | arXiv: 2503.14615
- unirag unified query understanding method for retrieval augmented generation
- Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering | arXiv: 2503.11314
- unlocking recursive thinking of llms alignment via refinement | arXiv: 2506.06009
- unlocking speech instruction data potential with query rewriting | arXiv: 2507.08603
- unmasking style sensitivity a causal analysis of bias evaluation instability in
- Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging | arXiv: 2505.22934
- unraveling the mechanics of learning-based demonstration selection for in-contex
- unravelling the logic investigating the generalisation of transformers in numeri
- unseentimeqa time-sensitive question-answering beyond llms memorization | arXiv: 2407.03525
- Unsolvable Problem Detection: Evaluating Trustworthiness of Large Multimodal Models | arXiv: 2403.20331
- unsupervised morphological tree tokenizer | arXiv: 2406.15245
- untie the knots an efficient data augmentation strategy for long-context pre-tra
- unveil unified visual-textual integration and distillation for multi-modal docum
- unveiling and addressing pseudo forgetting in large language models | arXiv: 2411.11932
- unveiling attractor cycles in large language models a dynamical systems view of | arXiv: 2502.15208
- unveiling cultural blind spots analyzing the limitations of mllms in procedural | arXiv: 2502.14315
- unveiling dual quality in product reviews an nlp-based approach | arXiv: 2505.19254
- unveiling environmental impacts of large language model serving a functional uni
- Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders | arXiv: 2505.05111
- Unveiling Privacy Risks in LLM Agent Memory | arXiv: 2502.13172
- unveiling the key factors for distilling chain-of-thought reasoning | arXiv: 2502.18001
- unveiling the lack of lvlm robustness to fundamental visual variations why and p | arXiv: 2504.16727
- unveiling the potential of bert-family a new recipe for building scalable genera
- unveiling the power of source source-based minimum bayes risk decoding for neura | arXiv: 2406.11632
- uora uniform orthogonal reinitialization adaptation in parameter efficient fine-
- upcycling instruction tuning from dense to mixture-of-experts via parameter merg | arXiv: 2410.01610
- urbanvideo-bench benchmarking vision-language models on embodied intelligence wi
- usdc a dataset of user stance and dogmatism in long conversations | arXiv: 2406.16833
- user-side model consistency monitoring for open source large language models inf
- using information theory to characterize prosodic typology the case of tone pitc
- using shapley interactions to understand how models use structure | arXiv: 2403.13106
- using source-side confidence estimation for reliable translation into unfamiliar | arXiv: 2503.23305
- using subtext to enhance generative idrr
- utboost rigorous evaluation of coding agents on swe-bench | arXiv: 2506.09289
- v-oracle making progressive reasoning in deciphering oracle bones for you and me
- value portrait assessing language models values through psychometrically and eco | arXiv: 2505.01015
- value residual learning | arXiv: 2410.17897
- Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition | arXiv: 2411.11479
- vaquum are vague quantifiers grounded in visual data | arXiv: 2502.11874
- velocitune a velocity-based dynamic domain reweighting method for continual pre- | arXiv: 2411.14318
- Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | arXiv: 2505.16128
- verbosity-aware rationale reduction effective reduction of redundant rationale v | arXiv: 2412.21006
- VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | arXiv: 2505.23693
- vidcapbench a comprehensive benchmark of video captioning for controllable text- | arXiv: 2502.12782
- videovista-culturallingo 360° horizons-bridging cultures languages and
- vigil3d a linguistically diverse dataset for 3d visual grounding | arXiv: 2501.01366
- visa retrieval augmented generation with visual source attribution | arXiv: 2412.14457
- vision-language models struggle to align entities across modalities | arXiv: 2503.03854
- visual cues enhance predictive turn-taking for two-party human interaction | arXiv: 2505.21043
- Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
- visuothink empowering lvlm reasoning with multimodal tree search | arXiv: 2504.09130
- VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | arXiv: 2502.13775
- vlm2-bench a closer look at how well vlms implicitly link explicit matching visu
- VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service | arXiv: 2506.15755
- vlsbench unveiling visual leakage in multimodal safety | arXiv: 2411.19939
- vmlu benchmarks a comprehensive benchmark toolkit for vietnamese llms
- voting or consensus decision-making in multi-agent debate | arXiv: 2502.19130
- voxeval benchmarking the knowledge understanding capabilities of end-to-end spok | arXiv: 2501.04962
- voxrag a step toward transcription-free rag systems in spoken question answering | arXiv: 2505.17326
- vqaguider guiding multimodal large language models to answer complex video quest
- VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | arXiv: 2506.08691
- vulnerability of llms to vertically aligned text manipulations | arXiv: 2410.20016
- waffle fine-tuning multi-modal model for automated front-end development
- Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | arXiv: 2409.00113
- walk in others shoes with a single glance human-centric visual grounding with to
- wanda pruning large language models via regional gradients | arXiv: 2503.04992
- warmup generations a task-agnostic approach for guiding sequence-to-sequence lea
- warriorcoder learning from expert battles to augment code large language models | arXiv: 2412.17395
- watching the watchers exposing gender disparities in machine translation quality | arXiv: 2410.10995
- watermarking large language models an unbiased and low-risk method
- wavrag audio-integrated retrieval augmented generation for spoken dialogue model | arXiv: 2502.14727
- we-math does your large multimodal model achieve human-like mathematical reasoni
- weaving context across images improving vision-language models through focus-cen
- webwalker benchmarking llms in web traversal | arXiv: 2501.07572
- weed out then harvest dual low-rank adaptation is an effective noisy label detec | arXiv: 2510.10208
- well begun is half done low-resource preference alignment by weak-to-strong deco | arXiv: 2506.07434
- WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermark | arXiv: 2409.04459
- what are the essential factors in crafting effective long context multi-hop inst | arXiv: 2409.01893
- what do you call a dog that is incontrovertibly true dogma testing llm generaliz
- what happened in llms layers when trained for fast vs slow thinking a gradient p | arXiv: 2410.23743
- what is stigma attributed to a theory-grounded expert-annotated interview corpus | arXiv: 2505.12727
- What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | arXiv: 2502.08279
- What Makes a Good Natural Language Prompt? | arXiv: 2506.06950
- what matters in evaluating book-length stories a systematic study of long story | arXiv: 2512.12839
- What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs | arXiv: 2505.19773
- whats the difference supporting users in identifying the effects of prompt and m
- when backdoors speak understanding llm backdoor attacks through model-generated | arXiv: 2411.12701
- when claims evolve evaluating and enhancing the robustness of embedding models a | arXiv: 2503.03417
- when gpt spills the tea comprehensive assessment of knowledge file leakage in gp
- when harry meets superman the role of the interlocutor in persona-based dialogue | arXiv: 2505.24613
- when large language models meet speech a survey on integration approaches | arXiv: 2502.19548
- When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models | arXiv: 2502.13246
- when should dense retrievers be updated in evolving corpora detecting out-of-dis | arXiv: 2506.01877
- when the lm misunderstood the human chuckled analyzing garden path effects in hu
- When to Speak, When to Abstain: Contrastive Decoding with Abstention | arXiv: 2412.12527
- where are we evaluating llm performance on african languages | arXiv: 2502.19582
- which demographics do llms default to during annotation | arXiv: 2410.08820
- Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | arXiv: 2502.14127
- which retain set matters for llm unlearning a case study on entity unlearning | arXiv: 2502.11441
- whispa semantically and psychologically aligned whisper with self-supervised con | arXiv: 2501.16344
- white men lead black women help benchmarking and mitigating language agency soci
- who can withstand chat-audio attacks an evaluation benchmark for large audio-lan | arXiv: 2411.14842
- who taught you that tracing teachers in model distillation | arXiv: 2502.06659
- Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection | arXiv: 2502.12611
- Whose Boat Does it Float? Improving Personalization in Preference Optimization | arXiv: 2501.11549
- why are positional encodings nonessential for deep autoregressive transformers r | arXiv: 2501.00659
- why not act on what you know unleashing safety potential of llms via self-aware | arXiv: 2505.12060
- why prompt design matters and works a complexity analysis of prompt search space | arXiv: 2503.10084
- why safeguarded ships run aground aligned large language models safety mechanism | arXiv: 2502.13946
- wicked a simple method to make multiple choice benchmarks more challenging | arXiv: 2502.18316
- wikimixqa a multimodal benchmark for question answering over tables and charts | arXiv: 2506.15594
- winspot gui grounding benchmark with multimodal large language models
- wirelessmathbench a mathematical modeling benchmark for llms in wireless communi | arXiv: 2505.14354
- wizard of shopping target-oriented e-commerce dialogue generation with decision | arXiv: 2502.00969
- words of warmth trust and sociability norms for over 26k english words | arXiv: 2506.03993
- world modeling makes a better planner dual preference optimization for embodied | arXiv: 2503.10480
- Writing Like the Best: Exemplar-Based Expository Text Generation | arXiv: 2505.18859
- wximpactbench a disruptive weather impact understanding benchmark for evaluating | arXiv: 2505.20249
- X-Turing: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents | arXiv: 2408.09853
- x-webagentbench a multilingual interactive web benchmark for evaluating global a | arXiv: 2505.15372
- xdac xai-driven detection and attribution of llm-generated news comments in kore
- yes my lord guiding language model extraction with locality reinforced distillat
- YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering | arXiv: 2505.14279
- you need to mimic to get fame solving meeting transcript scarcity with a multi-a | arXiv: 2502.13001
- your model is overconfident and other lies we tell ourselves | arXiv: 2503.01235
- yulan-mini pushing the limits of open data-efficient language model
- zero-shot belief a hard problem for llms | arXiv: 2502.08777
- zero-shot conversational stance detection dataset and approaches | arXiv: 2506.17693
- zero-shot text-to-speech for vietnamese | arXiv: 2506.01322
- zipa a family of efficient models for multilingual phone recognition | arXiv: 2505.23170
- zjuklab at semeval-2025 task 4 unlearning via model merging | arXiv: 2503.21088
- δ-stance a large-scale real-world dataset of stances in legal argumentation
- crafting privacy-preserving adversarial examples a defense against membership inf
- an empirical study on detecting ai-generated text in financial reports
- cognitive framework for detecting ai-generated fiction
- haco-det fine-grained detection under human-ai coauthoring | arXiv: 2506.02959
- mcp zero-shot mgt detection via conformal prediction | arXiv: 2505.05084
- atri mitigating multilingual audio-text retrieval inconsistencies | arXiv: 2502.14627
- audio token consistency | arXiv: 2409.19283
- contextual biasing with the knowledgeable external language model for end-to-end
- coa-reasoning explorations on counterfactual analysis in physical reasoning of l
- counterfactual explanations for aspect-based sentiment analysis
- benchmarking long-context language models on long code understanding | arXiv: 2503.04359
- beyond sequences two-dimensional representation and dependency encoding for code
- coco-bench a comprehensive code benchmark for multi-task large language model ev | arXiv: 2504.20673
- codedpo code alignment | arXiv: 2410.05605
- codeif benchmarking the instruction-following capabilities of large language mod | arXiv: 2502.19166
- codereviewqa the code review comprehension assessment for large language models | arXiv: 2503.16167
- compileagent automated real-world repo-level compilation with tool-integrated ll | arXiv: 2505.04254
- coret improved retriever for code editing | arXiv: 2505.24715
- dars dynamic action re-sampling to enhance coding agent performance by adaptive | arXiv: 2503.14269
- dynacode a dynamic complexity-aware code benchmark for evaluating large language | arXiv: 2503.10452
- etf an entity tracing framework for hallucination detection in code summaries | arXiv: 2410.14748
- exploracoder advancing code generation for multiple unseen apis via planning and | arXiv: 2412.05366
- feabench repo code gen | arXiv: 2503.06680
- galla graph aligned large language models | arXiv: 2409.04183
- gift gibbs fine tuning code gen | arXiv: 2502.11466
- mldebugging towards benchmarking code debugging across multi-library scenarios | arXiv: 2506.13824
- oasis order-augmented strategy for improved code search | arXiv: 2503.08161
- personality guided code gen | arXiv: 2411.00006
- program synthesis benchmark for visual programming in xlogoonline environment | arXiv: 2406.11334
- reflectioncoder learning from reflection sequence for enhanced one-off code gene | arXiv: 2405.17057
- rethinking repetition problems of llms in code generation | arXiv: 2505.10402
- revisit self-debugging with self-generated tests for code generation
- scenegenagent precise industrial scene generation with coding agent | arXiv: 2410.21909
- texpert a multi-level benchmark for evaluating latex code generation by llms | arXiv: 2506.16990
- tree-of-code a tree-structured exploring framework for end-to-end code generatio | arXiv: 2412.15305
- tree of evolution code gen
- utboost rigorous evaluation of coding agents on swe-bench | arXiv: 2506.09289
- contradiction detection in rag-based chatbots
- dialogue systems for emotional support via value reinforcement | arXiv: 2501.17182
- enabling chatbots with eyes and ears an immersive multimodal conversation system | arXiv: 2506.00421
- enhancing goal-oriented proactive dialogue systems via consistency reflection an | arXiv: 2506.13366
- enstom enhancing dialogue systems with entropy-scaled steering vectors for topic | arXiv: 2505.16526
- know you first and be you better modeling human-like user simulators via implici | arXiv: 2502.18968
- know your mistakes towards preventing overreliance on task-oriented conversation | arXiv: 2501.10316
- kokorochat a japanese psychological counseling dialogue | arXiv: 2506.01357
- persona sentiment dialogue | arXiv: 2502.11423
- personalens a benchmark for personalization evaluation in conversational ai assi | arXiv: 2506.09902
- reflectdiffu empathetic response | arXiv: 2409.10289
- single- vs dual-prompt dialogue generation with llms for job interviews in human | arXiv: 2502.18650
- uniconv retrieval response gen | arXiv: 2507.07030
- when harry meets superman the role of the interlocutor in persona-based dialogue | arXiv: 2505.24613
- wizard of shopping target-oriented e-commerce dialogue generation with decision | arXiv: 2502.00969
- agent steerable search for knowledge graph question answering
- a reality check on context utilisation for retrieval-augmented generation | arXiv: 2412.17031
- a text is worth several tokens text embedding from llms secretly aligns well wit | arXiv: 2406.17378
- accelerating adaptive retrieval augmented generation via instruction-driven repr | arXiv: 2505.12731
- air-bench automated heterogeneous information retrieval benchmark | arXiv: 2412.13102
- any information is just worth one single screenshot unifying search with visuali | arXiv: 2502.11431
- are llms effective psychological assessors leveraging adaptive rag for interpret | arXiv: 2501.00982
- atomic llm a fine-grained information retrieval evaluation benchmark for languag
- automatic benchmark generation from scientific papers via retrieval-augmented ll
- beyond true or false retrieval-augmented hierarchical analysis of nuanced claims | arXiv: 2506.10728
- CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling | arXiv: 2406.17507
- coir a comprehensive benchmark for code information retrieval models | arXiv: 2407.02883
- collapse dense retrievers | arXiv: 2503.05037
- comrag retrieval-augmented generation with dynamic vector stores for real-time c | arXiv: 2506.21098
- core mmrag knowledge reconciliation | arXiv: 2506.02544
- cross-lingual relevance transfer for document retrieval
- divide then align rag knowledge boundary | arXiv: 2505.20871
- dont reinvent the wheel efficient instruction-following text embedding based on | arXiv: 2505.24754
- drag distilling rag slm | arXiv: 2506.01954
- drama diverse augmentation from large language models to smaller dense retriever | arXiv: 2502.18460
- empaths at semeval-2025 task 11 retrieval-augmented approach to perceived emotio | arXiv: 2506.04409
- enhancing lexicon-based text embeddings with large language models | arXiv: 2501.09749
- evaluation of attribution bias in generator-aware retrieval-augmented large lang | arXiv: 2410.12380
- exit context-aware extractive compression for enhancing retrieval-augmented gene | arXiv: 2412.12559
- faithfulrag fact level conflict | arXiv: 2506.08938
- flashback efficient retrieval-augmented language modeling for long context infere | arXiv: 2405.04065
- flexrag a flexible and comprehensive framework for retrieval-augmented generatio | arXiv: 2506.12494
- from ambiguity to accuracy the transformative effect of coreference resolution o | arXiv: 2507.07847
- gainrag preference alignment | arXiv: 2505.18710
- garage a benchmark with grounding annotations for rag evaluation | arXiv: 2506.07671
- genie worksheets tod agent | arXiv: 2407.05674
- gor rag long context summary | arXiv: 2410.11001
- graf graph retrieval augmented by facts for romanian legal multi-choice question | arXiv: 2412.04119
- gumbel reranking | arXiv: 2502.11116
- health-llm personalized retrieval-augmented disease prediction system | arXiv: 2402.00746
- helios harmonizing early fusion late fusion and llm reasoning for multi-granular | arXiv: 2603.02248
- hierarchical document refinement for long-context retrieval-augmented generation | arXiv: 2505.10413
- hoh a dynamic benchmark for evaluating the impact of outdated information on ret | arXiv: 2503.04800
- hybgrag hybrid rag skb | arXiv: 2412.16311
- hypothetical documents or knowledge leakage rethinking llm-based query expansion | arXiv: 2504.14175
- investigating language preference of multilingual rag systems | arXiv: 2502.11175
- investigating the robustness of retrieval-augmented generation at the query leve | arXiv: 2507.06956
- knowshiftqa rag knowledge shifts | arXiv: 2412.08985
- ldir low-dimensional dense and interpretable text embeddings with relative repre | arXiv: 2505.10354
- llm reranking harmful content | arXiv: 2501.13977
- logical consistency is vital neural-symbolic information retrieval for negative- | arXiv: 2505.22299
- main-rag multi-agent filtering retrieval-augmented generation | arXiv: 2501.00332
- maximal matching matters preventing representation collapse for robust cross-mod | arXiv: 2506.21538
- memerag a multilingual end-to-end meta-evaluation benchmark for retrieval augmen | arXiv: 2502.17163
- mitigating lost-in-retrieval problems in retrieval augmented multi-hop question | arXiv: 2502.14245
- moc mixtures of text chunking learners for retrieval-augmented generation system | arXiv: 2503.09600
- mt-raig novel benchmark and evaluation framework for retrieval-augmented insight | arXiv: 2502.11735
- multilingual retrieval augmented generation for culturally-sensitive tasks a ben | arXiv: 2410.01171
- on synthetic data strategies for domain-specific generative retrieval | arXiv: 2502.17957
- optimized text embedding models and benchmarks for amharic passage retrieval | arXiv: 2505.19356
- pandora box rag noise | arXiv: 2408.13533
- parenting optimizing knowledge selection of retrieval-augmented | arXiv: 2410.10360
- prism political bias embeddings | arXiv: 2505.24646
- psycholinguistic visual semantic | arXiv: 2505.23029
- raemollm retrieval augmented llms for cross-domain misinformation detection usin | arXiv: 2406.11093
- rageval scenario specific rag evaluation dataset generation framework | arXiv: 2408.01262
- rare retrieval augmented reasoning | arXiv: 2412.02830
- redundancy isotropy and intrinsic dimensionality of prompt-based text embeddings | arXiv: 2506.01435
- refind at semeval-2025 task 3 retrieval-augmented factuality hallucination detec | arXiv: 2502.13622
- removal of hallucination on hallucination debate-augmented rag | arXiv: 2505.18581
- reranking-based generation for unbiased perspective summarization | arXiv: 2506.15925
- saferag benchmarking security in retrieval-augmented generation of large languag | arXiv: 2501.18636
- seakr self-aware knowledge retrieval for adaptive retrieval augmented generation | arXiv: 2406.19215
- seal scaling to emphasize attention for long-context retrieval | arXiv: 2501.15225
- semantic outlier removal with embedding models and llms | arXiv: 2506.16644
- setr set selection rag | arXiv: 2507.06838
- sgic a self-guided iterative calibration framework for rag | arXiv: 2506.16172
- sticking to the mean detecting sticky tokens in text embedding models | arXiv: 2507.18171
- the distracting effect understanding irrelevant passages in rag | arXiv: 2505.06914
- toward structured knowledge reasoning contrastive retrieval-augmented generation | arXiv: 2506.00842
- towards adaptive memory-based optimization for enhanced retrieval-augmented gene | arXiv: 2504.05312
- towards storage-efficient visual document retrieval an empirical study on reduci | arXiv: 2506.04997
- typed-rag type-aware decomposition of non-factoid questions for retrieval-augmen | arXiv: 2503.15879
- unanswerability evaluation for retrieval augmented generation | arXiv: 2412.12300
- visa retrieval augmented generation with visual source attribution | arXiv: 2412.14457
- voxrag a step toward transcription-free rag systems in spoken question answering | arXiv: 2505.17326
- when claims evolve evaluating and enhancing the robustness of embedding models a | arXiv: 2503.03417
- when should dense retrievers be updated in evolving corpora detecting out-of-dis | arXiv: 2506.01877
- a dual-perspective nlg meta-evaluation framework with automatic benchmark and be | arXiv: 2502.12052
- an empirical study of mechanistic interpretability approaches for factual recall
- around the world in 24 hours probing llm knowledge of time and place | arXiv: 2506.03984
- bias attribution in filipino language models extending a bias interpretability m | arXiv: 2506.07249
- cleme2 gec evaluation | arXiv: 2407.00934
- cracking factual knowledge a comprehensive analysis of degenerate knowledge neur | arXiv: 2402.13731
- expert an explainable image captioning evaluation metric with structured explana | arXiv: 2506.24016
- irt router multi llm | arXiv: 2506.01048
- language agnostic concepts | arXiv: 2411.08745
- llama see llama do entrainment | arXiv: 2505.09338
- mechanistic interpretability of emotion inference in large language models | arXiv: 2502.05489
- normalized aopc faithfulness metrics | arXiv: 2408.08137
- output centric interpretability | arXiv: 2501.08319
- position-aware automatic circuit discovery | arXiv: 2502.04577
- probing subphonemes in morphology models | arXiv: 2505.11297
- probing the geometry of truth consistency and generalization of truth directions | arXiv: 2506.00823
- reasoning circuits in language models a mechanistic interpretation of syllogisti | arXiv: 2408.08590
- retrieve to explain drug target identification | arXiv: 2402.04068
- safety is not only about refusal reasoning-enhanced fine-tuning for interpretabl | arXiv: 2503.05021
- separating tongue from thought activation patching reveals language-agnostic con | arXiv: 2411.08745
- shortcut neuron eval | arXiv: 2506.04142
- the anatomy of evidence an investigation into explainable icd coding | arXiv: 2507.01802
- towards explainable temporal reasoning in large language models a structure-awar | arXiv: 2505.15245
- a general knowledge injection framework for icd coding | arXiv: 2505.18708
- adaptive detoxification safeguarding general capabilities of llms through toxici | arXiv: 2505.22298
- bmike-53 investigating cross-lingual knowledge editing with in-context learning | arXiv: 2406.17764
- chainedit propagating ripple effects in llm | arXiv: 2507.08427
- cknowedit chinese knowledge editing dataset llms | arXiv: 2409.05806
- compke complex question answering under knowledge editing | arXiv: 2506.00829
- context-robust knowledge editing for language models | arXiv: 2505.23026
- docmedit towards document-level model editing | arXiv: 2505.19572
- efficient knowledge editing | arXiv: 2506.04226
- megen generative backdoor into large language models via model editing | arXiv: 2408.10722
- memorizing is not enough deep knowledge injection through reasoning | arXiv: 2504.00472
- mitigating negative interference in multilingual sequential knowledge editing th | arXiv: 2506.10800
- neuron-level sequential editing for large language models | arXiv: 2410.04045
- revealing the deceptiveness of knowledge editing a mechanistic analysis of super | arXiv: 2505.12636
- sake steering activations for knowledge editing | arXiv: 2503.01751
- scedit script-based assessment of knowledge editing | arXiv: 2505.23291
- structure-aware domain knowledge injection for large language models | arXiv: 2407.16724
- the mirage of model editing revisiting evaluation in the wild | arXiv: 2502.11177
- towards a principled evaluation of knowledge editors | arXiv: 2507.05937
- agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals | arXiv: 2502.19328
- an empirical study on llm-based agents for automated bug fixing | arXiv: 2411.10213
- bookworld from novels to interactive agent societies for story creation | arXiv: 2504.14538
- crowdcounter llm-agent-based scalable framework for web information gathering
- repro-bench can agentic ai systems assess the reproducibility of research claims
- aligning to what limits to rlhf based alignment | arXiv: 2503.09025
- constitutional classifiers defending against universal jailbreaks across thousan | arXiv: 2501.18837
- intuitive fine tuning simplifying alignment into single process | arXiv: 2405.11870
- accelerating speculative decoding via efficient context-aware draft generation
- consistency-preserving contrastive decoding for faithful document-grounded dial
- coprus consistency preserving utterance synthesis towards more realistic benchma
- fuel unveiling environmental impacts of llm serving | arXiv: 2502.11256
- a conformal risk control framework for granular word assessment and uncertainty | arXiv: 2504.01225
- a mismatched benchmark for scientific natural language inference | arXiv: 2506.04603
- abgen evaluating large language models in | arXiv: 2507.13300
- access denied inc the first benchmark environment for sensitivity awareness | arXiv: 2506.00964
- ad-hoc concept forming in the game codenames as a means for evaluating large lan | arXiv: 2502.11707
- ad-llm benchmarking large language models for anomaly detection | arXiv: 2412.11142
- androidlab autonomous agent | arXiv: 2410.24024
- antileakbench preventing data contamination by automatically constructing benchm | arXiv: 2412.13670
- atomic calibration of llms in long-form generations | arXiv: 2410.13246
- batayan a filipino nlp benchmark for evaluating large language models | arXiv: 2502.14911
- belarusian glue
- benchmarking llms and llm-based agents in practical vulnerability detection for | arXiv: 2503.03586
- benchmarking uncertainty quantification methods for large language models with l | arXiv: 2406.15627
- besstie a benchmark for sentiment and sarcasm classification for varieties of en | arXiv: 2412.04726
- beyond one-size-fits-all tailored benchmarks for efficient evaluation | arXiv: 2502.13576
- browsing lost unformed recollections a benchmark for tip-of-the-tongue search an | arXiv: 2503.19193
- calibraeval calibrating prediction distribution to mitigate selection bias in ll | arXiv: 2410.15393
- calibration confidence text gen | arXiv: 2506.00637
- can external validation tools improve annotation quality for llm-as-a-judge | arXiv: 2507.17015
- cfbench a comprehensive constraints-following benchmark for llms | arXiv: 2408.01122
- chatbench from static benchmarks to human-ai evaluation | arXiv: 2504.07114
- codemenv benchmarking large language models on code migration | arXiv: 2506.00894
- com2 causal commonsense | arXiv: 2506.07064
- cov eval evaluating llms from code security perspective | arXiv: 2505.10494
- culemo cultural lenses on emotion - benchmarking llms for cross-cultural emotion | arXiv: 2503.10688
- culturalbench a robust diverse and challenging cultural benchmark by human-ai cu | arXiv: 2410.02677
- ecomscriptbench | arXiv: 2505.15196
- editinspector a benchmark for evaluation of text-guided image edits | arXiv: 2506.09988
- educationq evaluating llms teaching capabilities through multi-agent dialogue fr | arXiv: 2504.14928
- elaboration competitive programming | arXiv: 2505.16667
- evowiki evaluating llms on evolving knowledge | arXiv: 2412.13582
- exposing numeracy gaps a benchmark to evaluate fundamental numerical abilities i | arXiv: 2502.11075
- financereasoning benchmarking financial numerical reasoning more | arXiv: 2506.05828
- from tools to teammates evaluating llms in multi-session coding interactions | arXiv: 2502.13791
- grace a granular benchmark for evaluating model calibration against human calibr | arXiv: 2502.19684
- guessarena guess who i am a | arXiv: 2505.22661
- hallulens llm hallucination benchmark | arXiv: 2504.17550
- hellaswag-pro a large-scale bilingual benchmark for evaluating the robustness of | arXiv: 2502.11393
- help write story feedback | arXiv: 2507.16007
- homebench evaluating llms in smart homes with valid and invalid instructions acr | arXiv: 2505.19628
- how far are llms from being our digital twins a benchmark for persona-based beha | arXiv: 2502.14642
- hpss heuristic prompting strategy search for llm evaluators | arXiv: 2502.13031
- influences on llm calibration a study of response agreement loss functions and p | arXiv: 2501.03991
- justrank llm judge system ranking | arXiv: 2412.09569
- kitab-bench a comprehensive multi-domain benchmark for arabic ocr and document u | arXiv: 2502.14949
- kristeva close reading as a novel task for benchmarking interpretive reasoning | arXiv: 2505.09825
- la leaderboard spanish | arXiv: 2507.00999
- language complexity measurement as a noisy zero-shot proxy for evaluating llm pe | arXiv: 2502.11578
- language model probabilities are not calibrated in numeric contexts | arXiv: 2410.16007
- mars benchmarking the metaphysical reasoning abilities of language models with a | arXiv: 2406.02106
- mcbe a multi-task chinese bias evaluation benchmark for large language models | arXiv: 2507.02088
- mdbench a synthetic multi-document reasoning benchmark generated with knowledge | arXiv: 2506.14927
- mis-prompt benchmarking large language models for proactive error handling | arXiv: 2506.00064
- mmlu-cf a contamination-free multi-task language understanding benchmark | arXiv: 2412.15194
- movie101v2 improved movie narration benchmark | arXiv: 2404.13370
- navigating rifts in human-llm grounding study and benchmark | arXiv: 2503.13975
- noreval a norwegian language understanding and generation evaluation benchmark | arXiv: 2504.07749
- onebench to test them all sample-level benchmarking over open-ended capabilities | arXiv: 2412.06745
- pap2pat benchmarking outline-guided long-text patent generation with patent-pape | arXiv: 2410.07009
- papersplease a benchmark for evaluating motivational values of large language mo | arXiv: 2506.21961
- patch psychometrics-assisted benchmarking of large language models against human | arXiv: 2404.01799
- physreason a comprehensive benchmark towards physics-based reasoning | arXiv: 2502.12054
- readoc a unified benchmark for realistic document structured extraction | arXiv: 2409.05137
- realhitbench a comprehensive realistic hierarchical table benchmark for evaluati | arXiv: 2506.13405
- retrieval models arent tool-savvy benchmarking tool retrieval for large language | arXiv: 2503.01763
- revisiting 3d llm benchmarks are we really testing 3d capabilities | arXiv: 2502.08503
- right answer wrong score uncovering the inconsistencies of llm evaluation in mul | arXiv: 2503.14996
- rulearena rule guided reasoning | arXiv: 2412.08972
- sanskriti a comprehensive benchmark for evaluating language models knowledge of | arXiv: 2506.15355
- seedbench a multi-task benchmark for evaluating large language models in seed sc | arXiv: 2505.13220
- sklep a slovak general language understanding benchmark | arXiv: 2506.21508
- somethings fishy in the data lake a critical re-evaluation of table union search | arXiv: 2505.21329
- structext eval | arXiv: 2406.10621
- structflowbench a structured flow benchmark for multi-turn instruction following | arXiv: 2502.14494
- swiltra-bench the swiss legal translation benchmark | arXiv: 2503.01372
- tic-lm a web-scale benchmark for time-continual llm pretraining | arXiv: 2504.02107
- towards dynamic theory of mind evaluating llm adaptation to temporal evolution o | arXiv: 2505.17663
- towards objective fine-tuning how llms prior knowledge causes potential poor cal | arXiv: 2505.20903
- tripcraft a benchmark for spatio-temporally fine grained travel planning | arXiv: 2502.20508
- triptailor a real-world benchmark for personalized travel planning | arXiv: 2508.01432
- tumlu a unified and native language understanding benchmark for turkic languages | arXiv: 2502.11020
- vital pluralistic alignment healthcare | arXiv: 2502.13775
- voxeval benchmarking the knowledge understanding capabilities of end-to-end spok | arXiv: 2501.04962
- webwalker benchmarking llms in web traversal | arXiv: 2501.07572
- where are we evaluating llm performance on african languages | arXiv: 2502.19582
- wicked a simple method to make multiple choice benchmarks more challenging | arXiv: 2502.18316
- wximpactbench a disruptive weather impact understanding benchmark for evaluating | arXiv: 2505.20249
- yescieval llm judge science | arXiv: 2505.14279
- agentdropout-dynamic-agent-elimination-for-multi-agent-collaboration | arXiv: 2503.18891
- ai as a novel ethical agent exploring moral judgments by large language models
- an empirical study of large language models for automated review generation
- analyzing the rapid generalization of sft via the perspective of attention head | arXiv: 2409.15820
- argument mining in the age of large language models
- arm alignment retrieval | arXiv: 2501.18539
- assessing and enhancing the causal reasoning abilities of language models via fai
- assessing the vulnerability of llms to cognitive biases in scientific research
- autoexp automatic experiment design and execution by llms
- beyond dialogue roleplay | arXiv: 2408.10903
- bfs-prover-scalable-best-first-tree-search-for-llm-based-automatic-theorem-proving | arXiv: 2502.03438
- can llms interpret leverage amrs | arXiv: 2504.04745
- catching shortcuts a framework for evaluating shortcuts in large language models
- cheaper and better diffusion language model via task-specific training
- clue guided re-assessment to improve reasoning in large language models
- collaborative performance prediction for large language models | arXiv: 2407.01300
- comparing large language models in extracting subjective information from politi
- comparing linguistic acceptability judgments of autoregressive language models
- concreteness versus abstractness a selectivity analysis in llms
- cross-modal alignment for llm-enhanced spoken language understanding
- epistemic-markers-in-confidence-estimation | arXiv: 2505.24778
- limitgen-llms-identify-research-limitations | arXiv: 2507.02694
- llm mapreduce simplified long sequence processing | arXiv: 2410.09342
- llms-comprehend-temporal-meaning-in-narratives | arXiv: 2507.14307
- neuronxa-cross-lingual-alignment-via-neurons | arXiv: 2507.14900
- rethinking-sorting-in-llm-pairwise-ranking | arXiv: 2505.24643
- rhio retrieval heads faithfulness | arXiv: 2501.13573
- seed stepwise reasoning disruption attack | arXiv: 2412.11934
- toolcoder code empowered tool learning | arXiv: 2502.11404
- adversarial tokenization | arXiv: 2503.02174
- asynclm efficient and adaptive async pre-training of language models
- autonomous data selection with zero-shot generative classifiers for mathematical | arXiv: 2402.07625
- between circuits chomsky | arXiv: 2502.19249
- chinese grammatical error correction with pre-trained models and linguistic clue
- critiq mining data quality criteria from human preferences | arXiv: 2502.19279
- data-constrained synthesis of training data for de-identification | arXiv: 2502.14677
- data caricatures on the representation of african american language in pretraini | arXiv: 2503.10789
- data whisperer data selection | arXiv: 2505.12212
- davir data selection via implicit reward for large language models | arXiv: 2310.13008
- diversity explains inference scaling laws through a case study of minimum bayes | arXiv: 2410.15021
- dual stage curriculum learning sequence labeling | arXiv: 2402.13534
- emergent abilities continued pt | arXiv: 2506.00288
- fr spec speculative sampling | arXiv: 2502.14856
- how do llms acquire new knowledge a knowledge circuits perspective on continual | arXiv: 2502.11196
- improving continual pre-training through seamless data packing | arXiv: 2505.22018
- inconsistent tokenizations cause language models to be perplexed by japanese gra | arXiv: 2505.19599
- incorporating domain knowledge into materials tokenization | arXiv: 2506.11115
- inserter speech instruction | arXiv: 2503.02769
- large vocabulary size improves large language models | arXiv: 2406.16508
- leancode understanding models better for code simplification of pre-trained larg | arXiv: 2505.14759
- making llms better many-to-many speech-to-text translators with curriculum learn | arXiv: 2409.19510
- metarater a multidimensional data selection method | arXiv: 2504.14194
- model performance-guided evaluation data selection for effective prompt optimiza | arXiv: 2505.10736
- nemotron cc pretraining data | arXiv: 2412.02595
- optimizing pre-training data mixtures with mixtures of data expert models | arXiv: 2502.15950
- pre-training curriculum for multi-token prediction in language models | arXiv: 2505.22757
- retrofitting large language models with dynamic tokenization | arXiv: 2411.18553
- scar style consistency data selection | arXiv: 2406.10882
- second language arabic acquisition of llms via progressive vocabulary expansion | arXiv: 2412.12310
- splintering nonconcatenative languages for better tokenization | arXiv: 2503.14433
- stealing training data from large language models in decentralized training thro | arXiv: 2502.16086
- synthesizing post-training data for llms through multi-agent simulation | arXiv: 2410.14251
- tokalign vocab adaptation | arXiv: 2506.03523
- tokenization is sensitive to language variation | arXiv: 2502.15343
- towards effective and efficient continual pre-training of large language models | arXiv: 2407.18743
- training dynamics underlying language model scaling laws loss deceleration and z | arXiv: 2506.05447
- unsupervised morphological tree tokenizer | arXiv: 2406.15245
- velocitune a velocity-based dynamic domain reweighting method for continual pre- | arXiv: 2411.14318
- beyond the answer advancing multi-hop qa with fine-grained graph reasoning and e
- commonsense abductive reasoning using knowledge from multiple sources
- complex reasoning with natural language contexts and background knowledge
- epicprm-efficient-precise-training-data-for-process-reward-model | arXiv: 2503.02382
- agrail a lifelong agent guardrail with effective and adaptive safety detection | arXiv: 2502.11448
- aligning large language models to follow instructions and hallucinate less via e | arXiv: 2502.07340
- alleviating hallucinations from knowledge misalignment in large language models
- answer when needed forget when not language models pretend to forget via in-cont | arXiv: 2410.00382
- are the hidden states hiding something testing the limits of factuality-encoding | arXiv: 2505.16520
- arghitz at archehr-qa 2025 a two-step divide and conquer approach to patient que | arXiv: 2506.12886
- automated explanation generation and hallucination detection for heritage image
- chinese simpleqa a chinese factuality evaluation for large language models | arXiv: 2411.07140
- cliperase efficient unlearning of visual-textual associations in clip | arXiv: 2410.23330
- comparisonqa evaluating factuality robustness of llms through knowledge frequenc | arXiv: 2412.20251
- core robust factual precision with informative sub-claim identification
- defense prompt injection | arXiv: 2411.00459
- exploring forgetting in large language model pre-training | arXiv: 2410.17018
- factual knowledge in language models robustness and anomalies under simple tempo | arXiv: 2502.01220
- faithful and robust llm-driven theorem proving for nli explanations | arXiv: 2505.24264
- from misleading queries to accurate answers a three-stage fine-tuning method for | arXiv: 2504.11277
- hallucination detox send | arXiv: 2410.15460
- halogen hallucinations
- hd-ndes neural differential equations for hallucination detection in llms | arXiv: 2506.00088
- how does response length affect long-form factuality | arXiv: 2505.23295
- improving factuality with explicit working memory | arXiv: 2412.18069
- improving model factuality with fine-grained critique-based evaluator | arXiv: 2410.18359
- indirect prompt injection detection | arXiv: 2502.16580
- intent hallucination eval | arXiv: 2506.06539
- language models can subtly deceive without lying a case study on strategic phras | arXiv: 2405.04325
- learning auxiliary tasks improves reference-free hallucination detection in open | arXiv: 2505.12265
- localizing and mitigating errors in long-form question answering | arXiv: 2407.11930
- mamba knockout for unraveling factual information flow | arXiv: 2505.24244
- monitoring decoding mitigating hallucination via evaluating the factuality of pa | arXiv: 2503.03106
- odysseus dynamic focus decoding | arXiv: 2503.08057
- on-policy self-alignment with fine-grained knowledge feedback for hallucination | arXiv: 2406.12221
- opt-out investigating entity-level unlearning for large language models via opti | arXiv: 2406.12329
- real-time factuality assessment from adversarial feedback | arXiv: 2410.14651
- relearn unlearning via learning for large language models | arXiv: 2502.11190
- revs unlearning sensitive information in language models via rank editing in the | arXiv: 2406.09325
- saferoute adaptive model selection for efficient and accurate safety guardrails | arXiv: 2502.12464
- seuf is unlearning one expert enough for mixture-of-experts llms | arXiv: 2411.18797
- stochastic chameleons irrelevant context hallucinations reveal class-based misge | arXiv: 2505.22630
- towards context-robust llms a gated representation fine-tuning approach | arXiv: 2502.14100
- towards effective extraction and evaluation of factual claims | arXiv: 2502.10855
- treecut a synthetic unanswerable math word problem dataset for llm hallucination | arXiv: 2502.13442
- truth knows no language evaluating truthfulness beyond english | arXiv: 2502.09387
- ualign leveraging uncertainty estimations for factuality alignment on large lang | arXiv: 2412.11803
- uaqfact evaluating factual knowledge utilization of llms on unanswerable questio | arXiv: 2505.23461
- unveiling and addressing pseudo forgetting in large language models | arXiv: 2411.11932
- which retain set matters for llm unlearning a case study on entity unlearning | arXiv: 2502.11441
- zjuklab at semeval-2025 task 4 unlearning via model merging | arXiv: 2503.21088
- align-pro align protein representations through multi-modal learning
- concept bottleneck language models for protein design
- medbiorag semantic search and retrieval-augmented generation for biomedical lite
- cfsp an efficient structured pruning framework for llms with coarse-to-fine acti
- compact and compressible representations for llms using structured sparse decom
- compression in transformer language models has a surprising relationship with pe
- a case study of cross-lingual zero-shot generalization for classical languages i | arXiv: 2505.13173
- accessible machine translation evaluation for low-resource languages
- alleviating distribution shift in synthetic data for machine translation quality | arXiv: 2502.19941
- an expanded massive multilingual dataset for high-performance language technolog | arXiv: 2503.10267
- are rules meant to be broken understanding multilingual moral reasoning as a com | arXiv: 2502.14083
- askqe question answering as automatic evaluation for machine translation | arXiv: 2504.11582
- assessing agentic large language models in multilingual national bias | arXiv: 2502.17945
- beyond n-grams rethinking evaluation metrics and strategies for multilingual abs | arXiv: 2507.08342
- blessing of multilinguality a systematic analysis of multilingual in-context lea | arXiv: 2502.11364
- bridging the language gaps in large language models with inference-time cross-li | arXiv: 2410.12462
- cc-tuning a cross-lingual connection mechanism for improving joint multilingual | arXiv: 2506.00875
- cchall a novel benchmark for joint cross-lingual and cross-modal hallucinations | arXiv: 2505.19108
- clix cross-lingual explanations of idiomatic expressions | arXiv: 2501.03191
- code-switching curriculum learning for multilingual transfer in llms | arXiv: 2411.02460
- code-switching red-teaming llm evaluation for safety and multilingual understand | arXiv: 2406.15481
- comparative analysis of multilingual hate speech detection
- context augmented token-level post-editing for human interpreting
- cosmmic comment-sensitive multimodal multilingual indian corpus | arXiv: 2506.15372
- cross-lingual auto evaluation for assessing multilingual llms | arXiv: 2410.13394
- cross-lingual optimization for language transfer in large language models | arXiv: 2505.14297
- cross-lingual representation alignment through contrastive image-caption tuning | arXiv: 2505.13628
- cross-lingual transfer of cultural knowledge an asymmetric phenomenon | arXiv: 2506.01675
- cross-lingual transfer of debiasing and detoxification in multilingual llms an e | arXiv: 2412.14050
- cross lingual neurons compression | arXiv: 2506.01629
- crosslingual pitfalls | arXiv: 2505.18673
- cruxeval-x a benchmark for multilingual code reasoning understanding and executi | arXiv: 2408.13001
- culfit a fine-grained cultural-aware llm training paradigm via multilingual crit | arXiv: 2505.19484
- dictionaries to the rescue cross-lingual vocabulary transfer for low-resource la | arXiv: 2506.01535
- disentangle language culture | arXiv: 2505.24635
- edit once update everywhere a simple framework for cross-lingual knowledge synch | arXiv: 2502.14645
- execute a multilingual benchmark for llm token understanding | arXiv: 2505.17784
- exploring in-context example generation for machine translation | arXiv: 2506.00507
- exploring in-image machine translation with real-world background | arXiv: 2505.15282
- flare crosslingual lora | arXiv: 2501.06892
- grammamt improving machine translation with grammar-informed in-context learning | arXiv: 2410.18702
- group then scale dynamic mixture-of-experts multilingual language model | arXiv: 2506.12388
- hierarchical news clustering | arXiv: 2506.00277
- implicit cross-lingual rewarding for efficient multilingual preference alignment | arXiv: 2503.04647
- improving mllms document image machine translation via synchronously self-review | arXiv: 2507.08309
- just go parallel improving the multilingual capabilities of large language model | arXiv: 2506.13044
- knowcoder-x boosting multilingual information extraction via code | arXiv: 2411.04794
- laca crosslingual absa | arXiv: 2508.09515
- langmark a multilingual dataset for automatic post-editing | arXiv: 2511.17153
- langsamp multilingual pretraining | arXiv: 2409.18199
- lemonade a large multilingual expert-annotated abstractive event dataset for the | arXiv: 2506.00980
- less but better efficient multilingual expansion | arXiv: 2505.22582
- lexgen domain-aware multilingual lexicon generation | arXiv: 2405.11200
- llms can achieve high-quality simultaneous machine translation as efficiently as | arXiv: 2504.09570
- lost in multilinguality dissecting cross-lingual factual inconsistency in transf | arXiv: 2504.04264
- low resource translation | arXiv: 2506.01796
- m-mad multidimensional multi-agent debate for advanced machine translation evalu | arXiv: 2412.20127
- m2rc-eval massively multilingual repository-level code completion evaluation | arXiv: 2410.21157
- m3finmeeting a multilingual multi-sector and multi-task financial meeting unders | arXiv: 2506.02510
- m rewardbench | arXiv: 2410.15522
- machine translation models are zero-shot detectors of translation direction | arXiv: 2401.06769
- marco bench multilingual if | arXiv: 2507.11882
- maxife multilingual and cross-lingual instruction following evaluation | arXiv: 2506.01776
- memorization inheritance seqkd | arXiv: 2502.01491
- mid layer crosslingual alignment | arXiv: 2502.14830
- milic-eval benchmarking multilingual llms for chinas minority languages | arXiv: 2503.01150
- modular sentence encoders | arXiv: 2407.14878
- moscar a large-scale multilingual and multimodal document-level corpus | arXiv: 2406.08707
- msqad multilingual ethical bias | arXiv: 2505.19121
- mt eval human parity | arXiv: 2506.19571
- mtvqa benchmarking multilingual text-centric visual question answering | arXiv: 2405.11985
- multi-perspective alignment for increasing naturalness in neural machine transla | arXiv: 2412.08473
- multilingual encoder knows more than you realize shared weights pretraining for | arXiv: 2502.10852
- multilingual llm english accent | arXiv: 2410.15956
- multilingual speech data quality | arXiv: 2506.17525
- nametag 3 a tool and a service for multilingual multitagset ner | arXiv: 2506.05949
- probing llms for multilingual discourse generalization through a unified label s | arXiv: 2503.10515
- registering source tokens to target language spaces in multilingual neural machi | arXiv: 2501.02979
- semantic aware linear transfer by recycling pre-trained language models for cros | arXiv: 2505.10945
- seqpo-simt sequential policy optimization for simultaneous machine translation | arXiv: 2505.20622
- shifcon nondominant language | arXiv: 2410.19453
- sift-50m a large-scale multilingual dataset for speech instruction fine-tuning | arXiv: 2504.09081
- statement-tuning enables efficient cross-lingual generalization in encoder-only | arXiv: 2506.01592
- team ack at semeval-2025 task 2 beyond word-for-word machine translation for eng | arXiv: 2504.20451
- the esethu framework reimagining sustainable dataset governance and curation for | arXiv: 2502.15916
- the hidden space of safety understanding preference-tuned llms in multilingual c | arXiv: 2504.02708
- thor-moe hierarchical task-guided and context-responsive routing for neural mach | arXiv: 2505.14173
- towards global ai inclusivity a large-scale multilingual terminology dataset gis | arXiv: 2412.18367
- trans-zero self-play incentivizes large language models for multilingual transla | arXiv: 2504.14669
- translation and fusion improves cross-lingual information extraction | arXiv: 2305.13582
- translation robustness | arXiv: 2403.03923
- understanding in-context machine translation for low-resource languages a case s | arXiv: 2502.11862
- unveiling the power of source source-based minimum bayes risk decoding for neura | arXiv: 2406.11632
- watching the watchers exposing gender disparities in machine translation quality | arXiv: 2410.10995
- x-webagentbench a multilingual interactive web benchmark for evaluating global a | arXiv: 2505.15372
- zipa a family of efficient models for multilingual phone recognition | arXiv: 2505.23170
- answering complex geographic questions by adaptive reasoning with visual context
- chart-based reasoning transferring capabilities from llms to vlms | arXiv: 2403.12596
- cordial-multimodal-llm-coherence-relationships | arXiv: 2502.11300
- mmboundary reasoning step confidence | arXiv: 2505.23224
- visc-focus-centric-visual-chains-for-multi-image-reasoning | arXiv: 2504.20199
- vlm2-bench-visual-cue-linking | arXiv: 2502.12084
- wemath knowledge reasoning | arXiv: 2407.01284
- abstractive snippet generation
- an empirical study of iterative refinements for non-autoregressive translation
- controlling politeness in multi-turn dialogues through pre-phrase augmentation
- active llms for multi-hop question answering
- attribution methods in nlp navigating a fragmented landscape
- bilingual zero-shot stance detection
- brighter bridging the gap in human-annotated textual emotion recognition dataset
- conversational quality assessment a large-scale corpus and comprehensive study
- deja vu decoding repeated reading from eye movements | arXiv: 2502.11061
- meaning-beyond-truth-conditions-anaphora-accessibility | arXiv: 2502.14119
- variational approach mitigating entity bias relation extraction | arXiv: 2506.11381
- achieving certification-by-design through model-driven development
- adaptive feature-based low rank plus sparse decomposition for subspace clusterin
- cooperating and competing through natural language
- sightation-blv-aligned-diagram-descriptions | arXiv: 2503.13369
- a survey on proactive defense strategies against misinformation in large languag | arXiv: 2507.05288
- banstereoset a dataset to measure stereotypical social biases in llms for bangla | arXiv: 2409.11638
- beyond negative stereotypes -- non-negative abusive utterances about identity gr
- biasguard a reasoning-enhanced bias detection tool for large language models | arXiv: 2504.21299
- can community notes replace professional fact-checkers | arXiv: 2502.14132
- conspiracy theories and where to find them on tiktok | arXiv: 2407.12545
- culture matters in toxic language detection in persian | arXiv: 2506.03458
- detection of human and machine-authored fake news in urdu | arXiv: 2410.19517
- explicit vs implicit investigating social bias in large language models through | arXiv: 2501.02295
- exploring gender bias in large language models an in-depth dive into the german | arXiv: 2507.16557
- exploring multimodal challenges in toxic chinese detection taxonomy benchmark an | arXiv: 2505.24341
- exploring the impact of instruction-tuning on llms susceptibility to misinformat | arXiv: 2507.18203
- fairsteer inference time debiasing for llms with dynamic activation steering | arXiv: 2504.14492
- gg-bbq german gender bias benchmark for question answering | arXiv: 2507.16410
- hateday global hate speech | arXiv: 2411.15462
- how does misinformation affect large language | arXiv: 2505.21608
- implihatevid video hate | arXiv: 2508.06570
- is llm an overconfident judge unveiling the capabilities of llms in detecting of | arXiv: 2502.06207
- kda automated data generation pipeline for detoxifying implicitly offensive lang | arXiv: 2506.13513
- llm label propagation | arXiv: 2506.00488
- llm personalized disinformation | arXiv: 2412.13666
- mdit-bench evaluating the dual-implicit toxicity in large multimodal models | arXiv: 2505.17144
- measuring social biases in masked language models by proxy of prediction quality | arXiv: 2402.13954
- silencing empowerment allowing bigotry auditing the moderation of hate speech on | arXiv: 2506.07667
- state toxicn a benchmark for span-level target-aware toxicity extraction in chin | arXiv: 2501.15451
- taz2024full analysing german newspapers for gender bias and discrimination acros | arXiv: 2506.05388
- translate with care addressing gender bias neutrality and reasoning in large lan | arXiv: 2506.00748
- context aware sentiment forecasting agents | arXiv: 2505.24331
- q2e query-to-event decomposition for zero-shot multilingual text-to-video retrie | arXiv: 2506.10202
- vidcapbench a comprehensive benchmark of video captioning for controllable text- | arXiv: 2502.12782
- a thousand words paint a picture multimodal goal tracking for grounded social in
- attention-seeker dynamic self-attention scoring for unsupervised key-frame extra
- bold selection bias | arXiv: 2410.14248