ACL2025 Paper Notes TODO
Total: 2835 papers | Completed: 2358 | Pending: 477
- "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | arXiv: 2411.02355
- 500xcompressor generalized prompt compression for large language models | arXiv: 2408.03094
- a case study of cross-lingual zero-shot generalization for classical languages i | arXiv: 2505.13173
- a comprehensive graph framework for question answering with mode-seeking prefere | arXiv: 2506.17951
- a conformal risk control framework for granular word assessment and uncertainty | arXiv: 2504.01225
- a drop-in solution for on-the-fly adaptation of speculative decoding in large la
- a dual-mind framework for strategic and expressive negotiation agent
- a dual-perspective nlg meta-evaluation framework with automatic benchmark and be | arXiv: 2502.12052
- a general knowledge injection framework for icd coding | arXiv: 2505.18708
- a generative adaptive replay continual learning model for temporal knowledge gra
- a large and balanced corpus for fine-grained arabic readability assessment | arXiv: 2502.13520
- a large-scale real-world evaluation of llm-based virtual teaching assistant | arXiv: 2506.17363
- a little human data goes a long way | arXiv: 2410.13098
- a measure of the system dependence of automated metrics | arXiv: 2412.03152
- a mismatched benchmark for scientific natural language inference | arXiv: 2506.04603
- a modular approach for clinical slms driven by synthetic data with pre-instructi
- a modular dataset to demonstrate llm abstraction capability | arXiv: 2503.17645
- A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems | arXiv: 2506.02998
- a multi-persona framework for argument quality assessment
- a mutual information perspective on knowledge graph embedding
- a new formulation of zipfs meaning-frequency law through contextual diversity
- a parameter-efficient and fine-grained prompt learning for vision-language model
- a practical approach for building production-grade conversational agents with wo | arXiv: 2505.23006
- a reality check on context utilisation for retrieval-augmented generation | arXiv: 2412.17031
- a representation level analysis of nmt model robustness to grammatical errors | arXiv: 2505.21224
- a retrieval-based approach to medical procedure matching in romanian | arXiv: 2503.20556
- a rose by any other name llm-generated explanations are good proxies for human e | arXiv: 2412.13942
- a self-denoising model for robust few-shot relation extraction
- a semantic-aware layer-freezing approach to computation-efficient fine-tuning of | arXiv: 2406.11753
- a semi-supervised scalable unified framework for e-commerce query classification | arXiv: 2506.21049
- A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression | arXiv: 2412.17483
- a spatio-temporal point process for fine-grained modeling of reading behavior | arXiv: 2506.19999
- a statistical and multi-perspective revisiting of the membership inference attac
- a strategic coordination framework of small lms matches large lms in data synthe
- a survey of automatic prompt optimization with instruction-focused heuristic-bas | arXiv: 2502.18746
- a survey of large language models in psychotherapy current landscape and future | arXiv: 2502.11095
- a survey of llm-based agents in medicine how far are we from baymax | arXiv: 2502.11211
- a survey of post-training scaling in large language models
- A Survey on Efficient Large Language Model Training: From Data-centric Perspectives | arXiv: 2510.25817
- A Survey on Foundation Language Models for Single-cell Biology
- A Survey on Patent Analysis: From NLP to Multimodal AI | arXiv: 2404.08668
- a survey on proactive defense strategies against misinformation in large languag | arXiv: 2507.05288
- a systematic study of compositional syntactic transformer language models | arXiv: 2506.22978
- a text is worth several tokens text embedding from llms secretly aligns well wit | arXiv: 2406.17378
- A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive | arXiv: 2402.11005
- a training-free llm-based approach to general chinese character error correction | arXiv: 2502.15266
- a triple-view framework for fine-grained emotion classification with clustering-
- A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns | arXiv: 2410.16155
- a unified agentic framework for evaluating conditional image generation | arXiv: 2504.07046
- a variational approach for mitigating entity bias in relation extraction | arXiv: 2506.11381
- a-tasc asian ted-based automatic subtitling corpus
- aad-llm neural attention-driven auditory scene understanding | arXiv: 2502.16794
- AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research | arXiv: 2507.13300
- accelerating adaptive retrieval augmented generation via instruction-driven repr | arXiv: 2505.12731
- accelerating dense llms via l0-regularized mixture-of-experts
- access denied inc the first benchmark environment for sensitivity awareness | arXiv: 2506.00964
- accurate kv cache quantization with outlier tokens tracing | arXiv: 2505.10938
- AceCoder: Acing Coder RL via Automated Test-Case Synthesis | arXiv: 2502.01718
- acord an expert-annotated retrieval dataset for legal contract drafting | arXiv: 2501.06582
- acoustic individual identification of white-faced capuchin monkeys using joint m
- acquisition and application of novel knowledge in large language models
- act knowledgeable agents to design and perform complex tasks
- activating distributed visual region within llms for efficient and effective vis
- activation steering decoding mitigating hallucination in large vision-language m
- actiview evaluating active perception ability for multimodal large language mode
- ad-hoc concept forming in the game codenames as a means for evaluating large lan | arXiv: 2502.11707
- ad-llm benchmarking large language models for anomaly detection | arXiv: 2412.11142
- adadhp fine-grained fine-tuning via dual hadamard product and adaptive parameter
- adaedit advancing continuous knowledge editing for large language models
- adammeme adaptively probe the reasoning capacity of multimodal large language mo | arXiv: 2507.01702
- adaptagent adapting multimodal web agents with few-shot learning from human demo
- adapting psycholinguistic research for llms gender-inclusive language in a coref | arXiv: 2502.13120
- adaptive and robust translation from natural language to multi-model query langu
- adaptive detoxification safeguarding general capabilities of llms through toxici | arXiv: 2505.22298
- adaptive linguistic prompting alp enhances phishing webpage detection in multimo | arXiv: 2507.13357
- adaptive retrieval without self-knowledge bringing uncertainty back home | arXiv: 2501.12835
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger | arXiv: 2502.12961
- adaptive-vp a framework for llm-based virtual patients that adapts to trainees d | arXiv: 2506.00386
- addressing blind guessing calibration of selection bias in multiple-choice quest
- advancing collaborative debates with role differentiation through multi-agent re
- advancing sequential numerical prediction in autoregressive models | arXiv: 2505.13077
- advancing smoe for continuous domain adaptation of mllms adaptive router and dom
- advancing zero-shot text-to-speech intelligibility across diverse domains via pr | arXiv: 2505.04113
- adversarial alignment with anchor dragging drift a3d2 multimodal domain adaptati
- adversarial tokenization | arXiv: 2503.02174
- adverse event extraction from discharge summaries a new dataset annotation schem
- AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | arXiv: 2411.15640
- afrobench how good are large language models on african languages | arXiv: 2311.07978
- afrocs-xs creating a compact high-quality human-validated code-switched dataset
- agd adversarial game defense against jailbreak attacks in large language models
- Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents | arXiv: 2506.21252
- agentalign navigating safety alignment in the shift from informative to agentic | arXiv: 2505.23020
- agentdropout dynamic agent elimination for token-efficient and high-performance
- agentgym evaluating and training large language model-based agents across divers
- agentic knowledgeable self-awareness | arXiv: 2504.03553
- Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools | arXiv: 2502.04644
- agentic reward modeling integrating human preferences with verifiable correctnes
- agentrm enhancing agent generalization with reward modeling | arXiv: 2502.18407
- agents under siege breaking pragmatic multi-agent llm systems with optimized pro
- agrail a lifelong agent guardrail with effective and adaptive safety detection | arXiv: 2502.11448
- agri-cm3 a chinese massive multi-modal multi-level benchmark for agricultural un
- ai4reading chinese audiobook interpretation system based on multi-agent collabor | arXiv: 2512.23300
- aide attribute-guided multi-hop data expansion for data scarcity in task-specifi | arXiv: 2412.06136
- AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across Jurisdictions | arXiv: 2506.01671
- air-bench automated heterogeneous information retrieval benchmark | arXiv: 2412.13102
- akan cinematic emotions ace a multimodal multi-party dataset for emotion recogni | arXiv: 2502.10973
- algen few-shot inversion attacks on textual embeddings via cross-model alignment
- Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study | arXiv: 2412.13169
- align-slm textless spoken language models with reinforcement learning from ai fe | arXiv: 2411.01834
- AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | arXiv: 2503.02832
- Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race | arXiv: 2506.00253
- Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review | arXiv: 2412.18043
- aligning large language models to follow instructions and hallucinate less via e | arXiv: 2502.07340
- Aligning Large Language Models with Implicit Preferences from User-Generated Content | arXiv: 2506.04463
- Aligning VLM Assistants with Personalized Situated Cognition | arXiv: 2506.00930
- alignment drift in cefr-prompted llms for interactive spanish tutoring | arXiv: 2505.08351
- alignmmbench evaluating chinese multimodal alignment in large vision-language mo | arXiv: 2406.09295
- All That Glitters is Not Novel: Plagiarism in AI Generated Research | arXiv: 2502.16487
- alleviating distribution shift in synthetic data for machine translation quality | arXiv: 2502.19941
- alleviating hallucinations from knowledge misalignment in large language models
- ambik dataset of ambiguous tasks in kitchen environment | arXiv: 2506.04089
- amopo adaptive multi-objective preference optimization without reward models and | arXiv: 2506.07165
- amplifying trans and nonbinary voices a community-centred harm taxonomy for llms
- an analysis of datasets metrics and models in keyphrase generation | arXiv: 2506.10346
- An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling | arXiv: 2402.13534
- an efficient and precise training data construction framework for process-superv
- An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals | arXiv: 2506.03519
- an empirical study of iterative refinements for non-autoregressive translation
- An Empirical Study of Many-to-Many Summarization with Large Language Models | arXiv: 2505.12983
- an expanded massive multilingual dataset for high-performance language technolog | arXiv: 2503.10267
- analytickws towards exemplar-free analytic class incremental learning for small- | arXiv: 2505.11817
- analyzing and mitigating inconsistency in discrete speech tokens for neural code
- Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations | arXiv: 2504.13816
- analyzing political bias in llms via target-oriented sentiment classification | arXiv: 2505.19776
- analyzing the rapid generalization of sft via the perspective of attention head
- anchored answers unravelling positional bias in gpt-2s multiple-choice questions | arXiv: 2405.03205
- AndroidGen: Building an Android Language Agent under Data Scarcity | arXiv: 2504.19298
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | arXiv: 2410.24024
- anre analogical replay for temporal knowledge graph forecasting
- answer when needed forget when not language models pretend to forget via in-cont | arXiv: 2410.00382
- answering complex geographic questions by adaptive reasoning with visual context
- antileakbench preventing data contamination by automatically constructing benchm | arXiv: 2412.13670
- any information is just worth one single screenshot unifying search with visuali
- anything goes a crosslinguistic study of impossible language learning in lms | arXiv: 2502.18795
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs | arXiv: 2502.12085
- appl a prompt programming language for harmonious integration of programs and la
- are any-to-any models more consistent across modality transfers than specialists | arXiv: 2505.24211
- are bias evaluation methods biased | arXiv: 2506.17111
- are llms effective psychological assessors leveraging adaptive rag for interpret
- are optimal algorithms still optimal rethinking sorting in llm-based pairwise ra
- are rules meant to be broken understanding multilingual moral reasoning as a com | arXiv: 2502.14083
- are the hidden states hiding something testing the limits of factuality-encoding
- Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media | arXiv: 2412.18148
- are your llms capable of stable reasoning | arXiv: 2412.13147
- arghitz at archehr-qa 2025 a two-step divide and conquer approach to patient que | arXiv: 2506.12886
- aria-ui visual grounding for gui instructions | arXiv: 2412.16256
- ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search | arXiv: 2504.10893
- Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework | arXiv: 2412.16953
- arithmattack evaluating robustness of llms to noisy context in math problem solv | arXiv: 2501.08203
- around the world in 24 hours probing llm knowledge of time and place | arXiv: 2506.03984
- asclepius a spectrum evaluation benchmark for medical multi-modal large language
- ask-before-detection identifying and mitigating conformity bias in llm-powered e
- askqe question answering as automatic evaluation for machine translation | arXiv: 2504.11582
- aspera a simulated environment to evaluate planning for complex action execution | arXiv: 2507.15501
- aspo adaptive sentence-level preference optimization for fine-grained multimodal | arXiv: 2505.19100
- assessing agentic large language models in multilingual national bias | arXiv: 2502.17945
- Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | arXiv: 2410.11005
- assessing reliability and political bias in llms judgements of formal and materi
- assessment and manipulation of latent constructs in pre-trained language models
- assigning distinct roles to quantized and low-rank matrices toward optimal weigh | arXiv: 2506.02077
- Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | arXiv: 2410.07176
- atgen a framework for active text generation | arXiv: 2506.23342
- atlantis weak-to-strong learning via importance sampling
- atomic calibration of llms in long-form generations | arXiv: 2410.13246
- atri mitigating multilingual audio text retrieval inconsistencies by reducing da
- Attacking Vision-Language Computer Agents via Pop-ups | arXiv: 2411.02391
- Attention Entropy is a Key Factor for Parallel Context Encoding | arXiv: 2412.16545
- attention speaks volumes localizing and mitigating bias in language models | arXiv: 2410.22517
- atyaephyra at semeval-2025 task 4 low-rank negative preference optimization | arXiv: 2503.13690
- autalic a dataset for anti-autistic ableist language in context | arXiv: 2410.16520
- auto-arena automating llm evaluations with agent peer battles and committee disc
- auto-ta towards scalable automated thematic analysis ta via multi-agent large la | arXiv: 2506.23998
- AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs | arXiv: 2502.01977
- Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models | arXiv: 2505.19490
- automated structured radiology report generation | arXiv: 2505.24223
- automatic detection of dyslexia based on eye movements during reading in russian
- automatic evaluation for text-to-image generation task-decomposed framework dist
- automatic expert discovery in llm upcycling via sparse interpolated mixture-of-e
- automatic generation of inference making questions for reading comprehension ass | arXiv: 2506.08260
- automatic transmission for llm tiers optimizing cost and accuracy in large langu | arXiv: 2505.20921
- Automating Legal Interpretation with LLMs: Retrieval, Generation, and Evaluation | arXiv: 2501.01743
- automedeval harnessing language models for automatic medical capability evaluati
- AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs | arXiv: 2506.00569
- automixer checkpoint artifacts as automatic data mixers | arXiv: 2506.21910
- autonomous data selection with zero-shot generative classifiers for mathematical | arXiv: 2402.07625
- Autoregressive Speech Synthesis without Vector Quantization | arXiv: 2407.08551
- avg-llava an efficient large multimodal model with adaptive visual granularity | arXiv: 2410.02745
- awes laws and flaws from todays llm research | arXiv: 2408.15409
- axis efficient human-agent-computer interaction with api-first llm-based agents | arXiv: 2409.17140
- balancing diversity and risk in llm sampling how to select your method and param
- balancing the budget understanding trade-offs between supervised and preference- | arXiv: 2502.11284
- bandit-based prompt design strategy selection improves prompt optimizers | arXiv: 2503.01163
- banstereoset a dataset to measure stereotypical social biases in llms for bangla | arXiv: 2409.11638
- basic reading distillation | arXiv: 2507.19741
- batayan a filipino nlp benchmark for evaluating large language models | arXiv: 2502.14911
- battling against tough resister strategy planning with adversarial game for non-
- BeamLoRA: Beam-Constraint Low-Rank Adaptation | arXiv: 2502.13604
- behavioral analysis of information salience in large language models | arXiv: 2502.14613
- behaviorbox automated discovery of fine-grained performance differences between | arXiv: 2506.02204
- behavioural vs representational systematicity in end-to-end models an opinionate | arXiv: 2506.04461
- Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse | arXiv: 2412.17533
- bel esprit multi-agent framework for building ai model pipelines | arXiv: 2412.14684
- BelarusianGLUE: Towards a Natural Language Understanding Benchmark for Belarusian
- belle a bi-level multi-agent reasoning framework for multi-hop question answerin | arXiv: 2505.11811
- benchmarking and improving large vision-language models for fundamental visual g
- benchmarking llms and llm-based agents in practical vulnerability detection for | arXiv: 2503.03586
- benchmarking long-context language models on long code understanding | arXiv: 2503.04359
- Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | arXiv: 2412.05167
- benchmarking uncertainty quantification methods for large language models with l | arXiv: 2406.15627
- bert-like models for slavic morpheme segmentation
- besstie a benchmark for sentiment and sarcasm classification for varieties of en | arXiv: 2412.04726
- better embeddings with coupled adam | arXiv: 2502.08441
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases | arXiv: 2502.19249
- Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs | arXiv: 2502.20968
- beyond completion a foundation model for general knowledge graph reasoning | arXiv: 2505.21926
- beyond demographics fine-tuning large language models to predict individuals sub
- beyond dialogue a profile-dialogue alignment framework towards general role-play
- Beyond Facts: Evaluating Intent Hallucination in Large Language Models | arXiv: 2506.06539
- Beyond Frameworks: Unpacking Collaboration Strategies in Multi-Agent Systems | arXiv: 2505.12467
- beyond in-context learning aligning long-form generation of large language model | arXiv: 2506.01265
- beyond logits aligning feature dynamics for effective knowledge distillation
- beyond n-grams rethinking evaluation metrics and strategies for multilingual abs | arXiv: 2507.08342
- beyond negative stereotypes -- non-negative abusive utterances about identity gr
- beyond numeric rewards in-context dueling bandits with llm agents | arXiv: 2407.01887
- beyond one-size-fits-all tailored benchmarks for efficient evaluation | arXiv: 2502.13576
- beyond output matching bidirectional alignment for enhanced in-context learning | arXiv: 2312.17055
- beyond position the emergence of wavelet-like properties in transformers | arXiv: 2410.18067
- beyond profile from surface-level facts to deep persona simulation in llms | arXiv: 2502.12988
- beyond prompt engineering robust behavior control in llms via steering target at | arXiv: 2505.20322
- Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering | arXiv: 2503.01606
- beyond sequences two-dimensional representation and dependency encoding for code
- beyond similarity a gradient-based graph method for instruction tuning data sele
- beyond single labels improving conversational recommendation through llm-powered | arXiv: 2508.05657
- beyond surface simplicity revealing hidden reasoning attributes for precise comm
- beyond surface-level patterns an essence-driven defense framework against jailbr | arXiv: 2502.19041
- Beyond Text Compression: Evaluating Tokenizers Across Scales | arXiv: 2506.03101
- beyond the answer advancing multi-hop qa with fine-grained graph reasoning and e
- beyond the tip of efficiency uncovering the submerged threats of jailbreak attac | arXiv: 2502.19883
- beyond true or false retrieval-augmented hierarchical analysis of nuanced claims | arXiv: 2506.10728
- bfs-prover scalable best-first tree search for llm-based automatic theorem provi
- bi-tuning with collaborative information for controllable llm-based sequential r
- bias attribution in filipino language models extending a bias interpretability m | arXiv: 2506.07249
- Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation | arXiv: 2402.12649
- bias in the mirror are llms opinions robust to their own adversarial attacks
- biased llms can influence political decision-making
- biasguard a reasoning-enhanced bias detection tool for large language models | arXiv: 2504.21299
- big-bench extra hard | arXiv: 2502.19187
- big5-chat shaping llm personalities through training on human-grounded data | arXiv: 2410.16491
- bilingual zero-shot stance detection
- Binary Classifier Optimization for Large Language Model Alignment | arXiv: 2404.04656
- bipro zero-shot chinese poem generation via block inverse prompting constrained | arXiv: 2411.13237
- bitnetcpp efficient edge inference for ternary llms
- blessing of multilinguality a systematic analysis of multilingual in-context lea | arXiv: 2502.11364
- blockpruner fine-grained pruning for large language models | arXiv: 2406.10594
- bmike-53 investigating cross-lingual knowledge editing with in-context learning | arXiv: 2406.17764
- Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation | arXiv: 2502.10762
- BookCoref: Coreference Resolution at Book Scale | arXiv: 2507.12075
- bookworld from novels to interactive agent societies for story creation
- boosting llms molecular structure elucidation with knowledge enhanced tree searc | arXiv: 2506.23056
- boosting long-context information seeking via query-guided activation refilling
- boosting vulnerability detection of llms via curriculum preference optimization | arXiv: 2506.07390
- bpp-search enhancing tree of thought reasoning for mathematical modeling problem | arXiv: 2411.17404
- bqa body language question answering dataset for video large language models | arXiv: 2410.13206
- brainecho semantic brain signal decoding through vector-quantized spectrogram re | arXiv: 2410.14971
- breaking the ceiling exploring the potential of jailbreak attacks through expand | arXiv: 2505.21277
- bregman conditional random fields sequence labeling with parallelizable inferenc | arXiv: 2506.00732
- brevity is the soul of sustainability characterizing llm response lengths | arXiv: 2506.08686
- bridging the language gaps in large language models with inference-time cross-li
- brighter bridging the gap in human-annotated textual emotion recognition dataset
- browsing like human a multimodal web agent with experiential fast-and-slow think
- browsing lost unformed recollections a benchmark for tip-of-the-tongue search an | arXiv: 2503.19193
- building a long text privacy policy corpus with multi-class labels
- building better avoiding pitfalls in developing language resources when data is | arXiv: 2410.12691
- burn after reading do multimodal large language models truly capture order of ev | arXiv: 2506.10415
- bypass back-propagation optimization-based structural pruning for large language
- Byte Latent Transformer: Patches Scale Better Than Tokens | arXiv: 2412.09871
- c2leva toward comprehensive and contamination-free language model evaluation | arXiv: 2412.04947
- cadreview automatically reviewing cad programs with error detection and correcti | arXiv: 2505.22304
- calibraeval calibrating prediction distribution to mitigate selection bias in ll
- call for rigor in reporting quality of instruction tuning data | arXiv: 2503.04807
- CaLMQA: Exploring Culturally Specific Long-Form Question Answering across 23 Languages | arXiv: 2406.17761
- cami a counselor agent supporting motivational interviewing through state infere
- can a single model master both multi-turn conversations and tool use coalm a uni
- can community notes replace professional fact-checkers | arXiv: 2502.14132
- can external validation tools improve annotation quality for llm-as-a-judge | arXiv: 2507.17015
- Can Graph Descriptive Order Affect Solving Graph Problems with LLMs? | arXiv: 2402.07140
- Can Indirect Prompt Injection Attacks Be Detected and Removed? | arXiv: 2502.16580
- can input attributions explain inductive reasoning in in-context learning | arXiv: 2412.15628
- Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering | arXiv: 2410.08085
- can language models reason about individualistic human values and preferences | arXiv: 2410.03868
- can language models replace programmers for coding repocod says not yet | arXiv: 2410.21647
- can large language models accurately generate answer keys for health-related que
- can large language models address open-target stance detection | arXiv: 2409.00222
- can large language models detect errors in long chain-of-thought reasoning | arXiv: 2502.19361
- Can Large Language Models Understand Internet Buzzwords Through User-Generated Content | arXiv: 2505.15071
- Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | arXiv: 2502.11598
- Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | arXiv: 2505.22943
- Can LLMs Evaluate Complex Attribution in QA? Automatic Benchmarking using Knowledge Graphs | arXiv: 2401.14640
- Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval | arXiv: 2506.12278
- can llms ground when they dont know a study on direct and loaded political quest
- can llms help uncover insights about llms a large-scale evolving literature anal | arXiv: 2502.18791
- can llms identify critical limitations within scientific research a systematic e
- can llms interpret and leverage structured linguistic representations a case stu | arXiv: 2504.04745
- can llms reason about program semantics a comprehensive evaluation of llms on fo | arXiv: 2503.04779
- can llms reliably simulate real students abilities in mathematics and reading co | arXiv: 2507.08232
- can llms simulate l2-english dialogue an information-theoretic analysis of l1-de
- can llms understand unvoiced speech exploring emg-to-text conversion with llms | arXiv: 2506.00304
- can mllms understand the deep implication behind chinese images | arXiv: 2410.13854
- can multimodal foundation models understand schematic diagrams an empirical stud | arXiv: 2507.10787
- Can Multimodal Large Language Models Understand Spatial Relations? | arXiv: 2505.19015
- can third parties read our emotions
- can uniform meaning representation help gpt-4 translate from indigenous language | arXiv: 2502.08900
- can vision language models understand mimed actions | arXiv: 2506.21586
- can vision-language models evaluate handwritten math | arXiv: 2501.07244
- can we further elicit reasoning in llms critic-guided planning with retrieval-au
- can we retrieve everything all at once arm an alignment-oriented llm-based retri
- can you really trust code copilot evaluating large language models from a code s
- can you share your story modeling clients metacognition and openness for llm the | arXiv: 2507.19643
- capability salience vector fine-grained alignment of loss and capabilities for d
- capacity matters a proof-of-concept for transformer memorization on real-world d | arXiv: 2506.14704
- Capture the Key in Reasoning to Enhance CoT Distillation Generalization | arXiv: 2405.19737
- capturing author self beliefs in social media language
- cart a generative cross-modal retrieval framework with coarse-to-fine semantic m | arXiv: 2406.17507
- Causal Estimation of Tokenisation Bias | arXiv: 2506.03149
- causal graph based event reasoning using semantic relation experts | arXiv: 2506.06910
- causalrag integrating causal graphs into retrieval-augmented generation | arXiv: 2503.19878
- Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions | arXiv: 2408.02544
- cautious next token prediction | arXiv: 2507.03038
- cavgan unifying jailbreak and defense of llms via generative adversarial attacks | arXiv: 2507.06043
- cc-tuning a cross-lingual connection mechanism for improving joint multilingual | arXiv: 2506.00875
- cchall a novel benchmark for joint cross-lingual and cross-modal hallucinations | arXiv: 2505.19108
- ceaes bidirectional reinforcement learning optimization for consistent and expla
- CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference | arXiv: 2412.10652
- Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model | arXiv: 2501.05122
- CER: Confidence Enhanced Reasoning in LLMs | arXiv: 2502.14634
- cfbench a comprehensive constraints-following benchmark for llms | arXiv: 2408.01122
- chain-of-jailbreak attack for image generation models via editing step by step | arXiv: 2410.03869
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | arXiv: 2501.11110
- chain-talker chain understanding and rendering for empathetic conversational spe | arXiv: 2505.12597
- ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains | arXiv: 2507.08427
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | arXiv: 2501.06598
- chartlens fine-grained visual attribution in charts | arXiv: 2505.19360
- chatbench from static benchmarks to human-ai evaluation | arXiv: 2504.07114
- chatsop an sop-guided mcts planning framework for controllable llm dialogue agen
- Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch | arXiv: 2502.17173
- cheer-ekman fine-grained embodied emotion classification | arXiv: 2506.01047
- chemactor enhancing automated extraction of chemical synthesis actions with llm- | arXiv: 2506.23520
- CheXalign: Preference Fine-tuning in Chest X-ray Interpretation Models without Human Feedback | arXiv: 2410.07025
- childmandarin a comprehensive mandarin speech dataset for young children aged 3- | arXiv: 2409.18584
- chinese inertial gan for handwriting signal generation and recognition
- chinese safetyqa a safety short-form factuality benchmark for large language mod
- chinese simpleqa a chinese factuality evaluation for large language models | arXiv: 2411.07140
- chronosense exploring temporal understanding in large language models with time | arXiv: 2501.03040
- chulo chunk-level key information representation for long document understanding | arXiv: 2410.11119
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models | arXiv: 2410.01434
- circuit stability characterizes language model generalization | arXiv: 2505.24731
- citeeval principle-driven citation evaluation for source attribution | arXiv: 2506.01829
- citynavagent aerial vision-and-language navigation with hierarchical semantic pl
- CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | arXiv: 2409.05806
- clac at semeval-2025 task 6 a multi-architecture approach for corporate environm | arXiv: 2505.23538
- claim mitigating multilingual object hallucination in large vision-language mode
- claimpkg enhancing claim verification via pseudo-subgraph generation with lightw | arXiv: 2505.22552
- clamp 3 universal music information retrieval across unaligned modalities and un | arXiv: 2502.10362
- CLaSp: In-Context Layer Skip for Self-Speculative Decoding | arXiv: 2505.24196
- class distillation with mahalanobis contrast an efficient training paradigm for
- Classifying Unreliable Narrators with Large Language Models | arXiv: 2506.10231
- CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction | arXiv: 2407.00934
- clinidial a naturally occurring multimodal dialogue dataset for team reflection | arXiv: 2506.12936
- cliperase efficient unlearning of visual-textual associations in clip | arXiv: 2410.23330
- clix cross-lingual explanations of idiomatic expressions | arXiv: 2501.03191
- clozemath improving mathematical reasoning in language models by learning to fil | arXiv: 2506.03763
- clusterattn kv cache compression under intrinsic attention clustering
- cmhkf cross-modality heterogeneous knowledge fusion for weakly supervised video
- cnnsum exploring long-context summarization with large language models in chines | arXiv: 2412.02819
- CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model | arXiv: 2509.11698
- coam corpus of all-type multiword expressions | arXiv: 2412.18151
- coco-bench a comprehensive code benchmark for multi-task large language model ev | arXiv: 2504.20673
- CoCoLex: Confidence-guided Copy-based Decoding for Grounded Legal Text Generation | arXiv: 2508.05534
- code-switching and syntax a large-scale experiment | arXiv: 2506.01846
- code-switching curriculum learning for multilingual transfer in llms | arXiv: 2411.02460
- code-switching red-teaming llm evaluation for safety and multilingual understand | arXiv: 2406.15481
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code | arXiv: 2410.05605
- codeif benchmarking the instruction-following capabilities of large language mod | arXiv: 2502.19166
- codemenv benchmarking large language models on code migration | arXiv: 2506.00894
- codereviewqa the code review comprehension assessment for large language models | arXiv: 2503.16167
- codetool enhancing programmatic tool invocation of llms via process supervision | arXiv: 2503.20840
- coe a clue of emotion framework for emotion recognition in conversations
- CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | arXiv: 2505.20767
- cogsteer cognition-inspired selective layer intervention for efficiently steerin | arXiv: 2410.17714
- coir a comprehensive benchmark for code information retrieval models | arXiv: 2407.02883
- cola collaborative low-rank adaptation | arXiv: 2505.15471
- coling-unia at scivqa 2025 few-shot example retrieval and confidence-informed en | arXiv: 2507.02357
- Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence | arXiv: 2503.05037
- colloquial singaporean english style transfer with fine-grained explainable cont
- Com2: A Causal-Guided Benchmark for Complex Commonsense Reasoning | arXiv: 2506.07064
- combining domain and alignment vectors provides better knowledge-safety trade-of
- combining the best of both worlds a method for hybrid nmt and llm translation | arXiv: 2505.13554
- comet metaphor-driven covert communication for multi-agent language games | arXiv: 2505.18218
- comfyui-copilot an intelligent assistant for automated workflow development | arXiv: 2506.05010
- Commonsense Reasoning in Arab Culture | arXiv: 2502.12788
- Comparing LLM-generated and human-authored news text using formal syntactic theory | arXiv: 2506.01407
- Comparing Moral Values in Western English-speaking Societies and LLMs with Word Associations | arXiv: 2505.19674
- comparison-based active preference learning for multi-dimensional personalizatio
- comparisonqa evaluating factuality robustness of llms through knowledge frequenc | arXiv: 2412.20251
- compileagent automated real-world repo-level compilation with tool-integrated ll | arXiv: 2505.04254
- compke complex question answering under knowledge editing | arXiv: 2506.00829
- Completing A Systematic Review in Hours instead of Months with Interactive AI Agents | arXiv: 2504.14822
- computation mechanism behind llm position generalization | arXiv: 2503.13305
- comrag retrieval-augmented generation with dynamic vector stores for real-time c | arXiv: 2506.21098
- con instruction universal jailbreaking of multimodal large language models via n | arXiv: 2506.00548
- conceptcarve dynamic realization of evidence | arXiv: 2504.07228
- conditional dichotomy quantification via geometric embedding
- condor enhance llm alignment with knowledge-driven data synthesis and refinement | arXiv: 2501.12273
- conect dataset overcoming data scarcity in context-aware e-commerce mt | arXiv: 2506.04929
- confetti conversational function-calling evaluation through turn-level interacti | arXiv: 2506.01859
- confidence vs critique a decomposition of self-correction capability for llms
- conformity in large language models | arXiv: 2410.12428
- conloan a contrastive multilingual dataset for evaluating loanwords
- consim measuring concept-based explanations effectiveness with automated simulat | arXiv: 2501.05855
- ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities | arXiv: 2506.12376
- consistent client simulation for motivational interviewing-based counseling | arXiv: 2502.02802
- conspiracy theories and where to find them on tiktok | arXiv: 2407.12545
- consultant decoding yet another synergistic mechanism | arXiv: 2506.02391
- context-aware hierarchical merging for long document summarization | arXiv: 2502.00977
- Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents | arXiv: 2505.24331
- context-robust knowledge editing for language models | arXiv: 2505.23026
- contextual experience replay for self-improvement of language agents | arXiv: 2506.06698
- Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing | arXiv: 2505.20976
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models | arXiv: 2401.08491
- Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time Steering | arXiv: 2505.12831
- Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie Worksheets | arXiv: 2407.05674
- controllable style arithmetic with language models
- controlled low-rank adaptation with subspace regularization for continued traini
- ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | arXiv: 2406.01205
- convert language model into a value-based strategic planner | arXiv: 2505.06987
- cool-fusion fuse large language models without training | arXiv: 2407.19807
- cooperative or competitive understanding the interaction between attention heads
- coordinating chaos a structured review of linguistic coordination methodologies
- CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | arXiv: 2502.16880
- cordial can multimodal large language models effectively understand coherence re
- CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG | arXiv: 2506.02544
- coreeval automatically building contamination-resilient datasets with real-world
- coreference as an indicator of context scope in multimodal narrative | arXiv: 2503.05298
- coret improved retriever for code editing | arXiv: 2505.24715
- correcting hallucinations in news summaries exploration of self-correcting llm m | arXiv: 2506.19607
- cortexdebate debating sparsely and equally for multi-agent debate | arXiv: 2507.03928
- cosmic generalized refusal direction identification in llm activations | arXiv: 2506.00085
- COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation | arXiv: 2506.15372
- cosyn code guided synthetic data
- cot-based synthesizer enhancing llm performance through answer synthesis | arXiv: 2501.01668
- cot-icl lab a synthetic framework for studying chain-of-thought learning from in | arXiv: 2502.15132
- cot-uq improving response-wise uncertainty quantification in llms with chain-of- | arXiv: 2502.17214
- cot-valve length-compressible chain-of-thought tuning | arXiv: 2502.09601
- counterfactual-consistency prompting for relative temporal understanding in larg | arXiv: 2502.11425
- Counterspeech the Ultimate Shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix Learning | arXiv: 2505.11958
- cove compressed vocabulary expansion makes better llm-based recommender systems | arXiv: 2506.19993
- crab a novel configurable role-playing llm with assessing benchmark
- cracking factual knowledge a comprehensive analysis of degenerate knowledge neur
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence | arXiv: 2412.13949
- craftext benchmark advancing instruction following in complex multimodal open-en
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity | arXiv: 2502.13063
- crisists coupling social media textual data and meteorological time series for u
- criskeval a chinese multi-level risk evaluation benchmark dataset for large lang
- critic-cot boosting the reasoning abilities of large language model via chain-of | arXiv: 2408.16326
- critiq mining data quality criteria from human preferences | arXiv: 2502.19279
- Croppable Knowledge Graph Embedding | arXiv: 2407.02779
- cross-document contextual coreference resolution in knowledge graphs | arXiv: 2504.05767
- cross-lingual auto evaluation for assessing multilingual llms | arXiv: 2410.13394
- Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons | arXiv: 2506.01629
- cross-lingual optimization for language transfer in large language models | arXiv: 2505.14297
- Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models | arXiv: 2505.18673
- cross-lingual representation alignment through contrastive image-caption tuning | arXiv: 2505.13628
- cross-lingual transfer of cultural knowledge an asymmetric phenomenon | arXiv: 2506.01675
- cross-lingual transfer of debiasing and detoxification in multilingual llms an e | arXiv: 2412.14050
- Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts | arXiv: 2501.02009
- crowd comparative reasoning unlocking comprehensive evaluations for llm-as-a-jud
- crowdsource crawl or generate creating sea-vl a multicultural vision-language da
- cruxeval-x a benchmark for multilingual code reasoning understanding and executi | arXiv: 2408.13001
- cstree-sri introspection-driven cognitive semantic tree for multi-turn question
- cstrl context-driven sequential transfer learning for abstractive radiology repo | arXiv: 2503.05750
- ctpd cross-modal temporal pattern discovery for enhanced multimodal electronic h | arXiv: 2411.00696
- cu-mam coherence-driven unified macro-structures for argument mining
- cuckoo an ie free rider hatched by massive nutrition in llms nest | arXiv: 2502.11275
- culemo cultural lenses on emotion - benchmarking llms for cross-cultural emotion | arXiv: 2503.10688
- culfit a fine-grained cultural-aware llm training paradigm via multilingual crit | arXiv: 2505.19484
- cultivating gaming sense for yourself making vlms gaming experts | arXiv: 2503.21263
- cultural learning-based culture adaptation of language models | arXiv: 2504.02953
- culturalbench a robust diverse and challenging cultural benchmark by human-ai cu | arXiv: 2410.02677
- culture is not trivia sociocultural theory for cultural nlp | arXiv: 2502.12057
- culture matters in toxic language detection in persian | arXiv: 2506.03458
- Curiosity-Driven Reinforcement Learning from Human Feedback | arXiv: 2501.11463
- curriculum debiasing toward robust parameter-efficient fine-tuning against datas
- cxggec construction-guided grammatical error correction
- cypherbench towards precise retrieval over full-scale modern knowledge graphs in
- d-gen automatic distractor generation and evaluation for reliable assessment of | arXiv: 2504.13439
- DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | arXiv: 2507.11942
- dalr dual-level alignment learning for multimodal sentence representation learni | arXiv: 2506.21096
- dape v2 process attention score as feature map for length extrapolation | arXiv: 2410.04798
- dars dynamic action re-sampling to enhance coding agent performance by adaptive | arXiv: 2503.14269
- data caricatures on the representation of african american language in pretraini | arXiv: 2503.10789
- data laundering artificially boosting benchmark results through knowledge distil | arXiv: 2412.15255
- Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning | arXiv: 2506.17525
- Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning | arXiv: 2505.12212
- data-constrained synthesis of training data for de-identification | arXiv: 2502.14677
- davir data selection via implicit reward for large language models | arXiv: 2310.13008
- dcg-sql enhancing in-context learning for text-to-sql with deep contextual schem
- ddxtutor clinical reasoning tutoring system with differential diagnosis-based st
- DeAL: Decoding-time Alignment for Large Language Models | arXiv: 2402.06147
- debate reflect and distill multi-agent feedback with tree-structured preference | arXiv: 2506.03541
- debatecoder towards collective intelligence of llms via test case driven llm deb
- debiasing the fine-grained classification task in llms with bias-aware peft
- decoder-only llms can be masked auto-encoders
- decoding by contrasting knowledge enhancing large language model confidence on e
- decoding knowledge attribution in mixture-of-experts a framework of basic-refine | arXiv: 2505.24593
- decoding on graphs faithful and sound reasoning on knowledge graphs through gene
- decoding reading goals from eye movements | arXiv: 2410.20779
- decomposed opinion summarization with verified aspect-aware modules | arXiv: 2501.17191
- deep temporal reasoning in video language models a cross-linguistic evaluation o
- deeper insight into your user directed persona refinement for dynamic persona mo
- deepreview improving llm-based paper review with human-like deep thinking proces
- deeprtl2 a versatile model for rtl-related tasks | arXiv: 2506.15697
- deepsolution boosting complex engineering solution design via tree-based explora | arXiv: 2502.20730
- def-dts deductive reasoning for open-domain dialogue topic segmentation | arXiv: 2505.21033
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques | arXiv: 2411.00459
- define decision-making with analogical reasoning over factor profiles | arXiv: 2410.01772
- defining and evaluating visual language models basic spatial abilities a perspec
- Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems | arXiv: 2502.14019
- deja vu decoding repeated reading from eye movements
- deliberate reasoning in language models as structure-aware planning with an accu
- delta-knn improving demonstration selection in in-context learning for alzheimer
- Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models | arXiv: 2505.19121
- demo reframing dialogue interaction with fine-grained element modeling | arXiv: 2412.04905
- demons in the detail on implementing load balancing loss for training specialize
- demystifying small language models for edge deployment
- denselora dense low-rank adaptation of large language models | arXiv: 2505.23808
- Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models | arXiv: 2506.11068
- design choices for extending the context length of visual language models | arXiv: 2412.12735
- detecting referring expressions in visually grounded dialogue with autoregressiv | arXiv: 2506.21294
- detecting sockpuppetry on wikipedia using meta-learning | arXiv: 2506.10314
- detection of human and machine-authored fake news in urdu | arXiv: 2410.19517
- developmentally-plausible working memory shapes a critical period for language a | arXiv: 2502.04795
- dialectal coverage and generalization in arabic speech recognition | arXiv: 2411.05872
- dialogue systems for emotional support via value reinforcement | arXiv: 2501.17182
- dialogue-rag enhancing retrieval for llms via node-linking utterance rewriting
- dialup modeling the language continuum by adapting models to dialects and dialec
- dice-bench evaluating the tool-use capabilities of large language models in mult | arXiv: 2506.22853
- dictionaries to the rescue cross-lingual vocabulary transfer for low-resource la | arXiv: 2506.01535
- Did Translation Models Get More Robust Without Anyone Even Noticing? | arXiv: 2403.03923
- different speech translation models encode and translate speaker gender differen | arXiv: 2506.02172
- difflm controllable synthetic data generation via diffusion language models | arXiv: 2411.03250
- DiffPO: Diffusion Alignment with Direct Preference Optimization | arXiv: 2503.04240
- DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising | arXiv: 2407.00248
- diffusion directed acyclic transformer for non-autoregressive machine translatio
- diffusion models through a global lens are they culturally inclusive
- digest the knowledge large language models empowered message passing for knowled
- digital gatekeepers googles role in curating hashtags and subreddits | arXiv: 2506.14370
- dior adaptive cognitive detection and contextual retrieval optimization for dyna
- direct behavior optimization unlocking the potential of lightweight llms | arXiv: 2506.06401
- direct confidence alignment aligning verbalized confidence with internal confide | arXiv: 2512.11998
- direct prompt optimization with continuous representations
- disambiguate first parse later generating interpretations for ambiguity resoluti | arXiv: 2502.18448
- disambiguating reference in visually grounded dialogues through joint modeling o
- disc plug-and-play decoding intervention with similarity of characters for chine
- disco device-server collaborative llm-based text streaming services | arXiv: 2502.11417
- discourse relation-enhanced neural coherence modeling
- disentangled multi-span evolutionary network against temporal knowledge graph re | arXiv: 2505.14020
- disentangling biased knowledge from reasoning in large language models via machi
- Disentangling Language and Culture for Evaluating Multilingual Large Language Models | arXiv: 2505.24635
- Disentangling Memory and Reasoning Ability in Large Language Models | arXiv: 2411.13504
- disentangling the roles of representation and selection in data pruning | arXiv: 2507.03648
- distance between relevant information pieces causes bias in long-context llms | arXiv: 2410.14641
- distilling an end-to-end voice assistant without instruction training data | arXiv: 2410.02678
- DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts | arXiv: 2506.09351
- diversity explains inference scaling laws through a case study of minimum bayes | arXiv: 2410.15021
- Diversity-oriented Data Augmentation with Large Language Models | arXiv: 2502.11671
- Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation | arXiv: 2501.12432
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG | arXiv: 2505.20871
- dnaspeech a contextualized and situated text-to-speech dataset with dialogues na
- dncasr end-to-end training for speaker-attributed asr | arXiv: 2506.01916
- do emotions really affect argument convincingness a dynamic approach with llm-ba | arXiv: 2503.00024
- do language models have semantics on the five standard positions
- do language models mirror human confidence exploring psychological insights to a | arXiv: 2506.00582
- do language models understand honorific systems in javanese | arXiv: 2502.20864
- do language models understand the cognitive tasks given to them investigations w | arXiv: 2412.18120
- Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs | arXiv: 2410.15956
- do large language models perform latent multi-hop reasoning without exploiting s | arXiv: 2411.16679
- do llms give psychometrically plausible responses in educational assessments | arXiv: 2506.09796
- do llms understand dialogues a case study on dialogue acts
- do multimodal large language models truly see what we point at investigating ind
- do not abstain identify and solve the uncertainty | arXiv: 2506.00780
- do vision-language models have internal world models towards an atomic evaluatio | arXiv: 2506.21876
- doc-react multi-page heterogeneous document question-answering
- docagent a multi-agent system for automated code documentation generation | arXiv: 2504.08725
- docmedit towards document-level model editing | arXiv: 2505.19572
- document-level event-argument data augmentation for challenging role types
- Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport | arXiv: 2505.23078
- does context matter contextualjudgebench for evaluating llm-based judges in cont
- does the emotional understanding of lvlms vary under high-stress environments an
- does time have its place temporal heads where language models recall time-specif | arXiv: 2502.14258
- does your voice assistant remember analyzing conversational context recall and u | arXiv: 2502.19759
- dolphin document image parsing via heterogeneous anchor prompting | arXiv: 2505.14059
- dolphin moving towards closed-loop auto-research through thinking practice and f | arXiv: 2501.03916
- DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning | arXiv: 2507.02302
- Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts | arXiv: 2505.24427
- dont erase inform detecting and contextualizing harmful language in cultural her
- dont get lost in the trees streamlining llm reasoning by overcoming tree search
- dont half-listen capturing key-part information in continual instruction tuning
- dont miss the forest for the trees attentional vision calibration for large visi | arXiv: 2405.17820
- dont reinvent the wheel efficient instruction-following text embedding based on | arXiv: 2505.24754
- dont say no jailbreaking llm by suppressing refusal | arXiv: 2404.16369
- double entendre robust audio-based ai-generated lyrics detection via multi-view | arXiv: 2506.15981
- drae dynamic retrieval-augmented expert networks for lifelong learning and task | arXiv: 2507.04661
- DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination | arXiv: 2506.01954
- drama diverse augmentation from large language models to smaller dense retriever | arXiv: 2502.18460
- DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing | arXiv: 2402.16733
- drift enhancing llm faithfulness in rationale generation via dual-reward probabi
- DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization | arXiv: 2411.14055
- drs deep question reformulation with structured output | arXiv: 2411.17993
- drt deep reasoning translation via long chain-of-thought | arXiv: 2412.17498
- ds2-absa dual-stream data synthesis with label refinement for few-shot aspect-ba
- dtcrs dynamic tree construction for recursive summarization
- dualguard a parameter space transformation approach for bidirectional defense in
- dually self-improved counterfactual data augmentation using large language model
- dualrag a dual-process approach to integrate reasoning and retrieval for multi-h
- dva validate your demonstration first before you use it
- dynacode a dynamic complexity-aware code benchmark for evaluating large language | arXiv: 2503.10452
- Dynamic and Generalizable Process Reward Modeling | arXiv: 2507.17849
- dynamic chunking and selection for reading comprehension of ultra-long context i | arXiv: 2506.00773
- dynamic evaluation with cognitive reasoning for multi-turn safety of large langu
- dynamic head selection for neural lexicalized constituency parsing
- dynamic knowledge integration for evidence-driven counter-argument generation wi | arXiv: 2503.05328
- dynamic label name refinement for few-shot dialogue intent classification | arXiv: 2412.15603
- Dynamic Order Template Prediction for Generative Aspect-Based Sentiment Analysis | arXiv: 2406.11130
- dynamic parallel tree search for efficient llm reasoning | arXiv: 2502.16235
- dynamic scaling of unit tests for code reward modeling | arXiv: 2501.01054
- EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models | arXiv: 2508.01625
- eagle expert-guided self-enhancement for preference alignment in pathology large
- ecerc evidence-cause attention network for multi-modal emotion recognition in co
- ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent | arXiv: 2403.04481
- EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association | arXiv: 2505.15196
- edit once update everywhere a simple framework for cross-lingual knowledge synch | arXiv: 2502.14645
- EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models | arXiv: 2502.19765
- editinspector a benchmark for evaluation of text-guided image edits | arXiv: 2506.09988
- educationq evaluating llms teaching capabilities through multi-agent dialogue fr
- educators perceptions of large language models as tutors comparing human and ai | arXiv: 2506.08702
- Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection | arXiv: 2411.07446
- efficient domain continual pretraining by mitigating the stability gap
- efficient ensemble for fine-tuning language models on multiple datasets | arXiv: 2505.21930
- Efficient Knowledge Editing via Minimal Precomputation | arXiv: 2506.04226
- efficient long context language model retrieval with compression | arXiv: 2412.18232
- efficient many-shot in-context learning with dynamic block-sparse attention | arXiv: 2503.08640
- efficient opamp adaptation for zoom attention to golden contexts | arXiv: 2502.12502
- efficient pretraining data selection for language models via multi-actor collabo
- efficient safety alignment of large language models via preference re-ranking an
- Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization | arXiv: 2405.14189
- Efficiently Identifying Watermarked Segments in Mixed-Source Texts | arXiv: 2410.03600
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | arXiv: 2407.11062
- EffiVLM-Bench: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models | arXiv: 2506.00479
- ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming | arXiv: 2505.16667
- elba-bench an efficient learning backdoor attacks benchmark for large language m | arXiv: 2502.18511
- eli-why evaluating the pedagogical utility of language model explanations | arXiv: 2506.14200
- embedding-converter a unified framework for cross-model embedding transformation
- Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents | arXiv: 2505.19997
- embracing large language models in traffic flow forecasting | arXiv: 2412.12201
- Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation | arXiv: 2506.00288
- emma-x an embodied multimodal action model with grounded chain of thought and lo
- empaths at semeval-2025 task 11 retrieval-augmented approach to perceived emotio | arXiv: 2506.04409
- empathy prediction from diverse perspectives
- employing discourse coherence enhancement to improve cross-document event and en
- emulate a multi-agent framework for determining the veracity of atomic claims by | arXiv: 2505.16576
- enabling chatbots with eyes and ears an immersive multimodal conversation system | arXiv: 2506.00421
- enabling llm knowledge analysis via extensive materialization | arXiv: 2411.04920
- end-to-end dialog neural coreference resolution balancing efficiency and accurac | arXiv: 2504.05824
- energy considerations of large language model inference and efficiency optimizat
- english-based acoustic models perform well in the forced alignment of two englis
- enhance multimodal consistency and coherence for text-image plan generation | arXiv: 2506.11380
- Enhancing Automated Interpretability with Output-Centric Feature Descriptions | arXiv: 2501.08319
- enhancing chain-of-thought reasoning with critical representation fine-tuning | arXiv: 2507.10085
- Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | arXiv: 2411.17679
- enhancing conversational agents with theory of mind aligning beliefs desires and | arXiv: 2502.14171
- enhancing cross-lingual transfer through reversible transliteration a huffman-ba
- enhancing event-centric news cluster summarization via data sharpening and local
- enhancing goal-oriented proactive dialogue systems via consistency reflection an | arXiv: 2506.13366
- enhancing human evaluation in machine translation with comparative judgement
- Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge | arXiv: 2506.15504
- enhancing input-label mapping in in-context learning with contrastive decoding | arXiv: 2502.13738
- Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models | arXiv: 2506.01334
- enhancing lexicon-based text embeddings with large language models | arXiv: 2501.09749
- enhancing llm agent safety via causal influence prompting | arXiv: 2507.00979
- enhancing machine translation with self-supervised preference data
- enhancing marker scoring accuracy through ordinal confidence modelling in educat | arXiv: 2505.23315
- enhancing mathematical reasoning in llms by stepwise correction | arXiv: 2410.12934
- enhancing medical dialogue generation through knowledge refinement and dynamic p | arXiv: 2506.10877
- Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | arXiv: 2506.02041
- enhancing multimodal retrieval via complementary information extraction and alig
- enhancing ner by harnessing multiple datasets with conditional variational autoe
- enhancing neural machine translation through target language data a knn-lm appro
- Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub | arXiv: 2312.17294
- enhancing retrieval systems with inference-time logical reasoning | arXiv: 2503.17860
- enhancing retrieval-augmented generation via evidence tree search
- Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization | arXiv: 2507.10923
- enhancing spoken discourse modeling in language models using gestural cues | arXiv: 2503.03474
- enhancing text editing for grammatical error correction arabic as a case study | arXiv: 2503.00985
- enhancing the comprehensibility of text explanations via unsupervised concept di | arXiv: 2505.20293
- enhancing transformation from natural language to signal temporal logic using ll | arXiv: 2505.20658
- Enhancing Transformers for Generalizable First-Order Logical Entailment | arXiv: 2501.00759
- enhancing unsupervised sentence embeddings via knowledge-driven data augmentatio
- enigmatom improve llms theory-of-mind reasoning capabilities with neural knowled | arXiv: 2503.03340
- Enough Coin Flips Can Make LLMs Act Bayesian | arXiv: 2503.04722
- Ensemble Watermarks for Large Language Models | arXiv: 2411.19563
- enstom enhancing dialogue systems with entropy-scaled steering vectors for topic | arXiv: 2505.16526
- entailed between the lines incorporating implication into nli | arXiv: 2501.07719
- entailment-preserving first-order logic representations in natural language enta | arXiv: 2502.16757
- entity framing and role portrayal in the news | arXiv: 2502.14718
- entropy-based exploration conduction for multi-step reasoning | arXiv: 2503.15848
- entropy-uid a method for optimizing information density | arXiv: 2502.14366
- epicode boosting model performance beyond training with extrapolation and contra | arXiv: 2506.03489
- epman episodic memory attention for generalizing to longer contexts | arXiv: 2502.14280
- epo explicit policy optimization for strategic reasoning in llms via reinforceme
- error comparison optimization for large language models on aspect-based sentimen
- error-driven data-efficient large multimodal model tuning | arXiv: 2412.15652
- eru-kg efficient reference-aligned unsupervised keyphrase generation | arXiv: 2505.24219
- EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents | arXiv: 2412.13549
- Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis | arXiv: 2506.04142
- estimating privacy leakage of augmented contextual knowledge in language models | arXiv: 2410.03026
- eta-wavlm efficient speaker identity removal in self-supervised speech represent | arXiv: 2505.19273
- etf an entity tracing framework for hallucination detection in code summaries | arXiv: 2410.14748
- evaluating design decisions for dual encoder-based entity disambiguation | arXiv: 2505.11683
- evaluating implicit bias in large language models by attacking from a psychometr | arXiv: 2406.14023
- Evaluating Language Models as Synthetic Data Generators | arXiv: 2412.03679
- evaluating lexical proficiency in neural language models
- evaluating llms for portuguese sentence simplification with linguistic insights
- evaluating multimodal language models as visual assistants for visually impaired | arXiv: 2503.22610
- Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search | arXiv: 2506.11155
- evaluating personalized tool-augmented llms from the perspectives of personaliza
- evaluating sequence labeling on the basis of information theory
- evaluating the evaluation of diversity in commonsense generation | arXiv: 2506.00514
- evaluating theory of an uncertain mind predicting the uncertain beliefs of other
- evaluating visual and cultural interpretation the k-viscuit benchmark with human | arXiv: 2406.16469
- evaluation agent efficient and promptable evaluation framework for visual genera
- evaluation of attribution bias in generator-aware retrieval-augmented large lang | arXiv: 2410.12380
- Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation | arXiv: 2412.13666
- evaluation of llms in medical text summarization the role of vocabulary adaptati | arXiv: 2505.21242
- eventrag enhancing llm generation with event knowledge graphs
- evolvebench a comprehensive benchmark for assessing temporal awareness in llms o
- evowiki evaluating llms on evolving knowledge | arXiv: 2412.13582
- Ewe: Improving Factuality with Explicit Working Memory | arXiv: 2412.18069
- exclusion of thought mitigating cognitive load in large language models for enha
- execute a multilingual benchmark for llm token understanding | arXiv: 2505.17784
- exit context-aware extractive compression for enhancing retrieval-augmented gene | arXiv: 2412.12559
- expectation confirmation preference optimization for multi-turn conversational r | arXiv: 2506.14302
- expert an explainable image captioning evaluation metric with structured explana | arXiv: 2506.24016
- expetrans llms are experiential transfer learners | arXiv: 2505.23191
- explain-then-process using grammar prompting to enhance grammatical acceptabilit | arXiv: 2506.02302
- explaining matters leveraging definitions and semantic expansion for sexism dete | arXiv: 2506.06238
- explaining puzzle solutions in natural language an exploratory study on 6x6 sudo | arXiv: 2505.15993
- explica evaluating explicit causal reasoning in large language models | arXiv: 2502.15487
- explicit and implicit data augmentation for social event detection | arXiv: 2509.04202
- explicit vs implicit investigating social bias in large language models through | arXiv: 2501.02295
- exploiting contextual knowledge in llms through 𝒱-usable information base
- exploiting the shadows unveiling privacy leaks through lower-ranked tokens in la
- exploracoder advancing code generation for multiple unseen apis via planning and | arXiv: 2412.05366
- explorer scaling exploration-driven web trajectory synthesis for multimodal web | arXiv: 2502.11357
- exploring compositional generalization of multimodal llms for medical imaging | arXiv: 2412.20070
- exploring explanations improves the robustness of in-context learning | arXiv: 2506.02378
- exploring forgetting in large language model pre-training | arXiv: 2410.17018
- exploring gender bias in large language models an in-depth dive into the german | arXiv: 2507.16557
- exploring graph representations of logical forms for language modeling | arXiv: 2505.14523
- Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder | arXiv: 2411.05195
- exploring in-context example generation for machine translation | arXiv: 2506.00507
- exploring in-image machine translation with real-world background | arXiv: 2505.15282
- exploring llms ability to spontaneously and conditionally modify moral expressio
- exploring multimodal challenges in toxic chinese detection taxonomy benchmark an | arXiv: 2505.24341
- exploring multimodal relation extraction of hierarchical tabular data with multi
- Exploring Persona Sentiment Sensitivity in Personalized Dialogue Generation | arXiv: 2502.11423
- exploring the impact of instruction-tuning on llms susceptibility to misinformat | arXiv: 2507.18203
- exposing numeracy gaps a benchmark to evaluate fundamental numerical abilities i | arXiv: 2502.11075
- exposing the achilles heel evaluating llms ability to handle mistakes in mathema
- Extending Complex Logical Queries on Uncertain Knowledge Graphs | arXiv: 2403.01508
- Extending LLM Context Window with Adaptive Grouped Positional Encoding: A Training-Free Method
- f5-tts a fairytaler that fakes fluent and faithful speech with flow matching
- FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | arXiv: 2502.17924
- factbench a dynamic benchmark for in-the-wild language model factuality evaluati
- factual knowledge in language models robustness and anomalies under simple tempo | arXiv: 2502.01220
- fairi tales evaluation of fairness in indian contexts with a focus on bias and s | arXiv: 2506.23111
- fairness beyond performance revealing reliability disparities across groups in l
- Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | arXiv: 2502.01926
- fairsteer inference time debiasing for llms with dynamic activation steering | arXiv: 2504.14492
- faithful and robust llm-driven theorem proving for nli explanations | arXiv: 2505.24264
- FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation | arXiv: 2506.08938
- fast or slow integrating fast intuition and deliberate thinking for enhancing vi
- fast-and-frugal text-graph transformers are effective link predictors | arXiv: 2408.06778
- fastdraft how to train your draft | arXiv: 2411.11055
- faster speculative decoding via effective draft decoder with pruned candidate tr
- fastmcts a simple sampling strategy for data synthesis | arXiv: 2502.11476
- fcmr robust evaluation of financial cross-modal multi-hop reasoning | arXiv: 2412.12567
- FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation | arXiv: 2503.06680
- feat a preference feedback dataset through a cost-effective auto-generation and | arXiv: 2506.19325
- federated data-efficient instruction tuning for large language models | arXiv: 2410.10926
- FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Large Language Models | arXiv: 2410.09432
- fidelis faithful reasoning in large language model for knowledge graph question | arXiv: 2405.13873
- fiha autonomous hallucination evaluation in vision-language models with davidson | arXiv: 2409.13612
- filter-and-refine a mllm based cascade system for industrial-scale video content | arXiv: 2507.17204
- FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | arXiv: 2506.05828
- Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots | arXiv: 2501.03441
- finding needles in images can multi-modal llms locate fine details | arXiv: 2508.05053
- finding the sweet spot preference data construction for scaling preference optim | arXiv: 2502.16825
- fine-grained video dubbing duration alignment with segment supervised preference | arXiv: 2508.08550
- Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs | arXiv: 2407.03181
- finereason evaluating and improving llms deliberate reasoning through reflective | arXiv: 2502.20238
- finite state automata inside transformers with chain-of-thought a mechanistic st
- finmme benchmark dataset for financial multi-modal reasoning evaluation | arXiv: 2505.24714
- fitcf a framework for automatic feature importance-guided counterfactual example | arXiv: 2501.00777
- fixing distribution shifts of llm self-critique via on-policy self-play training
- flagevalmm a flexible framework for comprehensive multimodal model evaluation | arXiv: 2506.09081
- FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | arXiv: 2410.12266
- flashback efficient retrieval-augmented language modeling for long context infere | arXiv: 2405.04065
- flexora flexible low-rank adaptation for large language models
- flexrag a flexible and comprehensive framework for retrieval-augmented generatio | arXiv: 2506.12494
- Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching | arXiv: 2507.05617
- floorplan-llama aligning architects feedback and domain knowledge in architectur
- focalpo enhancing preference optimizing by focusing on correct preference rankin | arXiv: 2501.06645
- focus evaluating pre-trained vision-language models on underspecification reason
- focus on what matters enhancing medical vision-language models with automatic at
- focused-dpo enhancing code generation through focused preference optimization on | arXiv: 2502.11475
- focusllm precise understanding of long context by dynamic condensing | arXiv: 2408.11745
- foldmoe efficient long sequence moe training via attention-moe pipelining
- follow-up question generation for enhanced patient-provider conversations | arXiv: 2503.17509
- foodtaxo generating food taxonomies with large language models | arXiv: 2505.19838
- forward knows efficient backward path saliency-guided memory-efficient fine-tuni
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | arXiv: 2502.14856
- fractal fine-grained scoring from aggregate text labels | arXiv: 2404.04817
- Frictional Agent Alignment Framework: Slow Down and Don't Break Things | arXiv: 2505.19428
- from ambiguity to accuracy the transformative effect of coreference resolution o | arXiv: 2507.07847
- from benign import toxic jailbreaking the language model via adversarial metapho
- from citations to criticality predicting legal decision influence in the multili
- from data to knowledge evaluating how efficiently language models learn facts | arXiv: 2506.16912
- from english to second language mastery enhancing llms with cross-lingual contin
- from human reading to nlm understanding evaluating the role of eye-tracking data
- from informal to formal -- incorporating and evaluating llms on natural language
- from information to insight leveraging llms for open aspect-based educational su
- from isolates to families using neural networks for automated language affiliati
- from lists to emojis how format bias affects model alignment | arXiv: 2409.11704
- from misleading queries to accurate answers a three-stage fine-tuning method for | arXiv: 2504.11277
- from neurons to semantics evaluating cross-linguistic alignment capabilities of
- from objectives to questions a planning-based framework for educational mathemat
- from outcomes to processes guiding prm learning from orm for inference-time alig
- from perceptions to decisions wildfire evacuation decision prediction with behav
- from real to synthetic synthesizing millions of diversified and complicated user | arXiv: 2506.03968
- from selection to generation a survey of llm-based active learning | arXiv: 2502.11767
- from sub-ability diagnosis to human-aligned generation bridging the gap for text
- from teacher to student tracking memorization through model distillation | arXiv: 2506.16170
- from tools to teammates evaluating llms in multi-session coding interactions | arXiv: 2502.13791
- From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models | arXiv: 2505.09924
- fusing highly specialized language models for comprehensive expertise
- g-safeguard a topology-guided security lens and treatment on llm-based multi-age
- g2s a general-to-specific learning framework for temporal knowledge graph foreca | arXiv: 2506.00445
- ga-s3 comprehensive social network simulation with group agents | arXiv: 2506.03532
- GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis | arXiv: 2505.18710
- GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | arXiv: 2409.04183
- game development as human-llm interaction | arXiv: 2408.09386
- gamebot transparent assessment of llm reasoning in games | arXiv: 2412.13602
- GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization | arXiv: 2503.20194
- garage a benchmark with grounding annotations for rag evaluation | arXiv: 2506.07671
- gear generation augmented retrieval | arXiv: 2501.02772
- gear graph-enhanced agent for retrieval-augmented generation | arXiv: 2412.18431
- gec-metrics a unified library for grammatical error correction evaluation | arXiv: 2505.19388
- gellm³o generalizing large language models for multi-property mo
- Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework | arXiv: 2506.15568
- genderalign an alignment dataset for mitigating gender bias in large language mo
- generalized attention flow feature attribution for transformer models via maximu
- generate first then sample enhancing fake news detection with llm-augmented rein
- generating diverse training samples for relation extraction with large language | arXiv: 2505.23108
- generating pedagogically meaningful visuals for math word problems a new benchma | arXiv: 2506.03735
- Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction | arXiv: 2501.13125
- generating synthetic relational tabular data via structural causal models | arXiv: 2507.03528
- generative frame sampler for long video understanding | arXiv: 2503.09146
- Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | arXiv: 2502.02444
- generative reward modeling via synthetic criteria preference learning
- genetic instruct scaling up synthetic generation of coding instructions for larg | arXiv: 2407.21077
- genius a generalizable and purely unsupervised self-training framework for advan
- genknowsub improving modularity and reusability of llms through general knowledg | arXiv: 2505.10939
- genre a french gender-neutral rewriting system using collective nouns | arXiv: 2505.23630
- Geometric Signatures of Compositionality Across a Language Model's Lifetime | arXiv: 2410.01444
- getreason enhancing image context extraction through hierarchical multi-agent re
- gg-bbq german gender bias benchmark for question answering | arXiv: 2507.16410
- gift-sw gaussian noise injected fine-tuning of salient weights for llms | arXiv: 2408.15300
- GiFT: Gibbs Fine-Tuning for Code Generation | arXiv: 2502.11466
- gigachat family efficient russian language modeling through mixture of experts a | arXiv: 2506.09440
- GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages | arXiv: 2406.11546
- global eye breaking the fixed thinking pattern during the instruction expansion
- global mmlu understanding and addressing cultural and linguistic biases in multi
- godbench a benchmark for multimodal large language models in video comment art | arXiv: 2505.11436
- GORP: Continual Gradient Low-Rank Projection Fine-Tuning for LLMs | arXiv: 2507.02503
- gpt-4 as a homework tutor can improve student engagement and learning outcomes | arXiv: 2409.15981
- grace a granular benchmark for evaluating model calibration against human calibr | arXiv: 2502.19684
- Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | arXiv: 2507.01915
- GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models | arXiv: 2507.04455
- graf graph retrieval augmented by facts for romanian legal multi-choice question | arXiv: 2412.04119
- GRAM: Generative Recommendation via Semantic-aware Multi-granular Late Fusion | arXiv: 2506.01673
- grammamt improving machine translation with grammar-informed in-context learning | arXiv: 2410.18702
- grampa subword regularisation by skewing uniform segmentation distributions with
- Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | arXiv: 2506.03939
- Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs | arXiv: 2410.11001
- graph-guided cross-composition feature disentanglement for compositional zero-sh | arXiv: 2408.09786
- graph-structured trajectory extraction from travelogues | arXiv: 2410.16633
- graphcheck breaking long-term text barriers with extracted knowledge graph-power | arXiv: 2502.16514
- graphically speaking unmasking abuse in social media with conversation insights | arXiv: 2504.01902
- graphinsight unlocking insights in large language models for graph structure und
- GraphNarrator: Generating Textual Explanations for Graph Neural Networks | arXiv: 2410.15268
- grat guiding retrieval-augmented reasoning through process rewards tree search
- grounded or a good guesser a per-question balanced dataset to separate blind fro
- group then scale dynamic mixture-of-experts multilingual language model | arXiv: 2506.12388
- Growing Through Experience: Scaling Episodic Grounding in Language Models | arXiv: 2506.01312
- gsq-tuning group-shared exponents integer in fully quantized training for llms o | arXiv: 2502.12913
- GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning | arXiv: 2505.22661
- gui agents a survey | arXiv: 2412.13501
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent | arXiv: 2505.16827
- guicourse from general vision language model to versatile gui agent | arXiv: 2406.11317
- GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | arXiv: 2505.11368
- guidelines for fine-grained sentence-level arabic readability annotation | arXiv: 2410.08674
- guiding not forcing enhancing the transferability of jailbreaking attacks on llm
- Gumbel Reranking: Differentiable End-to-End Reranker Optimization | arXiv: 2502.11116
- gödel agent a self-referential agent framework for recursive self-improvement | arXiv: 2410.04444
- haco-det a study towards fine-grained machine-generated text detection under hum
- haf-rm a hybrid alignment framework for reward model training | arXiv: 2407.04185
- haic improving human action understanding and generation with better captions fo
- Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training | arXiv: 2410.15460
- hallulens llm hallucination benchmark | arXiv: 2504.17550
- HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | arXiv: 2501.08292
- hanging in the balance pivotal moments in crisis counseling conversations | arXiv: 2506.03941
- hard negative mining for domain-specific retrieval in enterprise systems | arXiv: 2505.18366
- harnessing pdf data for improving japanese large multimodal models | arXiv: 2502.14778
- Has Machine Translation Evaluation Achieved Human Parity? | arXiv: 2506.19571
- hash-rag bridging deep hashing with retriever for efficient fine retrieval and a | arXiv: 2505.16133
- hata trainable and hardware-efficient hash-aware top-k attention for scalable la | arXiv: 2506.02572
- HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter | arXiv: 2411.15462
- have we designed generalizable structural knowledge promptings systematic evalua
- hd-ndes neural differential equations for hallucination detection in llms | arXiv: 2506.00088
- health-llm personalized retrieval-augmented disease prediction system | arXiv: 2402.00746
- helios harmonizing early fusion late fusion and llm reasoning for multi-granular | arXiv: 2503.02248
- hellaswag-pro a large-scale bilingual benchmark for evaluating the robustness of | arXiv: 2502.11393
- Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback | arXiv: 2507.16007
- helpsteer3 human-annotated feedback and edit data to empower inference-time scal | arXiv: 2503.04378
- hft half fine-tuning for large language models | arXiv: 2404.18466
- hiagent hierarchical working memory management for solving long-horizon agent ta
- HiCUPID: Exploring the Potential of LLMs as Personalized Assistants | arXiv: 2506.01262
- hidden in plain sight evaluation of the deception detection capabilities of llms | arXiv: 2506.09424
- hiddendetect detecting jailbreak attacks against multimodal large language model | arXiv: 2502.14744
- HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | arXiv: 2503.12941
- hierarchical attention generates better proofs | arXiv: 2504.19188
- Hierarchical Bracketing Encodings for Dependency Parsing as Tagging | arXiv: 2505.11693
- hierarchical document refinement for long-context retrieval-augmented generation | arXiv: 2505.10413
- Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings | arXiv: 2506.00277
- Hierarchical Memory Organization for Wikipedia Generation | arXiv: 2506.23393
- hierarchical retrieval with evidence curation for open-domain financial question | arXiv: 2505.20368
- hierarchical safety realignment lightweight restoration of safety in pruned larg | arXiv: 2505.16104
- hierarchical-task-aware multi-modal mixture of incremental lora experts for embo | arXiv: 2506.04595
- hintsoftruth a multimodal checkworthiness detection dataset with real and synthe
- hoh a dynamic benchmark for evaluating the impact of outdated information on ret | arXiv: 2503.04800
- homebench evaluating llms in smart homes with valid and invalid instructions acr | arXiv: 2505.19628
- hope a novel positional encoding without long-term decay for enhanced context aw
- HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval | arXiv: 2506.07296
- how do llms acquire new knowledge a knowledge circuits perspective on continual | arXiv: 2502.11196
- How does Misinformation Affect Large Language Model Behaviors and Preferences? | arXiv: 2505.21608
- how does response length affect long-form factuality | arXiv: 2505.23295
- how far are llms from being our digital twins a benchmark for persona-based beha | arXiv: 2502.14642
- How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian | arXiv: 2505.21301
- how llms comprehend temporal meaning in narratives a case study in cognitive eva
- how much do encoder models know about word senses
- How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs | arXiv: 2410.13857
- how to compare things properly a study of argument relevance in comparative ques
- How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond | arXiv: 2501.05714
- how to mitigate overfitting in weak-to-strong generalization | arXiv: 2503.04249
- How to Train Long-Context Language Models (Effectively) | arXiv: 2410.02660
- hpss heuristic prompting strategy search for llm evaluators | arXiv: 2502.13031
- hscr hierarchical self-contrastive rewarding for aligning medical vision languag | arXiv: 2506.00805
- human alignment how much do we adapt to llms
- humt dumt measuring and controlling human-like language in llms | arXiv: 2502.13259
- HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases | arXiv: 2412.16311
- hybrid preferences learning to route instances for human vs ai feedback | arXiv: 2410.19133
- hygenar an llm-driven hybrid genetic algorithm for few-shot grammar generation | arXiv: 2505.16978
- hykge a hypothesis knowledge graph enhanced rag framework for accurate and relia
- hyperfm fact-centric multimodal fusion for link prediction over hyper-relational
- hypothetical documents or knowledge leakage rethinking llm-based query expansion | arXiv: 2504.14175
- i see what you mean co-speech gestures for reference resolution in multimodal di | arXiv: 2503.00071
- i0t embedding standardization method towards zero modality gap | arXiv: 2412.14384
- iagent llm agent as a shield between user and recommender systems | arXiv: 2502.14662
- iam efficient inference through attention mapping between different-scale llms | arXiv: 2507.11953
- icr probe tracking hidden state dynamics for reliable hallucination detection in | arXiv: 2507.16488
- idea enhancing the rule learning ability of large language model agent through i | arXiv: 2408.10455
- identifying cellular niches in spatial transcriptomics an investigation into the
- identifying open challenges in language identification
- Identifying Reliable Evaluation Metrics for Scientific Text Revision | arXiv: 2506.04772
- if attention serves as a cognitive model of human memory retrieval what is the p | arXiv: 2502.11469
- if eleanor rigby had met chatgpt a study on loneliness in a post-llm world | arXiv: 2412.01617
- imol incomplete-modality-tolerant learning for multi-domain fake news video dete
- impara-ged grammatical error detection is boosting reference-free grammatical er | arXiv: 2506.02899
- impart importance-aware delta-sparsification for improved model compression and
- impartial multi-task representation learning via variance-invariant probabilisti
- implicit cross-lingual rewarding for efficient multilingual preference alignment | arXiv: 2503.04647
- implicit reasoning in transformers is reasoning through shortcuts | arXiv: 2503.07604
- ImpliHateVid: Implicit Hate Speech Detection in Videos | arXiv: 2508.06570
- improve language model and brain alignment via associative memory | arXiv: 2505.13844
- improve rule retrieval and reasoning with self-induction and relevance reestimat | arXiv: 2505.10870
- improve safety training of large language models with safety-critical singular v
- Improve Vision Language Model Chain-of-thought Reasoning | arXiv: 2410.16198
- Improved Unbiased Watermark for Large Language Models | arXiv: 2502.11268
- Improving Automatic Evaluation of LLMs in Biomedical Relation Extraction via LLMs-as-the-Judge | arXiv: 2506.00777
- improving chain-of-thought reasoning via quasi-symbolic abstractions | arXiv: 2502.12616
- improving contextual faithfulness of large language models via retrieval heads-i
- improving continual pre-training through seamless data packing | arXiv: 2505.22018
- improving dialogue discourse parsing through discourse-aware utterance clarifica
- improving dialogue state tracking through combinatorial search for in-context ex | arXiv: 2506.00622
- improving fairness of large language models in multi-document summarization | arXiv: 2506.07479
- Improving Language and Modality Transfer in Translation by Character-level Modeling | arXiv: 2505.24561
- improving low-resource morphological inflection via self-supervised objectives | arXiv: 2506.05227
- improving medical large vision-language models with abnormal-aware feedback | arXiv: 2501.01377
- improving mllms document image machine translation via synchronously self-review | arXiv: 2507.08309
- improving model factuality with fine-grained critique-based evaluator | arXiv: 2410.18359
- improving parallel sentence mining for low-resource and endangered languages
- improving preference extraction in llms by identifying latent knowledge through | arXiv: 2503.17755
- Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics | arXiv: 2506.00637
- in prospect and retrospect reflective memory management for long-term personaliz | arXiv: 2503.08026
- in the llm era word sense induction remains unsolved | arXiv: 2603.11686
- In-the-wild Audio Spatialization with Flexible Text-guided Localization | arXiv: 2506.00927
- incongruity-aware tension field network for multi-modal sarcasm detection
- inconsistent tokenizations cause language models to be perplexed by japanese gra
- incorporating domain knowledge into materials tokenization | arXiv: 2506.11115
- indicsynth a large-scale multilingual synthetic speech dataset for low-resource
- inducing lexicons of in-group language with socio-temporal context | arXiv: 2409.19257
- inductionbench llms fail in the simplest complexity class | arXiv: 2502.15823
- inews a multimodal dataset for modeling personalized affective responses to news | arXiv: 2503.03335
- Inference Compute-Optimal Video Vision Language Models | arXiv: 2505.18855
- inferring from logits exploring best practices for decoding-free generative cand
- inferring functionality of attention heads from their parameters | arXiv: 2412.11965
- infinisst simultaneous translation of unbounded speech with large language model | arXiv: 2503.02969
- influences on llm calibration a study of response agreement loss functions and p | arXiv: 2501.03991
- infogen generating complex statistical infographics from documents | arXiv: 2507.20046
- information extraction from visually rich documents using llm-based organization
- information locality as an inductive bias for neural language models | arXiv: 2506.05136
- injongo a multicultural intent detection and slot-filling dataset for 16 african | arXiv: 2502.09814
- inner thinking transformer leveraging dynamic depth scaling to foster adaptive i | arXiv: 2502.13842
- innovative image fraud detection with cross-sample anomaly analysis the power of
- InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training | arXiv: 2503.02769
- Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | arXiv: 2410.08145
- InspireDebate: Multi-Dimensional Evaluation-Guided Reasoning for Debating | arXiv: 2506.18102
- instance-selection-inspired undersampling strategies for bias reduction in small
- instruction tuning on public government and cultural data for low-resource langu
- instruction-tuning data synthesis from scratch via web reconstruction | arXiv: 2504.15573
- instructpart task-oriented part segmentation with instruction reasoning | arXiv: 2505.18291
- integrating audio visual and semantic information for enhanced multimodal speake
- inter-passage verification for multi-evidence multi-answer qa | arXiv: 2506.00425
- interact enabling interactive question-driven learning in large language models | arXiv: 2412.11388
- interactive and expressive code-augmented planning with large language models | arXiv: 2411.13826
- interactive evolution a neural-symbolic self-training framework for large langua
- interlocking-free selective rationalization through genetic-based learning | arXiv: 2412.10312
- internal and external impacts of natural language processing papers | arXiv: 2505.16061
- internal value alignment in large language models through controlled value vecto | arXiv: 2507.11316
- internlm-xcomposer25-reward a simple yet effective multi-modal reward model | arXiv: 2501.12368
- InterpoLL: Mitigating Shortcut Learning with InterpoLated Learning | arXiv: 2507.05527
- interpret and improve in-context learning via the lens of input-label mappings
- introducing graph context into language models through parameter-efficient fine-
- introducing verification task of set consistency with set-consistency energy net
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process | arXiv: 2405.11870
- investalign overcoming data scarcity in aligning large language models with inve
- investigating and enhancing the robustness of large multimodal models against te
- investigating and enhancing vision-audio capability in omnimodal large language | arXiv: 2503.00059
- investigating and extending homans social exchange theory with large language mo
- investigating context-faithfulness in large language models the roles of memory | arXiv: 2409.10955
- investigating language preference of multilingual rag systems | arXiv: 2502.11175
- investigating the robustness of retrieval-augmented generation at the query leve | arXiv: 2507.06956
- investorbench a benchmark for financial decision-making tasks with llm-based age
- IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization | arXiv: 2411.06208
- ipo your language model is secretly a preference classifier | arXiv: 2502.16182
- iquest an iterative question-guided framework for knowledge base question answer | arXiv: 2506.01784
- iris interactive research ideation system for accelerating scientific discovery | arXiv: 2504.16728
- iris interpretable retrieval-augmented classification for long interspersed docu
- IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery | arXiv: 2510.09217
- Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training | arXiv: 2502.12734
- IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory | arXiv: 2506.01048
- is it just semantics a case study of discourse particle understanding in llms | arXiv: 2506.04534
- is linguistically-motivated data augmentation worth it | arXiv: 2506.03593
- is llm an overconfident judge unveiling the capabilities of llms in detecting of | arXiv: 2502.06207
- Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering | arXiv: 2502.13962
- isr self-refining referring expressions for entity grounding
- its not a walk in the park challenges of idiom translation in speech-to-text sys | arXiv: 2506.02995
- its not bragging if you can back it up can llms understand braggings
- jailbreak large vision-language models through multi-modal linkage | arXiv: 2412.00473
- jailbreaking one step is enough | arXiv: 2412.12621
- JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs | arXiv: 2402.05668
- jarvis-vla post-training large-scale vision language models to play visual games | arXiv: 2503.16365
- jopa explaining large language models generation via joint prompt attribution | arXiv: 2405.20404
- jsontuning towards generalizable robust and controllable instruction tuning | arXiv: 2310.02953
- judging the judges can large vision-language models fairly evaluate chart compre | arXiv: 2505.08468
- just a scratch enhancing llm capabilities for self-harm detection through intent | arXiv: 2506.05073
- just go parallel improving the multilingual capabilities of large language model | arXiv: 2506.13044
- JuStRank: Benchmarking LLM Judges for System Ranking | arXiv: 2412.09569
- katfishnet detecting llm-generated korean text through linguistic feature analys | arXiv: 2503.00032
- kazmmlu evaluating language models on kazakh russian and regional knowledge of k | arXiv: 2502.12829
- kda automated data generation pipeline for detoxifying implicitly offensive lang | arXiv: 2506.13513
- kerl knowledge-enhanced personalized recipe recommendation using large language | arXiv: 2505.14629
- kg-agent an efficient autonomous agent framework for complex reasoning over know
- kirag knowledge-driven iterative retriever for enhancing retrieval-augmented gen
- kitab-bench a comprehensive multi-domain benchmark for arabic ocr and document u | arXiv: 2502.14949
- knockout llm assessment using large language models for evaluations through iter | arXiv: 2506.03785
- know you first and be you better modeling human-like user simulators via implici | arXiv: 2502.18968
- know your mistakes towards preventing overreliance on task-oriented conversation | arXiv: 2501.10316
- knowcoder-x boosting multilingual information extraction via code | arXiv: 2411.04794
- Knowledge Boundary of Large Language Models: A Survey | arXiv: 2412.12472
- knowledge decoupling via orthogonal projection for lifelong editing of large lan
- Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation | arXiv: 2501.02226
- knowledge image matters improving knowledge-based visual reasoning with multi-im
- knowledge tracing in programming education integrating students questions | arXiv: 2502.10408
- knowledge-augmented multimodal clinical rationale generation for disease diagnos
- KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education? | arXiv: 2412.08985
- kodcode a diverse challenging and verifiable synthetic dataset for coding | arXiv: 2503.02951
- KoGEM: Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean | arXiv: 2506.01237
- KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors | arXiv: 2506.01357
- kristeva close reading as a novel task for benchmarking interpretive reasoning | arXiv: 2505.09825
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding | arXiv: 2507.11273
- L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models | arXiv: 2402.04902
- La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America | arXiv: 2507.00999
- LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation | arXiv: 2508.09515
- lacuna inc at semeval-2025 task 4 lora-enhanced influence-based unlearning for l | arXiv: 2506.04044
- ladder language-driven slice discovery and error rectification in vision classif | arXiv: 2408.07832
- LADM: Long-context Training Data Selection with Attention-based Dependency Measurement | arXiv: 2503.02502
- lamb a training-free method to enhance the long-context understanding of ssms vi
- langmark a multilingual dataset for automatic post-editing | arXiv: 2511.17153
- LangSAMP: Language-Script Aware Multilingual Pretraining | arXiv: 2409.18199
- language complexity measurement as a noisy zero-shot proxy for evaluating llm pe | arXiv: 2502.11578
- language constrained multimodal hyper adapter for many-to-many multimodal summar
- Language Fusion for Parameter-Efficient Cross-lingual Transfer (FLARE) | arXiv: 2501.06892
- language model fine-tuning on scaled survey data for predicting distributions of | arXiv: 2502.16761
- language model probabilities are not calibrated in numeric contexts | arXiv: 2410.16007
- language models can subtly deceive without lying a case study on strategic phras | arXiv: 2405.04325
- language models grow less humanlike beyond phase transition | arXiv: 2502.18802
- Language Models Resist Alignment: Evidence From Data Compression | arXiv: 2406.06144
- Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More | arXiv: 2503.10542
- Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | arXiv: 2402.12208
- LAQuer: Localized Attribution Queries in Content-grounded Generation | arXiv: 2506.01187
- large language and protein assistant for protein-protein interactions prediction
- large language and reasoning models are shallow disjunctive reasoners | arXiv: 2503.23487
- large language models are good relational learners | arXiv: 2506.05725
- large language models for predictive analysis how far are they | arXiv: 2505.17149
- large language models in bioinformatics a survey | arXiv: 2503.04490
- large language models struggle to describe the haystack without human help a soc
- large margin representation learning for robust cross-lingual named entity recog
- large vocabulary size improves large language models | arXiv: 2406.16508
- latim measuring latent token-to-token interactions in mamba models | arXiv: 2502.15612
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews | arXiv: 2504.11042
- ldir low-dimensional dense and interpretable text embeddings with relative repre | arXiv: 2505.10354
- leancode understanding models better for code simplification of pre-trained larg | arXiv: 2505.14759
- learn to memorize scalable continual learning in semiparametric models with mixt
- learning auxiliary tasks improves reference-free hallucination detection in open
- learning first-order logic rules for argumentation mining
- learning from litigation graphs and llms for retrieval and reasoning in ediscove | arXiv: 2405.19164
- learning from negative samples in biomedical generative entity linking | arXiv: 2408.16493
- learning to align multi-faceted evaluation a unified and robust framework | arXiv: 2502.18874
- learning to generate structured output with schema reinforcement learning | arXiv: 2502.18878
- learning to look at the other side a semantic probing study of word embeddings i
- learning to reason from feedback at test-time | arXiv: 2502.15771
- Learning to Reason Over Time: Timeline Self-Reflection for Temporal Reasoning | arXiv: 2504.05258
- learning to rewrite generalized llm-generated text detection | arXiv: 2408.04237
- learning together to perform better teaching small-scale llms to collaborate via
- led-merging mitigating safety-utility conflicts in model merging with location-e
- legalagentbench evaluating llm agents in legal domain | arXiv: 2412.17259
- legalreasoner step-wised verification-correction for legal judgment reasoning | arXiv: 2506.07443
- lemonade a large multilingual expert-annotated abstractive event dataset for the | arXiv: 2506.00980
- length controlled generation for black-box llms | arXiv: 2412.14656
- length-induced embedding collapse in plm-based models | arXiv: 2410.24200
- lesa learnable llm layer scaling-up | arXiv: 2502.13794
- less for more enhanced feedback-aligned mixed llms for molecule caption generati
- less is more explainable and efficient icd code prediction with clinical entitie
- less mature is more adaptable for sentence-level language modeling
- Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts | arXiv: 2505.22582
- lets-c leveraging text embedding for time series classification | arXiv: 2407.06533
- Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | arXiv: 2502.11882
- leveraging human production-interpretation asymmetries to test llm cognitive pla | arXiv: 2503.17579
- leveraging in-context learning for political bias testing of llms | arXiv: 2506.22232
- leveraging large language models to measure gender representation bias in gender | arXiv: 2406.13677
- Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs | arXiv: 2506.05629
- leveraging unit language guidance to advance speech modeling in textless speech- | arXiv: 2505.15333
- leveraging variation theory in counterfactual data augmentation for optimized ac | arXiv: 2408.03819
- lexclipr cross-lingual paragraph retrieval from legal judgments
- lexgen domain-aware multilingual lexicon generation | arXiv: 2405.11200
- lexical diversity-aware relevance assessment for retrieval-augmented generation
- lexical recall or logical reasoning probing the limits of reasoning abilities in
- lexkeyplan planning with keyphrases and retrieval augmentation for legal text ge
- lextempus enhancing temporal generalizability of legal language models through d
- library-like behavior in language models is enhanced by self-referencing causal | arXiv: 2501.13491
- lifbench evaluating the instruction following performance and stability of large
- limited generalizability in argument mining state-of-the-art models learn datase | arXiv: 2505.22137
- limited-resource adapters are regularizers not linguists | arXiv: 2505.24525
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | arXiv: 2502.17407
- literary evidence retrieval via long-context language models | arXiv: 2506.03090
- Literature Meets Data: A Synergistic Approach to Hypothesis Generation | arXiv: 2410.17309
- Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs | arXiv: 2505.09338
- llama-omni 2 llm-based real-time spoken chatbot with autoregressive streaming sp
- llamaduo llmops pipeline for seamless migration from service llms to small-scale | arXiv: 2408.13467
- llamas have feelings too unveiling sentiment and emotion representations in llam
- llase-g1 incentivizing generalization capability for llama-based speech enhancem
- llava steering visual instruction tuning with 500x fewer parameters through moda
- llm agents making agent tools | arXiv: 2502.11705
- LLM as a Broken Telephone: Iterative Generation Distorts Information | arXiv: 2502.20258
- llm as effective streaming processor bridging streaming-batch mismatches with gr | arXiv: 2505.16983
- llm as entity disambiguator for biomedical entity-linking
- LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates | arXiv: 2503.16334
- llm meets scene graph can large language models understand and generate scene gr | arXiv: 2505.19510
- llm-based rumor detection via influence guided sample selection and game-based p
- llm-enhanced self-evolving reinforcement learning for multi-step e-commerce paym | arXiv: 2509.18719
- llm-guided semantic-aware clustering for topic modeling
- LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs | arXiv: 2404.10304
- llms can achieve high-quality simultaneous machine translation as efficiently as | arXiv: 2504.09570
- llms can be easily confused by instructional distractions | arXiv: 2502.04362
- LLMs can Perform Multi-Dimensional Analytic Writing Assessments | arXiv: 2502.11368
- LLMs Can Simulate Standardized Patients via Agent Coevolution | arXiv: 2412.11716
- llms caught in the crossfire malware requests and jailbreak challenges | arXiv: 2506.10022
- LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks | arXiv: 2406.18403
- llms know their vulnerabilities uncover safety gaps through natural distribution | arXiv: 2410.10700
- llms persona-plug personalized llms | arXiv: 2409.11901
- llms syntactically adapt their language use to their conversational partner
- llms trust humans more thats a problem unveiling and mitigating the authority bi
- llmsrxllm25 less is more enhancing structured multi-agent reasoning via quality- | arXiv: 2504.16408
- llm×mapreduce simplified long-sequence processing using large language model
- locagent graph-guided llm agents for code localization | arXiv: 2503.09089
- local look-ahead guidance via verifier-in-the-loop for automated theorem proving | arXiv: 2503.09730
- localizing and mitigating errors in long-form question answering | arXiv: 2407.11930
- Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models | arXiv: 2507.18263
- logic-regularized verifier elicits reasoning from llms
- logical consistency is vital neural-symbolic information retrieval for negative- | arXiv: 2505.22299
- logical forms complement probability in understanding language model and human p | arXiv: 2502.09589
- LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | arXiv: 2409.12929
- logicqa logical anomaly detection with vision language model generated questions | arXiv: 2503.20252
- LoGU: Long-form Generation with Uncertainty Expressions | arXiv: 2410.14309
- longbench v2 towards deeper understanding and reasoning on realistic long-contex | arXiv: 2412.15204
- LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | arXiv: 2412.18424
- longdpo unlock better long-form generation abilities for llms via critique-augme | arXiv: 2502.02095
- longrecipe recipe for efficient long context generalization in large language mo
- longred mitigating short-text degradation of long-context large language models | arXiv: 2502.07365
- longreward improving long-context large language models with ai feedback | arXiv: 2410.21252
- longsafety evaluating long-context safety of large language models | arXiv: 2502.16971
- look both ways and no sink converting llms into text encoders without training
- lost in literalism how supervised training shapes translationese in llms | arXiv: 2503.04369
- lost in multilinguality dissecting cross-lingual factual inconsistency in transf | arXiv: 2504.04264
- lost in the context insufficient and distracted attention to contexts in prefere
- lotus a leaderboard for detailed image captioning from quality to societal bias | arXiv: 2507.19362
- low-bit quantization favors undertrained llms
- low-perplexity llm-generated sequences and where to find them | arXiv: 2507.01844
- low-rank interconnected adaptation across layers | arXiv: 2407.09946
- lpoi listwise preference optimization for vision language models | arXiv: 2505.21061
- lr2bench evaluating long-chain reflective reasoning capabilities of large langua | arXiv: 2502.17848
- LSSF: Safety Alignment via Low-Rank Safety Subspace Fusion | arXiv: 2602.00038
- m-mad multidimensional multi-agent debate for advanced machine translation evalu | arXiv: 2412.20127
- M-RewardBench: Evaluating Reward Models in Multilingual Settings | arXiv: 2410.15522
- m2rc-eval massively multilingual repository-level code completion evaluation | arXiv: 2410.21157
- M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs | arXiv: 2503.04856
- m3finmeeting a multilingual multi-sector and multi-task financial meeting unders | arXiv: 2506.02510
- m3hg multimodal multi-scale and multi-type node heterogeneous graph for emotion | arXiv: 2508.18740
- machine translation models are zero-shot detectors of translation direction | arXiv: 2401.06769
- macp minimal yet mighty adaptation via hierarchical cosine projection | arXiv: 2410.09103
- madakv adaptive modality-perception kv cache eviction for efficient multimodal l | arXiv: 2506.15724
- magic-vqa multimodal and grounded inference with commonsense knowledge for visua | arXiv: 2503.18491
- magnet augmenting generative decoders with representation learning and infilling | arXiv: 2501.08648
- magnet multi-turn tool-use data synthesis and distillation via graph translation | arXiv: 2503.07826
- main-rag multi-agent filtering retrieval-augmented generation | arXiv: 2501.00332
- make imagination clearer stable diffusion-based visual imagination for multimoda
- making fetch happen finding emergent dog whistles through common habitats | arXiv: 2412.12072
- making llms better many-to-many speech-to-text translators with curriculum learn | arXiv: 2409.19510
- mam modular multi-agent framework for multi-modal medical diagnosis via role-spe | arXiv: 2506.19835
- mamba knockout for unraveling factual information flow | arXiv: 2505.24244
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | arXiv: 2412.05237
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | arXiv: 2410.09403
- maple enhancing review generation with multi-aspect prompt learning in explainab
- mapmake schema guided text to table generation | arXiv: 2505.23174
- mapnav a novel memory representation via annotated semantic maps for vlm-based v
- maporl multi-agent post-co-training for collaborative large language models with | arXiv: 2502.18439
- Mapping 1,000+ Language Models via the Log-Likelihood Vector | arXiv: 2502.16173
- mapping the podcast ecosystem with the structured podcast research corpus | arXiv: 2411.07892
- mapqator an extensible framework for efficient annotation of map-based qa datase | arXiv: 2412.21015
- MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment | arXiv: 2503.01711
- Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models | arXiv: 2507.11882
- marco-o1 v2 towards widening the distillation bottleneck for reasoning models | arXiv: 2503.01461
- mars benchmarking the metaphysical reasoning abilities of language models with a | arXiv: 2406.02106
- masking in multi-hop qa an analysis of how language models perform with context | arXiv: 2505.11754
- masks can be learned as an alternative to experts
- masrouter learning to route llms for multi-agent systems | arXiv: 2502.11133
- Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | arXiv: 2410.16930
- a³ automatic alignment framework for attributed text generation
- mathcoder-vl bridging vision and code for enhanced multimodal mathematical reaso | arXiv: 2505.10557
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion | arXiv: 2503.16212
- maxife multilingual and cross-lingual instruction following evaluation | arXiv: 2506.01776
- maximal matching matters preventing representation collapse for robust cross-mod | arXiv: 2506.21538
- maximizing the effectiveness of larger bert models for compression
- mcbe a multi-task chinese bias evaluation benchmark for large language models | arXiv: 2507.02088
- mcs-bench a comprehensive benchmark for evaluating multimodal large language mod
- mdbench a synthetic multi-document reasoning benchmark generated with knowledge | arXiv: 2506.14927
- mdcure a scalable pipeline for multi-document instruction-following | arXiv: 2410.23463
- mdit-bench evaluating the dual-implicit toxicity in large multimodal models | arXiv: 2505.17144
- meaning beyond truth conditions evaluating discourse level understanding via ana
- meaning variation and data quality in the corpus of founding era american englis
- measuring data diversity for instruction tuning a systematic analysis and a reli | arXiv: 2502.17184
- measuring social biases in masked language models by proxy of prediction quality | arXiv: 2402.13954
- measuring the effect of transcription noise on downstream language understanding | arXiv: 2502.13645
- mechanistic interpretability of emotion inference in large language models | arXiv: 2502.05489
- medbiorag semantic search and retrieval-augmented generation with large language | arXiv: 2512.10996
- meddxagent a unified modular agent framework for explainable automatic different | arXiv: 2502.19175
- medical graph rag evidence-based medical large language model via graph retrieva
- megapairs massive data synthesis for universal multimodal retrieval | arXiv: 2412.14475
- megen generative backdoor into large language models via model editing | arXiv: 2408.10722
- meit multimodal electrocardiogram instruction tuning on large language models fo | arXiv: 2403.04945
- membench towards more comprehensive evaluation on the memory of llm-based agents | arXiv: 2506.21605
- memeqa holistic evaluation for meme understanding
- memerag a multilingual end-to-end meta-evaluation benchmark for retrieval augmen | arXiv: 2502.17163
- memorization a close look at books | arXiv: 2504.12549
- Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation | arXiv: 2502.01491
- memorizing is not enough deep knowledge injection through reasoning | arXiv: 2504.00472
- MEraser: An Effective Fingerprint Erasure Approach for Large Language Models | arXiv: 2506.12551
- merge hijacking backdoor attacks to model merging of large language models | arXiv: 2505.23561
- MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models | arXiv: 2410.08604
- meta-learning neural mechanisms rather than bayesian priors | arXiv: 2503.16048
- Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models | arXiv: 2504.14194
- meta-reflection a feedback-free reflection learning framework | arXiv: 2412.13781
- meta-tool unleash open-world function calling capabilities of general-purpose la
- metal a multi-agent framework for chart generation with test-time scaling | arXiv: 2502.17651
- metasynth meta-prompting-driven agentic scaffolds for diverse synthetic data gen | arXiv: 2504.12563
- mexma token-level objectives improve sentence representations | arXiv: 2409.12737
- MHA2MLA: Towards Economical Inference by Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs | arXiv: 2502.14837
- Micro-Act: Mitigate Knowledge Conflict in QA via Actionable Self-Reasoning | arXiv: 2506.05278
- Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | arXiv: 2502.14830
- milic-eval benchmarking multilingual llms for chinas minority languages | arXiv: 2503.01150
- mimicking the familiar dynamic command generation for information theft attacks
- mind a multi-agent framework for zero-shot harmful meme detection | arXiv: 2507.06908
- mind the belief gap group identity in the world of llms | arXiv: 2503.02016
- mind the gap static and interactive evaluations of large audio models | arXiv: 2502.15919
- mind the gesture evaluating ai sensitivity to culturally offensive non-verbal ge
- mind your tone investigating how prompt politeness affects llm accuracy short pa | arXiv: 2510.04950
- MindRef: Mimicking Human Memory for Hierarchical Reference Retrieval with Fine-Grained Location Awareness | arXiv: 2402.17010
- minilongbench the low-cost long context understanding benchmark for large langua
- minimal pair-based evaluation of code-switching | arXiv: 2506.01840
- mining complex patterns of argumentative reasoning in natural language dialogue
- mining the uncertainty patterns of humans and models in the annotation of moral
- mir methodology inspiration retrieval for scientific research problems | arXiv: 2506.00249
- mira empowering one-touch ai services on smartphones with mllm-based instruction | arXiv: 2509.13773
- mirage exploring how large language models perform in complex social interactive | arXiv: 2501.01652
- mire enhancing multimodal queries representation via fusion-free modality intera | arXiv: 2411.08334
- mis-prompt benchmarking large language models for proactive error handling | arXiv: 2506.00064
- misp-meeting a real-world dataset with multimodal cues for long-form meeting tra
- mitigate position bias in large language models via scaling a single dimension | arXiv: 2406.02536
- mitigating confounding in speech-based dementia detection through weight masking | arXiv: 2506.05610
- mitigating lost-in-retrieval problems in retrieval augmented multi-hop question | arXiv: 2502.14245
- mitigating negative interference in multilingual sequential knowledge editing th | arXiv: 2506.10800
- mitigating non-representative prototypes and representation bias in few-shot con
- mitigating posterior salience attenuation in long-context llms with positional c | arXiv: 2506.08371
- Mitigating Selection Bias with Node Pruning and Auxiliary Options | arXiv: 2409.18857
- Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | arXiv: 2503.13360
- mixture of decoding an attention-inspired adaptive decoding strategy to mitigate | arXiv: 2505.17061
- mixture of insightful experts mote the synergy of reasoning chains and expert mi
- mixture of ordered scoring experts for cross-prompt essay trait scoring
- mixture of small and large models for chinese spelling check | arXiv: 2506.06887
- mixtures of in-context learners | arXiv: 2411.02830
- mlas-lora language-aware parameters detection and lora-based knowledge transfer
- mldebugging towards benchmarking code debugging across multi-library scenarios | arXiv: 2506.13824
- mm-verify enhancing multimodal reasoning with chain-of-thought verification | arXiv: 2502.13383
- MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | arXiv: 2505.23224
- mmdend dendrite-inspired multi-branch multi-compartment parallel spiking neuron
- mmina benchmarking multihop multimodal internet agents | arXiv: 2404.09992
- mmlu-cf a contamination-free multi-task language understanding benchmark | arXiv: 2412.15194
- MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | arXiv: 2409.02813
- mmrc a large-scale benchmark for understanding multimodal large language model i
- mms-llama efficient llm-based audio-visual speech recognition with minimal multi | arXiv: 2503.11315
- MMSafeAware: Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | arXiv: 2502.11184
- mmscibench benchmarking language models on chinese multimodal scientific problem | arXiv: 2503.01891
- mmunlearner reformulating multimodal machine unlearning in the era of multimodal | arXiv: 2502.11051
- mobilora accelerating lora-based llm inference on mobile devices via context-awa
- moc mixtures of text chunking learners for retrieval-augmented generation system | arXiv: 2503.09600
- mockconf a student interpretation dataset analysis word- and span-level alignmen | arXiv: 2506.04848
- Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models | arXiv: 2502.15910
- Model Extrapolation Expedites Alignment | arXiv: 2404.16792
- model performance-guided evaluation data selection for effective prompt optimiza | arXiv: 2505.10736
- modeling complex semantics relation with contrastively fine-tuned relational enc
- modeling the evolution of english noun compounds with feature-rich diachronic co
- modeling uncertainty in composed image retrieval via probabilistic embeddings
- Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | arXiv: 2407.14878
- molrag unlocking the power of large language models for molecular property predi
- monitoring decoding mitigating hallucination via evaluating the factuality of pa | arXiv: 2503.03106
- MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | arXiv: 2506.07533
- more a mixture of low-rank experts for adaptive multi-task learning | arXiv: 2505.22694
- more is not always better enhancing many-shot in-context learning with different
- Morpher: Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | arXiv: 2412.08174
- MorphMark: Flexible Adaptive Watermarking for Large Language Models | arXiv: 2505.11541
- mosaic multiple observers spotting ai content | arXiv: 2409.07615
- moscar a large-scale multilingual and multimodal document-level corpus | arXiv: 2406.08707
- movie101v2 improved movie narration benchmark | arXiv: 2404.13370
- mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | arXiv: 2409.03420
- MPO: Multilingual Safety Alignment via Reward Gap Optimization | arXiv: 2505.16869
- mpvstance mitigating hallucinations in stance detection with multi-perspective v
- mrakl multilingual retrieval-augmented knowledge graph construction for low-reso | arXiv: 2507.16011
- mt-raig novel benchmark and evaluation framework for retrieval-augmented insight | arXiv: 2502.11735
- m³gqa a multi-entity multi-hop multi-setting graph question answ
- mtsa multi-turn safety alignment for llms through multi-round red-teaming | arXiv: 2505.17147
- mtvqa benchmarking multilingual text-centric visual question answering | arXiv: 2405.11985
- multi-agent collaboration via cross-team orchestration | arXiv: 2406.08979
- Multi-Attribute Steering of Language Models via Targeted Intervention | arXiv: 2502.12446
- Multi-document Summarization through Event Relation Graph Reasoning for Framing Bias Mitigation | arXiv: 2506.12978
- multi-facet blending for faceted query-by-example retrieval | arXiv: 2412.01443
- multi-hop question generation via dual-perspective keyword guidance | arXiv: 2505.15299
- multi-hop reasoning for question answering with hyperbolic representations | arXiv: 2507.03612
- multi-level association refinement network for dialogue aspect-based sentiment q
- Multi-Level Explanations for Generative Language Models | arXiv: 2403.14459
- multi-level relevance document identifier learning for generative retrieval
- multi-modality expansion and retention for llms through parameter merging and de
- multi-perspective alignment for increasing naturalness in neural machine transla | arXiv: 2412.08473
- multi-prompting decoder helps better language understanding | arXiv: 2406.06279
- multi-task adversarial attacks against black-box model with few-shot queries | arXiv: 2508.10039
- multiagentbench evaluating the collaboration and competition of llm agents | arXiv: 2503.01935
- multilingual arbitration optimizing data pools to accelerate multilingual progre
- multilingual encoder knows more than you realize shared weights pretraining for | arXiv: 2502.10852
- multilingual gloss-free sign language translation towards building a sign langua
- multilingual retrieval augmented generation for culturally-sensitive tasks a ben | arXiv: 2410.01171
- multilingual text-to-image generation magnifies gender stereotypes
- multimed multilingual medical speech recognition via attention encoder decoder | arXiv: 2409.14074
- MultiMM: Cultural Bias Matters — Cross-Cultural Benchmark for Multimodal Metaphors | arXiv: 2506.06987
- multimodal coreference resolution for chinese social media dialogues dataset and | arXiv: 2504.14321
- multimodal pragmatic jailbreak on text-to-image models | arXiv: 2409.19149
- multimodal transformers are hierarchical modal-wise heterogeneous graphs | arXiv: 2505.01068
- Multiple LLM Agents Debate for Equitable Cultural Alignment | arXiv: 2505.24671
- MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts | arXiv: 2406.12549
- musc improving complex instruction following with multi-granularity self-contras
- musts multilingual semantic textual similarity benchmark
- Mutual-Taught for Co-adapting Policy and Reward Models | arXiv: 2506.06292
- my life is miserable have to sign 500 autographs everyday exposing humblebraggin | arXiv: 2412.20057
- my words imply your opinion reader agent-based propagation enhancement for perso
- nametag 3 a tool and a service for multilingual multitagset ner | arXiv: 2506.05949
- narrative media framing in political discourse | arXiv: 2506.00737
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | arXiv: 2502.11089
- natural language processing in support of evidence-based medicine a scoping revi | arXiv: 2505.22280
- navigating rifts in human-llm grounding study and benchmark | arXiv: 2503.13975
- negative matters multi-granularity hard-negative synthesis and anchor-token-awar
- negvqa can vision language models understand negation | arXiv: 2505.22946
- neko cross-modality post-recognition error correction with tasks-guided mixture- | arXiv: 2411.05945
- Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset | arXiv: 2412.02595
- neural incompatibility the unbridgeable gap of cross-scale parametric knowledge
- neural parameter search for slimmer fine-tuned models and better transfer | arXiv: 2505.18713
- neural topic modeling with large language models in the loop | arXiv: 2411.08534
- neuron empirical gradient discovering and quantifying neurons global linear cont | arXiv: 2412.18053
- neuron-level sequential editing for large language models | arXiv: 2410.04045
- NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | arXiv: 2505.19754
- newsinterview a dataset and a playground to evaluate llms grounding gap via info | arXiv: 2411.13779
- NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization | arXiv: 2505.24575
- ngqa a nutritional graph question answering benchmark for personalized health-aw
- no questions are stupid but some are poorly posed understanding poorly-posed inf
- noreval a norwegian language understanding and generation evaluation benchmark | arXiv: 2504.07749
- Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability | arXiv: 2408.08137
- not all terms matter recall-oriented adaptive learning for plm-aided query expan
- not quite sherlock holmes language model predictions do not reliably differentia | arXiv: 2506.06808
- Nudging: Inference-time Alignment of LLMs via Guided Decoding | arXiv: 2410.09300
- nusaaksara a multimodal and multilingual benchmark for preserving indonesian ind
- nvagent automated data visualization from natural language via collaborative age
- oasis order-augmented strategy for improved code search | arXiv: 2503.08161
- obfuslm privacy-preserving language model service against embedding inversion at
- Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation | arXiv: 2503.08057
- olmotrace tracing language model outputs back to trillions of training tokens | arXiv: 2504.07096
- omgm orchestrate multiple granularities and modalities for efficient multimodal | arXiv: 2505.07879
- omnialign-v towards enhanced alignment of mllms with human preference | arXiv: 2502.18411
- omnicharacter towards immersive role-playing agents with seamless speech-languag
- omniflatten an end-to-end gpt model for seamless voice conversation | arXiv: 2410.17799
- on entity identification in language models | arXiv: 2506.02701
- on generalization across measurement systems llms entail more test-time compute | arXiv: 2506.02591
- on many-shot in-context learning for long-context evaluation | arXiv: 2411.07130
- on support samples of next word prediction | arXiv: 2506.04047
- on synthesizing data for context attribution in question answering | arXiv: 2504.05317
- on synthetic data strategies for domain-specific generative retrieval | arXiv: 2502.17957
- on the acquisition of shared grammatical representations in bilingual language m | arXiv: 2503.03962
- On the Limit of Language Models as Planning Formalizers | arXiv: 2412.09879
- on the mutual influence of gender and occupation in llm representations | arXiv: 2503.06792
- on the relation between fine-tuning topological properties and task performance
- On the Reliability of Large Language Models for Causal Discovery | arXiv: 2407.19638
- on the risk of evidence pollution for malicious social text detection in the era | arXiv: 2410.12600
- on the robust approximation of asr metrics | arXiv: 2502.12408
- on-policy self-alignment with fine-grained knowledge feedback for hallucination | arXiv: 2406.12221
- one for all update parameterized knowledge across multiple models with once edit | arXiv: 2506.00817
- one missing piece for open-source reasoning models a dataset to mitigate cold-st | arXiv: 2506.02338
- one quantllm for all fine-tuning quantized llms once for efficient deployments | arXiv: 2405.20202
- one size fits none rethinking fairness in medical ai | arXiv: 2506.14400
- onebench to test them all sample-level benchmarking over open-ended capabilities | arXiv: 2412.06745
- Online Iterative Self-Alignment for Radiology Report Generation | arXiv: 2505.11983
- Only a Little to the Left: A Theory-grounded Measure of Political Bias in LLMs | arXiv: 2503.16148
- ontology-guided reverse thinking makes large language models stronger on knowled
- open-set living need prediction with large language models | arXiv: 2506.02713
- open-world attribute mining for e-commerce products with multimodal self-correct
- open-world planning via lifted regression with llm-inferred affordances for embo
- opencoder the open cookbook for top-tier code large language models | arXiv: 2411.04905
- openwebvoyager building multimodal web agents via iterative real-world explorati
- opt-out investigating entity-level unlearning for large language models via opti | arXiv: 2406.12329
- Optimal Transport-Based Token Weighting for Enhanced Preference Optimization | arXiv: 2505.18720
- optimized text embedding models and benchmarks for amharic passage retrieval | arXiv: 2505.19356
- optimizing decomposition for optimal claim verification | arXiv: 2503.15354
- optimizing pre-training data mixtures with mixtures of data expert models | arXiv: 2502.15950
- optimizing question semantic space for dynamic retrieval-augmented multi-hop que
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use | arXiv: 2508.04482
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | arXiv: 2412.19723
- os-kairos adaptive interaction for mllm-powered gui agents | arXiv: 2503.16465
- outlier-safe pre-training for robust 4-bit quantization of large language models | arXiv: 2506.19697
- ozspeech one-step zero-shot speech synthesis with learned-prior-conditioned flow | arXiv: 2505.12800
- p2 law scaling law for post-training after model pruning
- p3 prompts promote prompting | arXiv: 2507.15675
- palm a culturally inclusive and linguistically diverse dataset for arabic llms | arXiv: 2503.00151
- Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models | arXiv: 2408.13533
- pap2pat benchmarking outline-guided long-text patent generation with patent-pape | arXiv: 2410.07009
- papersplease a benchmark for evaluating motivational values of large language mo | arXiv: 2506.21961
- parameter-aware contrastive knowledge editing tracing and rectifying based on cr
- parameter-efficient fine-tuning via circular convolution | arXiv: 2407.19342
- Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning | arXiv: 2410.10360
- parme parallel corpora for low-resourced middle eastern languages
- partial colexifications improve concept embeddings | arXiv: 2502.09743
- pasa an llm agent for comprehensive academic paper search | arXiv: 2501.10120
- past meets present creating historical analogy with large language models | arXiv: 2409.14820
- patch psychometrics-assisted benchmarking of large language models against human | arXiv: 2404.01799
- pattern recognition or medical knowledge the problem with multiple-choice questi | arXiv: 2406.02394
- PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation | arXiv: 2506.06842
- People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text | arXiv: 2501.15654
- performance gap in entity knowledge extraction across modalities in vision langu | arXiv: 2412.14133
- persistent homology of topic networks for the prediction of reader curiosity | arXiv: 2506.11095
- persona dynamics unveiling the impact of persona traits on agents in text-based | arXiv: 2504.06868
- personabench evaluating ai models on understanding personal information through | arXiv: 2502.20616
- personal travel solver a preference-driven llm-solver system for travel planning
- personalens a benchmark for personalization evaluation in conversational ai assi | arXiv: 2506.09902
- Personality-Guided Code Generation Using Large Language Models | arXiv: 2411.00006
- personalized generation in large model era a survey | arXiv: 2503.02614
- personalized text generation with contrastive activation steering | arXiv: 2503.05213
- perspective transition of large language models for solving subjective tasks | arXiv: 2501.09265
- persphere a comprehensive framework for multi-faceted perspective retrieval and | arXiv: 2412.12588
- phi-decoding adaptive foresight sampling for balanced inference-time exploration
- phonotomizer a compact unsupervised online training approach to real-time multil
- physreason a comprehensive benchmark towards physics-based reasoning | arXiv: 2502.12054
- pic unlocking long-form text generation capabilities of large language models vi
- PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative Prompts | arXiv: 2505.09921
- piguard prompt injection guardrail via mitigating overdefense for free
- piper benchmarking and prompting event reasoning boundary of llms via debiasing-
- pitfalls of scale investigating the inverse task of redefinition in large langua | arXiv: 2502.12821
- pixel-level reasoning segmentation via multi-turn conversations | arXiv: 2502.09447
- pkag-ddi pairwise knowledge-augmented language model for drug-drug interaction e
- pku-saferlhf towards multi-level safety alignment for llms with human preference | arXiv: 2406.15513
- PlanGenLLMs: A Modern Survey of LLM Planning Capabilities | arXiv: 2502.11221
- Planning with Diffusion Models for Target-Oriented Dialogue Systems | arXiv: 2504.16858
- planning-driven programming a large language model programming workflow | arXiv: 2411.14503
- planningarena a modular benchmark for multidimensional evaluation of planning an
- play2prompt zero-shot tool instruction optimization for llm agents via tool play | arXiv: 2503.14432
- Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models | arXiv: 2506.07424
- polynarrative a multilingual multilabel multi-domain dataset for narrative extra
- popalign diversifying contrasting patterns for a more comprehensive alignment | arXiv: 2410.13785
- position-aware automatic circuit discovery | arXiv: 2502.04577
- positional overload positional debiasing and context window extension for large
- powerformer efficient and high-accuracy privacy-preserving language model with h
- ppt a minor language news recommendation model via cross-lingual preference patt
- pqr improving dense retrieval via potential query modeling
- praetor a fine-grained generative llm evaluator with instance-level customizable
- Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges | arXiv: 2502.12378
- praise enhancing product descriptions with llm-driven structured insights | arXiv: 2506.17314
- pre-training curriculum for multi-token prediction in language models | arXiv: 2505.22757
- pre-training distillation for large language models a design space exploration | arXiv: 2410.16215
- predicate-conditional conformalized answer sets for knowledge graph embeddings | arXiv: 2505.16877
- Predicting Implicit Arguments in Procedural Video Instructions | arXiv: 2505.21068
- predicting through generation why generation is better for prediction | arXiv: 2502.17817
- predicting turn-taking and backchannel in human-machine conversations using ling | arXiv: 2505.12654
- prediction hubs are context-informed frequent tokens in llms | arXiv: 2502.10201
- prep-ocr a complete pipeline for document image restoration and enhanced ocr acc | arXiv: 2505.20429
- pretraining context compressor for large language models with embedding-based me
- preventing rogue agents improves multi-agent collaboration | arXiv: 2502.05986
- Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | arXiv: 2506.03887
- Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries | arXiv: 2505.21859
- Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks | arXiv: 2407.17963
- PRISM: A Framework for Producing Interpretable Political Bias Embeddings | arXiv: 2505.24646
- PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance | arXiv: 2502.17041
- privacyrestore privacy-preserving inference in large language models via privacy
- private memorization editing turning memorization into a defense to strengthen d | arXiv: 2506.10024
- prmbench a fine-grained and challenging benchmark for process-level reward model | arXiv: 2501.03124
- probabilistic aggregation and targeted embedding optimization for collective mor | arXiv: 2506.14625
- probability-consistent preference optimization for enhanced llm reasoning | arXiv: 2505.23540
- probing llms for multilingual discourse generalization through a unified label s | arXiv: 2503.10515
- probing relative interaction and dynamic calibration in multi-modal entity align
- probing subphonemes in morphology models | arXiv: 2505.11297
- probing the geometry of truth consistency and generalization of truth directions | arXiv: 2506.00823
- problem-solving logic guided curriculum in-context learning for llms complex rea | arXiv: 2502.15401
- proceedings of the 63rd annual meeting of the association for computational ling
- processbench identifying process errors in mathematical reasoning | arXiv: 2412.06559
- progco program helps self-correction of large language models | arXiv: 2501.01264
- program synthesis benchmark for visual programming in xlogoonline environment | arXiv: 2406.11334
- programming by example meets historical linguistics a large language model based
- progressive multimodal reasoning via active retrieval | arXiv: 2412.14835
- promalex progressive modular adapters for multi-jurisdictional legal language mo
- Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation | arXiv: 2506.03857
- prompt-based personality profiling reinforcement learning for relevance filterin | arXiv: 2409.04122
- prompt-guided internal states for hallucination detection of large language mode
- proper a progressive learning framework for personalized large language models w
- protolens advancing prototype learning for fine-grained interpretability in text
- provbench a benchmark of legal provision recommendation for contract auto-review
- ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering | arXiv: 2507.00828
- proxy-driven robust multimodal sentiment analysis with incomplete data
- psyadvisor a plug-and-play strategy advice planner with proactive questioning in
- psycholinguistic word features a new approach for the evaluation of llms alignme | arXiv: 2506.22439
- psydial a large-scale long-term conversational dataset for mental health support
- psydt using llms to construct the digital twin of psychological counselor with p
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models | arXiv: 2502.13179
- PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | arXiv: 2412.11906
- PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings | arXiv: 2506.00481
- pwngpt automatic exploit generation based on large language models
- q2e query-to-event decomposition for zero-shot multilingual text-to-video retrie | arXiv: 2506.10202
- QAEncoder: Towards Aligned Representation Learning in Question Answering Systems | arXiv: 2409.20434
- qaeval mixture of evaluators for question-answering task evaluation
- qdtsynth quality-driven formal theorem synthesis for enhancing proving performan
- qg-sms enhancing test item analysis via student modeling and simulation | arXiv: 2503.05888
- qqsum a novel task and model of quantitative query-focused summarization for rev | arXiv: 2506.04020
- Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis | arXiv: 2505.14742
- qualispeech a speech quality assessment dataset with natural language reasoning | arXiv: 2503.20290
- quantification of large language model distillation | arXiv: 2501.12619
- quantifying lexical semantic shift via unbalanced optimal transport | arXiv: 2412.12569
- quantifying misattribution unfairness in authorship attribution | arXiv: 2506.02321
- quantifying semantic emergence in language models | arXiv: 2405.12617
- quantized can still be calibrated a unified framework to calibration in quantize
- quasar a question-driven structure-aware approach for table-to-text generation
- Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies | arXiv: 2505.06186
- queryattack jailbreaking aligned large language models using structured non-natu | arXiv: 2502.09723
- qwen25-xcoder multi-agent collaboration for multilingual code instruction tuning
- r-fairness assessing fairness of ranking in subjective data
- R-VC: Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching | arXiv: 2506.01014
- r-vlm region-aware vision language model for precise gui grounding | arXiv: 2507.05673
- r2-multiomnia leading multilingual multimodal reasoning via self-training
- R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory | arXiv: 2501.12485
- RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection | arXiv: 2505.14318
- raemollm retrieval augmented llms for cross-domain misinformation detection usin | arXiv: 2406.11093
- rag-critic leveraging automated critic-guided agentic workflow for retrieval aug
- rageval scenario specific rag evaluation dataset generation framework | arXiv: 2408.01262
- rank chunk and expand lineage-oriented reasoning for taxonomy expansion | arXiv: 2505.13282
- rankcot refining knowledge for retrieval-augmented generation through ranking ch
- ranked voting based self-consistency of large language models | arXiv: 2505.10772
- ranking unraveled recipes for llm rankings in head-to-head ai combat | arXiv: 2411.14483
- RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models | arXiv: 2412.02830
- rate-nav region-aware termination enhancement for zero-shot object navigation wi | arXiv: 2506.02354
- rationales are not silver bullets measuring the impact of rationales on model pe | arXiv: 2505.24147
- rationalyst pre-training process-supervision for improving reasoning
- raven robust advertisement video violation temporal grounding via reinforcement | arXiv: 2510.16455
- Re-identification of De-identified Documents with Autoregressive Infilling | arXiv: 2505.12859
- Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms | arXiv: 2501.13977
- re-task revisiting llm tasks from capability skill and knowledge perspectives | arXiv: 2408.06904
- re3syn a dependency-based data synthesis framework for long-context post-trainin
- Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books | arXiv: 2506.01796
- readoc a unified benchmark for realistic document structured extraction | arXiv: 2409.05137
- real-mm-rag a real-world multi-modal retrieval benchmark | arXiv: 2502.12342
- real-time factuality assessment from adversarial feedback | arXiv: 2410.14651
- realhitbench a comprehensive realistic hierarchical table benchmark for evaluati | arXiv: 2506.13405
- reason from future reverse thought chain enhances llm reasoning | arXiv: 2506.03673
- reasoning circuits in language models a mechanistic interpretation of syllogisti | arXiv: 2408.08590
- reasoning is all you need for video generalization a counterfactual benchmark wi | arXiv: 2503.10691
- recent advances in speech language models a survey | arXiv: 2410.03751
- reclm recommendation instruction tuning | arXiv: 2412.19302
- reconsidering llm uncertainty estimation methods in the wild | arXiv: 2506.01114
- Recurrent Knowledge Identification and Fusion for Language Model Continual Learning | arXiv: 2502.17510
- recursive question understanding for complex question answering over heterogeneo | arXiv: 2505.11900
- red queen safeguarding large language models against concealed multi-turn jailbr | arXiv: 2409.17458
- red-teaming llm multi-agent systems via communication attacks | arXiv: 2502.14847
- redactor an llm-powered framework for automatic clinical data de-identification | arXiv: 2505.18380
- redundancy isotropy and intrinsic dimensionality of prompt-based text embeddings | arXiv: 2506.01435
- redundancy principles for mllms benchmarks | arXiv: 2501.13953
- redundancylens revealing and exploiting visual token processing redundancy for e | arXiv: 2501.19036
- reefknot a comprehensive benchmark for relation hallucination evaluation analysi | arXiv: 2408.09429
- ref-long benchmarking the long-context referencing capability of long-context la | arXiv: 2507.09506
- refind at semeval-2025 task 3 retrieval-augmented factuality hallucination detec | arXiv: 2502.13622
- refining salience-aware sparse fine-tuning strategies for language models | arXiv: 2412.13488
- ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | arXiv: 2409.10289
- reflectioncoder learning from reflection sequence for enhanced one-off code gene | arXiv: 2405.17057
- ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | arXiv: 2410.17657
- refreshkv updating small kv cache during long-form generation | arXiv: 2411.05787
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | arXiv: 2407.09121
- registering source tokens to target language spaces in multilingual neural machi | arXiv: 2501.02979
- reinforced ir a self-boosting framework for domain-adapted information retrieval
- relationalcoder rethinking complex tables via programmatic relational transforma
- relearn unlearning via learning for large language models | arXiv: 2502.11190
- reliably bounding false positives a zero-shot machine-generated text detection f
- removal of hallucination on hallucination debate-augmented rag | arXiv: 2505.18581
- REP: Keys to Robust Edits — From Theoretical Insights to Practical Advances | arXiv: 2410.09338
- repanda pandas-powered tabular verification and reasoning | arXiv: 2503.11921
- Representation Bending for Large Language Model Safety | arXiv: 2504.01550
- representations of fact fiction and forecast in large language models epistemics | arXiv: 2506.01512
- repro-bench can agentic ai systems assess the reproducibility of social science | arXiv: 2507.18901
- reranking-based generation for unbiased perspective summarization | arXiv: 2506.15925
- ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision | arXiv: 2505.21250
- research borderlands analysing writing across research cultures | arXiv: 2506.00784
- response wide shut surprising observations in basic vision language model capabi | arXiv: 2507.10442
- rethinking evaluation metrics for grammatical error correction why use a differe | arXiv: 2502.09416
- rethinking kenlm good and bad model ensembles for efficient text quality filteri
- rethinking repetition problems of llms in code generation | arXiv: 2505.10402
- rethinking reward model evaluation through the lens of reward overoptimization | arXiv: 2505.12763
- rethinking semantic parsing for large language models enhancing llm performance | arXiv: 2409.14469
- rethinking table instruction tuning | arXiv: 2501.14693
- rethinking the role of prompting strategies in llm test-time scaling a perspecti | arXiv: 2505.10981
- retrieval models arent tool-savvy benchmarking tool retrieval for large language | arXiv: 2503.01763
- retrieval visual contrastive decoding to mitigate object hallucinations in large | arXiv: 2505.20569
- retrieval-augmented fine-tuning with preference optimization for visual program | arXiv: 2502.16529
- Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification | arXiv: 2402.04068
- retrofitting large language models with dynamic tokenization | arXiv: 2411.18553
- retrollm empowering large language models to retrieve fine-grained evidence with | arXiv: 2412.11919
- retrospective learning from interactions | arXiv: 2410.13852
- revealing the deceptiveness of knowledge editing a mechanistic analysis of super | arXiv: 2505.12636
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up | arXiv: 2410.12323
- reverse preference optimization for complex instruction following | arXiv: 2505.22172
- revisit self-debugging with self-generated tests for code generation | arXiv: 2501.12793
- revisiting 3d llm benchmarks are we really testing 3d capabilities | arXiv: 2502.08503
- revisiting classical chinese event extraction with ancient literature informatio
- Revisiting Common Assumptions about Arabic Dialects in NLP | arXiv: 2505.21816
- Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability | arXiv: 2506.15629
- revisiting epistemic markers in confidence estimation can markers accurately ref
- revisiting llms as zero-shot time series forecasters small noise can break large | arXiv: 2506.00457
- revisiting lora through the lens of parameter redundancy spectral encoding helps | arXiv: 2506.16787
- revisiting scaling laws for language models the role of data quality and trainin
- revisiting self-consistency from dynamic distributional alignment perspective on | arXiv: 2502.19830
- revisiting the test-time scaling of o1-like models do they truly possess test-ti | arXiv: 2502.12215
- revisiting uncertainty quantification evaluation in language models spurious int | arXiv: 2504.13677
- Revisiting Weak-to-Strong Generalization: Reverse KL vs. Forward KL | arXiv: 2502.11107
- reviving cultural heritage a novel approach for comprehensive historical documen
- revs unlearning sensitive information in language models via rank editing in the | arXiv: 2406.09325
- Reward Generalization in RLHF: A Topological Perspective | arXiv: 2402.10184
- rewrite to jailbreak discover learnable and transferable implicit harmfulness in | arXiv: 2502.11084
- right answer wrong score uncovering the inconsistencies of llm evaluation in mul | arXiv: 2503.14996
- riot efficient prompt refinement with residual optimization tree | arXiv: 2506.16389
- rise reasoning enhancement via iterative self-exploration in multi-hop question | arXiv: 2505.21940
- RISE: Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing | arXiv: 2410.06638
- rmoa optimizing mixture-of-agents through diversity maximization and residual co | arXiv: 2505.24442
- robust and minimally invasive watermarking for eaas | arXiv: 2410.17552
- robust data watermarking in language models by injecting fictitious knowledge | arXiv: 2503.04036
- robust estimation of population-level effects in repeated-measures nlp experimen
- robust preference optimization via dynamic target margins | arXiv: 2506.03690
- robust utility-preserving text anonymization based on large language models | arXiv: 2407.11770
- rocoft efficient finetuning of large language models with row-column updates | arXiv: 2410.10075
- roleplot a systematic framework for evaluating and enhancing the plot-progressio
- Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context | arXiv: 2410.16069
- root defense strategies ensuring safety of llm at the decoding level
- rotor towards more reliable responses for order-invariant inputs | arXiv: 2502.08662
- rpo retrieval preference optimization for robust retrieval-augmented generation | arXiv: 2501.13726
- rsa² a rhetorical-strategy-aware rational speech act framework for | arXiv: 2506.09301
- RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph | arXiv: 2505.20813
- rsvp reasoning segmentation via visual prompting and multi-modal chain-of-though | arXiv: 2506.04277
- rubriks cube testing a new rubric for evaluating explanations on the cube datase | arXiv: 2503.23899
- ruby an effective framework for multi-constraint multi-hop question generation
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios | arXiv: 2412.08972
- s-rag a novel audit framework for detecting unauthorized use of personal data in
- s2r teaching llms to self-verify and self-correct via reinforcement learning
- s2wtm spherical sliced-wasserstein autoencoder for topic modeling | arXiv: 2507.12451
- s3 - semantic signal separation
- Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | arXiv: 2506.04592
- safer or luckier llms as safety evaluators are not robust to artifacts | arXiv: 2503.09347
- saferag benchmarking security in retrieval-augmented generation of large languag | arXiv: 2501.18636
- saferoute adaptive model selection for efficient and accurate safety guardrails | arXiv: 2502.12464
- safety alignment via constrained knowledge unlearning | arXiv: 2505.18588
- safety is not only about refusal reasoning-enhanced fine-tuning for interpretabl | arXiv: 2503.05021
- sake steering activations for knowledge editing | arXiv: 2503.01751
- salience sparse fine tuning
- sam decoding speculative decoding via suffix automaton | arXiv: 2411.10666
- sample-efficient human evaluation of large language models via maximum discrepan | arXiv: 2404.08008
- Sandcastles in the Storm: Revisiting Watermarking Impossibility | arXiv: 2505.06827
- sanskriti a comprehensive benchmark for evaluating language models knowledge of | arXiv: 2506.15355
- sara salience-aware reinforced adaptive decoding for large language models in ab
- scalable vision language model training via high quality data curation | arXiv: 2501.05952
- scale towards collaborative content analysis in social science with large langua
- ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | arXiv: 2406.19976
- ScaleQuest: Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch | arXiv: 2410.18693
- scaling context not parameters training a compact 7b language model for efficien | arXiv: 2505.08651
- scaling laws and efficient inference for ternary language models | arXiv: 2506.23025
- Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | arXiv: 2502.14846
- scaling up the state size of rnn llms for long-context scenarios
- scanez integrating cognitive models with self-supervised learning for spatiotemp
- SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning | arXiv: 2406.10882
- scedit script-based assessment of knowledge editing | arXiv: 2505.23291
- scenegenagent precise industrial scene generation with coding agent | arXiv: 2410.21909
- sci-lora mixture of scientific loras for cross-domain lay paraphrasing | arXiv: 2505.18867
- sciver evaluating foundation models for multimodal scientific claim verification | arXiv: 2506.15569
- sconu selective conformal uncertainty in large language models | arXiv: 2504.14154
- scop evaluating the comprehension process of large language models from a cognit | arXiv: 2506.05000
- scope optimizing key-value cache compression in long-context generation | arXiv: 2412.13649
- sculpt systematic tuning of long prompts | arXiv: 2410.20788
- sdbench a survey-based domain-specific llm benchmarking and optimization framewo
- sdd self-degraded defense against malicious fine-tuning | arXiv: 2507.21182
- sdpo segment-level direct preference optimization for social agents | arXiv: 2501.01821
- SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | arXiv: 2502.12562
- seakr self-aware knowledge retrieval for adaptive retrieval augmented generation | arXiv: 2406.19215
- seal scaling to emphasize attention for long-context retrieval | arXiv: 2501.15225
- second language arabic acquisition of llms via progressive vocabulary expansion | arXiv: 2412.12310
- secret semi-supervised clinical trial document similarity search | arXiv: 2505.10780
- SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization | arXiv: 2402.11347
- seedbench a multi-task benchmark for evaluating large language models in seed sc | arXiv: 2505.13220
- seeking rational demonstrations for large language models a domain generalizatio
- segment first or comprehend first explore the limit of unsupervised word segment
- segment-based attention masking for gpts | arXiv: 2412.18487
- Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models | arXiv: 2412.11333
- select read and write a multi-agent framework of full-text-based related work ge | arXiv: 2505.19647
- selecting and merging towards adaptable and scalable named entity recognition wi
- self-correction is more than refinement a learning framework for visual and lang | arXiv: 2410.04055
- self-critique guided iterative reasoning for multi-hop question answering | arXiv: 2505.19112
- self-error-instruct generalizing from errors for llms mathematical reasoning | arXiv: 2505.22591
- self-foveate enhancing diversity and difficulty of synthesized instructions from | arXiv: 2507.23440
- self-instructed derived prompt generation meets in-context learning unlocking ne | arXiv: 2409.01552
- SELF-PERCEPT: Introspection Improves LLMs' Detection of Multi-Person Mental Manipulation in Conversations | arXiv: 2505.20679
- self-supervised quantized representation for seamlessly integrating knowledge gr
- Self-Taught Agentic Long-Context Understanding | arXiv: 2502.15920
- self-training elicits concise reasoning in large language models | arXiv: 2502.20122
- self-tuning instructing llms to effectively acquire new knowledge through self-t | arXiv: 2406.06326
- SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence | arXiv: 2502.08767
- semantic aware linear transfer by recycling pre-trained language models for cros | arXiv: 2505.10945
- Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | arXiv: 2501.05752
- semantic outlier removal with embedding models and llms | arXiv: 2506.16644
- semantic-eval a semantic comprehension evaluation framework for large language m
- semeval-2025 task 1 admire -- advancing multimodal idiomaticity representation | arXiv: 2503.15358
- sentiment reasoning for healthcare | arXiv: 2407.21054
- SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection | arXiv: 2503.03303
- separating tongue from thought activation patching reveals language-agnostic con
- seqpo-simt sequential policy optimization for simultaneous machine translation | arXiv: 2505.20622
- serial lifelong editing via mixture of knowledge experts
- seuf is unlearning one expert enough for mixture-of-experts llms | arXiv: 2411.18797
- sgic a self-guided iterative calibration framework for rag | arXiv: 2506.16172
- shaping the safety boundaries understanding and defending against jailbreaks in
- share an slm-based hierarchical action correction assistant for text-to-sql | arXiv: 2506.00391
- share shared memory-aware open-domain long-term dialogue dataset constructed fro
- Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
- sheeps skin wolfs deeds are llms ready for metaphorical implicit hate speech
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework | arXiv: 2410.19453
- Shifting from Ranking to Set Selection for Retrieval Augmented Generation | arXiv: 2507.06838
- should i believe in what medical ai says a chinese benchmark for medication base
- shubert self-supervised sign language representation learning via multi-stream c | arXiv: 2411.16765
- sift-50m a large-scale multilingual dataset for speech instruction fine-tuning | arXiv: 2504.09081
- sightation counts leveraging sighted user feedback in building a blv-aligned dat
- silencing empowerment allowing bigotry auditing the moderation of hate speech on | arXiv: 2506.07667
- simgrag leveraging similar subgraphs for knowledge graphs driven retrieval-augme | arXiv: 2412.15272
- simuls2s-llm unlocking simultaneous inference of speech llms for speech-to-speec
- sincon mitigate llm-generated malicious message injection attack for rumor detec
- singakids a multilingual multimodal dialogic tutor for language learning | arXiv: 2506.02412
- single- vs dual-prompt dialogue generation with llms for job interviews in human | arXiv: 2502.18650
- single-to-mix modality alignment with multimodal large language model for docume | arXiv: 2507.07572
- sinhala encoder-only language models and evaluation
- skillaggregation reference-free llm-dependent aggregation | arXiv: 2410.10215
- SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation | arXiv: 2506.00319
- sklep a slovak general language understanding benchmark | arXiv: 2506.21508
- slamming training a speech language model on one gpu in a day | arXiv: 2502.15814
- sleepless nights sugary days creating synthetic users with health conditions for | arXiv: 2502.13135
- Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models | arXiv: 2412.14574
- small changes big impact how manipulating a few neurons can drastically alter ll
- smart self-aware agent for tool overuse mitigation | arXiv: 2502.11435
- smarter better faster longer a modern bidirectional encoder for fast memory effi | arXiv: 2412.13663
- socialcc interactive evaluation for cultural competence in language agents
- socialeval evaluating social intelligence of large language models | arXiv: 2506.00900
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | arXiv: 2502.12134
- somethings fishy in the data lake a critical re-evaluation of table union search | arXiv: 2505.21329
- SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | arXiv: 2402.17645
- sorft issue resolving with subtask-oriented reinforced fine-tuning | arXiv: 2502.20127
- sotopia-Ω dynamic strategy injection learning and social instructi | arXiv: 2502.15538
- soundwave less is more for speech-text alignment in llms | arXiv: 2502.12900
- spare enhancing spatial reasoning in vision-language models with synthetic data | arXiv: 2504.20648
- spark-tts an efficient llm-based text-to-speech model with single-stream decoupl | arXiv: 2503.01710
- sparse latents steer retrieval-augmented generation
- sparse logit sampling accelerating knowledge distillation in llms | arXiv: 2503.16870
- sparse rewards can self-train dialogue agents | arXiv: 2409.04617
- sparse-to-dense a free lunch for lossless acceleration of video understanding in | arXiv: 2505.19155
- Sparsify: Learning Sparsity for Effective and Efficient Music Performance Question Answering | arXiv: 2506.01319
- Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues | arXiv: 2506.00958
- spectra faster large language model inference with optimized internal and extern
- SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods | arXiv: 2507.21463
- SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models | arXiv: 2507.19361
- speechweave diverse multilingual synthetic text audio data generation pipeline f | arXiv: 2509.14270
- speed up your code progressive code acceleration through bidirectional tree edit
- SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation | arXiv: 2412.12693
- SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers | arXiv: 2507.06517
- splintering nonconcatenative languages for better tokenization | arXiv: 2503.14433
- spot bridging natural language and geospatial search for investigative journalis | arXiv: 2506.13188
- spotting out-of-character behavior atomic-level evaluation of persona fidelity i | arXiv: 2506.19352
- spurious correlations and beyond understanding and mitigating shortcut learning
- sql injection jailbreak a structural disaster of large language models | arXiv: 2411.01565
- sqlong enhanced nl2sql for longer contexts with llms | arXiv: 2502.16747
- squeezed attention accelerating long context length llm inference | arXiv: 2411.09688
- sr-llm rethinking the structured representation in large language model | arXiv: 2502.14352
- star-sql self-taught reasoner for text-to-sql | arXiv: 2502.13550
- state toxicn a benchmark for span-level target-aware toxicity extraction in chin | arXiv: 2501.15451
- State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models | arXiv: 2503.03499
- statement-tuning enables efficient cross-lingual generalization in encoder-only | arXiv: 2506.01592
- Statistical Deficiency for Task Inclusion Estimation | arXiv: 2503.05491
- stealing training data from large language models in decentralized training thro
- steering into new embedding spaces analyzing cross-lingual alignment induced by
- steering off course reliability challenges in steering language models | arXiv: 2504.04635
- stem-pom evaluating language models math-symbol reasoning in document parsing | arXiv: 2411.00387
- Stepwise Reasoning Disruption Attack of LLMs | arXiv: 2412.11934
- sticking to the mean detecting sticky tokens in text embedding models | arXiv: 2507.18171
- stitchllm serving llms one block at a time
- stochastic chameleons irrelevant context hallucinations reveal class-based misge | arXiv: 2505.22630
- stress-testing machine generated text detection shifting language models writing | arXiv: 2505.24523
- STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond | arXiv: 2409.05367
- StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text | arXiv: 2406.10621
- structflowbench a structured flow benchmark for multi-turn instruction following | arXiv: 2502.14494
- structural reasoning improves molecular understanding of llm | arXiv: 2410.05610
- structure-aware domain knowledge injection for large language models | arXiv: 2407.16724
- STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | arXiv: 2409.06211
- sublime subset selection via rank correlation prediction for data-efficient llm
- Substance over Style: Evaluating Proactive Conversational Coaching Agents | arXiv: 2503.19328
- subword models struggle with word learning but surprisal hides it | arXiv: 2502.12835
- sudo rm -rf agentic security | arXiv: 2503.20279
- SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment | arXiv: 2410.14676
- surveyforge on the outline heuristics memory-driven generation and multi-dimensi
- surveypilot an agentic framework for automated human opinion collection from soc
- swiltra-bench the swiss legal translation benchmark | arXiv: 2503.01372
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images | arXiv: 2502.13928
- synapticrag enhancing temporal memory retrieval in large language models through | arXiv: 2410.13553
- synergistic weak-strong collaboration by aligning preferences | arXiv: 2504.15188
- Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection | arXiv: 2506.00488
- synergizing unsupervised episode detection with llms for large-scale news events | arXiv: 2408.04873
- syngraph a dynamic graph-llm synthesis framework for sparse streaming user senti | arXiv: 2503.04619
- SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs | arXiv: 2506.05598
- synthesizing post-training data for llms through multi-agent simulation | arXiv: 2410.14251
- synthia novel concept design with affordance composition | arXiv: 2502.17793
- SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | arXiv: 2504.03561
- systematic generalization in language models scales with information entropy | arXiv: 2505.13089
- t-reg preference optimization with token-level reward regularization | arXiv: 2412.02685
- T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback | arXiv: 2505.10561
- t2i-factualbench benchmarking the factuality of text-to-image models with knowle
- t5score a methodology for automatically assessing the quality of llm generated m | arXiv: 2407.17390
- table understanding and multimodal llms a cross-domain case study on scientific | arXiv: 2507.00152
- Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | arXiv: 2502.11799
- tabledreamer progressive and weakness-guided data synthesis from scratch for tab | arXiv: 2506.08646
- TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models | arXiv: 2503.04396
- tabxeval why this is a bad table an exhaustive rubric for table evaluation | arXiv: 2505.22176
- taclr a scalable and efficient retrieval-based method for industrial product att | arXiv: 2501.03835
- tada training-free recipe for decoding with adaptive kv cache compression and me | arXiv: 2506.04642
- tag-evol achieving efficient instruction evolving via tag injection | arXiv: 2505.24165
- tagrouter learning route to llms through tags for open-domain text generation ta | arXiv: 2506.12473
- takin-vc expressive zero-shot voice conversion via adaptive hybrid content encod
- taming language models for text-attributed graph learning with decoupled aggrega
- taming llms with gradient grouping
- targa targeted synthetic data generation for practical reasoning over structured | arXiv: 2412.19544
- targeted syntactic evaluation for grammatical error correction
- task-informed anti-curriculum by masking improves downstream performance on text | arXiv: 2502.12953
- task-specific information decomposition for end-to-end dense video captioning
- taxoadapt aligning llm-based multidimensional taxonomy construction to evolving | arXiv: 2506.10737
- taz2024full analysing german newspapers for gender bias and discrimination acros | arXiv: 2506.05388
- tc-rag turing-complete rags case study on medical llm systems
- tcsinger 2 customizable multilingual zero-shot singing voice synthesis | arXiv: 2505.14910
- teach a contrastive knowledge adaptive distillation framework for classical chin
- Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences | arXiv: 2506.00419
- teaching text agents to learn sequential decision making from failure
- Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions | arXiv: 2507.13773
- team ack at semeval-2025 task 2 beyond word-for-word machine translation for eng | arXiv: 2504.20451
- team anotheroption at semeval-2025 task 8 bridging the gap between open-source a | arXiv: 2506.09657
- teamlora boosting low-rank adaptation with expert collaboration and competition | arXiv: 2408.09856
- tell dont show leveraging language models abstractive retellings to model litera | arXiv: 2505.23166
- tempest autonomous multi-turn jailbreaking of large language models with tree se | arXiv: 2503.10619
- temporal reasoning for timeline summarisation in social media | arXiv: 2501.00152
- temporal relation extraction in clinical texts a span-based graph transformer ap
- terdy temporal relation dynamics through frequency decomposition for temporal kn
- tess 2 a large-scale generalist diffusion language model | arXiv: 2502.13917
- testnuc enhancing test-time computing approaches and scaling through neighboring | arXiv: 2502.19163
- tetris optimal draft token selection for batch speculative decoding | arXiv: 2502.15197
- texpert a multi-level benchmark for evaluating latex code generation by llms | arXiv: 2506.16990
- text is all you need llm-enhanced incremental social event detection
- text-to-es bench a comprehensive benchmark for converting natural language to el
- l-citeeval a suite for evaluating fidelity of long-context models
- that doesnt sound right evaluating speech transcription quality in field linguis
- that is unacceptable the moral foundations of canceling
- the ai gap how socioeconomic status affects language technology interactions | arXiv: 2505.12158
- the alternative annotator test for llm-as-a-judge how to statistically justify r
- the anatomy of evidence an investigation into explainable icd coding | arXiv: 2507.01802
- the behavior gap evaluating zero-shot llm agents in complex task-oriented dialog | arXiv: 2506.12266
- the cross-linguistic role of animacy in grammar structures
- the distracting effect understanding irrelevant passages in rag | arXiv: 2505.06914
- the efficiency vs accuracy trade-off optimizing rag-enhanced llm recommender sys
- the esethu framework reimagining sustainable dataset governance and curation for | arXiv: 2502.15916
- the essence of contextual understanding in theory of mind a study on question an
- the harmonic structure of information contours | arXiv: 2506.03902
- the hidden attention of mamba models | arXiv: 2403.01590
- the hidden space of safety understanding preference-tuned llms in multilingual c | arXiv: 2504.02708
- The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It | arXiv: 2406.13181
- The Impact of Token Granularity on the Predictive Power of Language Model Surprisal | arXiv: 2412.11940
- The Impossibility of Fair LLMs | arXiv: 2406.03198
- the invisible hand unveiling provider bias in large language models for code gen
- the knowledge microscope features as better analytical lenses than neurons | arXiv: 2502.12483
- the lawyer that never thinks consistency and fairness as keys to reliable ai
- the male ceo and the female assistant evaluation and mitigation of gender biases
- the mirage of model editing revisiting evaluation in the wild | arXiv: 2502.11177
- the nature of nlp analyzing contributions in nlp papers | arXiv: 2409.19505
- the noisy path from source to citation measuring how scholars engage with past r | arXiv: 2502.20581
- the role of abstract representations and observed preferences in the ordering of
- the role of deductive and inductive reasoning in large language models | arXiv: 2410.02892
- the role of exploration modules in small language models for knowledge graph que | arXiv: 2509.07399
- the role of visual modality in multimodal mathematical reasoning challenges and | arXiv: 2503.04167
- the task shield enforcing task alignment to defend against indirect prompt injec
- the time scale of redundancy between prosody and linguistic context | arXiv: 2503.11630
- The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models | arXiv: 2410.16672
- the ud-newscrawl treebank reflections and challenges from a large-scale tagalog
- theme-explanation structure for table summarization using large language models | arXiv: 2501.10487
- theorem prover as a judge for synthetic data generation | arXiv: 2502.13137
- theorem-of-thought a multi-agent framework for abductive deductive and inductive | arXiv: 2506.07106
- TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding | arXiv: 2502.19400
- theoretical analysis of hierarchical language recognition and generation by tran
- theoretical guarantees for minimum bayes risk decoding | arXiv: 2502.12685
- Theory of Mind in Large Language Models: Assessment and Enhancement | arXiv: 2505.00026
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling | arXiv: 2412.14860
- thinkguard deliberative slow thinking leads to cautious guardrails | arXiv: 2502.13458
- thor-moe hierarchical task-guided and context-responsive routing for neural mach | arXiv: 2505.14173
- tic-lm a web-scale benchmark for time-continual llm pretraining | arXiv: 2504.02107
- tigerllm - a family of bangla large language models | arXiv: 2503.10995
- time-mqa time series multi-task question answering with context enhancement | arXiv: 2503.01875
- TIP of the Iceberg: Task-in-Prompt Adversarial Attacks on LLMs | arXiv: 2501.18626
- to code or not to code adaptive tool integration for math language models via ex | arXiv: 2502.00691
- TokAlign: Efficient Vocabulary Adaptation via Token Alignment | arXiv: 2506.03523
- Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs | arXiv: 2412.11556
- token pruning in multimodal large language models are we solving the right probl | arXiv: 2502.11501
- tokenisation is np-complete | arXiv: 2412.15210
- tokenization is sensitive to language variation | arXiv: 2502.15343
- ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models | arXiv: 2502.11404
- ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use | arXiv: 2501.02506
- toolspectrum towards personalized tool utilization for large language models | arXiv: 2505.13176
- top-nσ eliminating noise in logit space for robust token sampling of llm
- toward automatic discovery of a canine phonetic alphabet
- toward structured knowledge reasoning contrastive retrieval-augmented generation | arXiv: 2506.00842
- Towards a More Generalized Approach in Open Relation Extraction | arXiv: 2505.22801
- towards a principled evaluation of knowledge editors | arXiv: 2507.05937
- towards adaptive memory-based optimization for enhanced retrieval-augmented gene | arXiv: 2504.05312
- towards better chain-of-thought a reflection on effectiveness and faithfulness | arXiv: 2405.18915
- Towards Better Evaluation for Generated Patent Claims | arXiv: 2505.11095
- towards better open-ended text generation a multicriteria evaluation framework | arXiv: 2410.18653
- towards better value principles for large language model alignment a systematic
- towards building large scale datasets and state-of-the-art automatic speech tran
- towards comprehensive argument analysis in education dataset tasks and method | arXiv: 2505.12028
- towards context-robust llms a gated representation fine-tuning approach | arXiv: 2502.14100
- towards dynamic theory of mind evaluating llm adaptation to temporal evolution o | arXiv: 2505.17663
- towards effective and efficient continual pre-training of large language models | arXiv: 2407.18743
- towards effective extraction and evaluation of factual claims | arXiv: 2502.10855
- towards enhanced immersion and agency for llm-based interactive drama | arXiv: 2502.17878
- towards explainable temporal reasoning in large language models a structure-awar | arXiv: 2505.15245
- towards fairness assessment of dutch hate speech detection | arXiv: 2506.12502
- towards fully exploiting llm internal states to enhance knowledge boundary perce
- Towards Geo-Culturally Grounded LLM Generations | arXiv: 2502.13497
- towards global ai inclusivity a large-scale multilingual terminology dataset gis | arXiv: 2412.18367
- towards harmonized uncertainty estimation for large language models | arXiv: 2505.19073
- towards llm-powered attentive listener a pragmatic approach through quantity sel
- towards multi-dimensional evaluation of llm summarization across domains and lan
- towards objective fine-tuning how llms prior knowledge causes potential poor cal
- Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications | arXiv: 2501.02460
- towards reliable large audio language model | arXiv: 2505.19294
- Towards Reward Fairness in RLHF: From a Resource Allocation Perspective | arXiv: 2505.23349
- Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients | arXiv: 2410.22815
- Towards Robust ESG Analysis Against Greenwashing Risks: A3CG | arXiv: 2502.15821
- towards robust universal information extraction dataset evaluation and solution
- towards safety reasoning in llms ai-agentic deliberation for policy-embedded cot | arXiv: 2505.21784
- towards storage-efficient visual document retrieval an empirical study on reduci | arXiv: 2506.04997
- towards style alignment in cross-cultural translation | arXiv: 2507.00216
- towards text-image interleaved retrieval | arXiv: 2502.12799
- Towards the Law of Capacity Gap in Distilling Language Models | arXiv: 2311.07052
- tracing and dissecting how llms recall factual knowledge for real world question
- tracking lifes ups and downs mining life events from social media posts for ment
- TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning | arXiv: 2503.04381
- training dynamics underlying language model scaling laws loss deceleration and z | arXiv: 2506.05447
- training language model to critique for better refinement | arXiv: 2506.22157
- training turn-by-turn verifiers for dialogue tutoring agents the curious case of | arXiv: 2502.13311
- training-free llm merging for multi-task learning | arXiv: 2506.12379
- Trans-PEFT: Transferable Parameter-Efficient Fine-Tuning on Evolving Base Models | arXiv: 2506.06844
- trans-zero self-play incentivizes large language models for multilingual transla | arXiv: 2504.14669
- transbench breaking barriers for transferable graphical user interface agents in | arXiv: 2505.17629
- transferring textual preferences to vision-language understanding through model | arXiv: 2502.13487
- transforming podcast preview generation from expert models to llm-based systems | arXiv: 2505.23908
- translate with care addressing gender bias neutrality and reasoning in large lan | arXiv: 2506.00748
- translation and fusion improves cross-lingual information extraction | arXiv: 2305.13582
- trates trait-specific rubric-assisted cross-prompt essay scoring | arXiv: 2505.14577
- tree-kg an expandable knowledge graph construction framework for knowledge-inten
- tree-of-code a tree-structured exploring framework for end-to-end code generatio | arXiv: 2412.15305
- tree-of-debate multi-persona debate trees elicit critical thinking for scientifi | arXiv: 2502.14767
- Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models
- treecut a synthetic unanswerable math word problem dataset for llm hallucination | arXiv: 2502.13442
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | arXiv: 2506.11902
- tremu towards neuro-symbolic temporal reasoning for llm-agents with memory in mu | arXiv: 2502.01630
- trident enhancing large language model safety with tri-dimensional diversified r
- TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs | arXiv: 2412.11242
- tripcraft a benchmark for spatio-temporally fine grained travel planning | arXiv: 2502.20508
- triplefact defending data contamination in the evaluation of llm-driven fake new
- triptailor a real-world benchmark for personalized travel planning | arXiv: 2508.01432
- TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification | arXiv: 2503.15289
- truth knows no language evaluating truthfulness beyond english | arXiv: 2502.09387
- tst a schema-based top-down and dynamic-aware agent of text-to-table tasks
- tumlu a unified and native language understanding benchmark for turkic languages | arXiv: 2502.11020
- Tuna: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | arXiv: 2505.20124
- tunable llm-based proactive recommendation agent
- Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | arXiv: 2504.15754
- twist text-encoder weight-editing for inserting secret trojans in text-to-image
- two intermediate translations are better than one fine-tuning llms for document-
- typed-rag type-aware decomposition of non-factoid questions for retrieval-augmen | arXiv: 2503.15879
- typology-guided adaptation in multilingual models
- ualign leveraging uncertainty estimations for factuality alignment on large lang | arXiv: 2412.11803
- uaqfact evaluating factual knowledge utilization of llms on unanswerable questio | arXiv: 2505.23461
- umedsum a unified framework for clinical abstractive summarization
- un-considering contextual information assessing llms understanding of indexical | arXiv: 2506.01089
- unanswerability evaluation for retrieval augmented generation | arXiv: 2412.12300
- uncertainty in causality a new frontier
- uncertainty propagation on llm agent
- uncertainty unveiled can exposure to more in-context examples mitigate uncertain | arXiv: 2505.21003
- uncertainty-aware iterative preference optimization for enhanced llm reasoning
- uncovering the impact of chain-of-thought reasoning for direct preference optimi
- Uncovering Visual-Semantic Psycholinguistic Properties from the Distributional Structure of Text Embedding Space | arXiv: 2505.23029
- understanding and meeting practitioner needs when measuring representational har | arXiv: 2506.04482
- understanding common ground misalignment in goal-oriented dialog a case-study wi | arXiv: 2503.12370
- understanding cross-domain adaptation in low-resource topic modeling | arXiv: 2506.07453
- Understanding Impact of Human Feedback via Influence Functions | arXiv: 2501.05790
- understanding in-context machine translation for low-resource languages a case s | arXiv: 2502.11862
- understanding large language model vulnerabilities to social bias attacks
- understanding silent data corruption in llm training | arXiv: 2502.12340
- understanding the dark side of llms intrinsic self-correction | arXiv: 2412.14959
- understanding the repeat curse in large language models from a feature perspecti | arXiv: 2504.14218
- uni-retrieval a multi-style retrieval framework for stems education | arXiv: 2502.05863
- unicodec unified audio codec with single domain-adaptive codebook | arXiv: 2502.20067
- UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations | arXiv: 2507.07030
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes | arXiv: 2505.22165
- unifying language agent algorithms with graph-based orchestration engine for rep | arXiv: 2505.24354
- UniICL: An Efficient ICL Framework Unifying Compression, Selection, and Generation | arXiv: 2405.17062
- unilr unleashing the power of llms on multiple legal tasks with a unified legal
- unintended harms of value-aligned llms psychological and empirical insights | arXiv: 2506.06404
- UniQuanF: Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models | arXiv: 2506.03781
- unique hard attention a tale of two sides | arXiv: 2503.14615
- unirag unified query understanding method for retrieval augmented generation
- Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering | arXiv: 2503.11314
- unlocking recursive thinking of llms alignment via refinement | arXiv: 2506.06009
- unlocking speech instruction data potential with query rewriting | arXiv: 2507.08603
- unmasking style sensitivity a causal analysis of bias evaluation instability in
- Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging | arXiv: 2505.22934
- unraveling the mechanics of learning-based demonstration selection for in-contex
- unravelling the logic investigating the generalisation of transformers in numeri
- unseentimeqa time-sensitive question-answering beyond llms memorization | arXiv: 2407.03525
- Unsolvable Problem Detection: Evaluating Trustworthiness of Large Multimodal Models | arXiv: 2403.20331
- unsupervised morphological tree tokenizer | arXiv: 2406.15245
- untie the knots an efficient data augmentation strategy for long-context pre-tra
- unveil unified visual-textual integration and distillation for multi-modal docum
- unveiling and addressing pseudo forgetting in large language models | arXiv: 2411.11932
- unveiling attractor cycles in large language models a dynamical systems view of | arXiv: 2502.15208
- unveiling cultural blind spots analyzing the limitations of mllms in procedural | arXiv: 2502.14315
- unveiling dual quality in product reviews an nlp-based approach | arXiv: 2505.19254
- unveiling environmental impacts of large language model serving a functional uni
- Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders | arXiv: 2505.05111
- Unveiling Privacy Risks in LLM Agent Memory | arXiv: 2502.13172
- unveiling the key factors for distilling chain-of-thought reasoning | arXiv: 2502.18001
- unveiling the lack of lvlm robustness to fundamental visual variations why and p | arXiv: 2504.16727
- unveiling the potential of bert-family a new recipe for building scalable genera
- unveiling the power of source source-based minimum bayes risk decoding for neura | arXiv: 2406.11632
- uora uniform orthogonal reinitialization adaptation in parameter efficient fine-
- upcycling instruction tuning from dense to mixture-of-experts via parameter merg | arXiv: 2410.01610
- urbanvideo-bench benchmarking vision-language models on embodied intelligence wi
- usdc a dataset of user stance and dogmatism in long conversations | arXiv: 2406.16833
- user-side model consistency monitoring for open source large language models inf
- using information theory to characterize prosodic typology the case of tone pitc
- using shapley interactions to understand how models use structure | arXiv: 2403.13106
- using source-side confidence estimation for reliable translation into unfamiliar | arXiv: 2503.23305
- using subtext to enhance generative idrr
- utboost rigorous evaluation of coding agents on swe-bench | arXiv: 2506.09289
- v-oracle making progressive reasoning in deciphering oracle bones for you and me
- value portrait assessing language models values through psychometrically and eco | arXiv: 2505.01015
- value residual learning | arXiv: 2410.17897
- Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition | arXiv: 2411.11479
- vaquum are vague quantifiers grounded in visual data | arXiv: 2502.11874
- velocitune a velocity-based dynamic domain reweighting method for continual pre- | arXiv: 2411.14318
- Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | arXiv: 2505.16128
- verbosity-aware rationale reduction effective reduction of redundant rationale v | arXiv: 2412.21006
- VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | arXiv: 2505.23693
- vidcapbench a comprehensive benchmark of video captioning for controllable text- | arXiv: 2502.12782
- videovista-culturallingo 360° horizons-bridging cultures languages and
- vigil3d a linguistically diverse dataset for 3d visual grounding | arXiv: 2501.01366
- visa retrieval augmented generation with visual source attribution | arXiv: 2412.14457
- vision-language models struggle to align entities across modalities | arXiv: 2503.03854
- visual cues enhance predictive turn-taking for two-party human interaction | arXiv: 2505.21043
- Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
- visuothink empowering lvlm reasoning with multimodal tree search | arXiv: 2504.09130
- VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | arXiv: 2502.13775
- vlm2-bench a closer look at how well vlms implicitly link explicit matching visu
- VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service | arXiv: 2506.15755
- vlsbench unveiling visual leakage in multimodal safety | arXiv: 2411.19939
- vmlu benchmarks a comprehensive benchmark toolkit for vietnamese llms
- voting or consensus decision-making in multi-agent debate | arXiv: 2502.19130
- voxeval benchmarking the knowledge understanding capabilities of end-to-end spok | arXiv: 2501.04962
- voxrag a step toward transcription-free rag systems in spoken question answering | arXiv: 2505.17326
- vqaguider guiding multimodal large language models to answer complex video quest
- VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | arXiv: 2506.08691
- vulnerability of llms to vertically aligned text manipulations | arXiv: 2410.20016
- waffle fine-tuning multi-modal model for automated front-end development
- Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | arXiv: 2409.00113
- walk in others shoes with a single glance human-centric visual grounding with to
- wanda pruning large language models via regional gradients | arXiv: 2503.04992
- warmup generations a task-agnostic approach for guiding sequence-to-sequence lea
- warriorcoder learning from expert battles to augment code large language models | arXiv: 2412.17395
- watching the watchers exposing gender disparities in machine translation quality | arXiv: 2410.10995
- watermarking large language models an unbiased and low-risk method
- wavrag audio-integrated retrieval augmented generation for spoken dialogue model | arXiv: 2502.14727
- we-math does your large multimodal model achieve human-like mathematical reasoni
- weaving context across images improving vision-language models through focus-cen
- webwalker benchmarking llms in web traversal | arXiv: 2501.07572
- weed out then harvest dual low-rank adaptation is an effective noisy label detec | arXiv: 2510.10208
- well begun is half done low-resource preference alignment by weak-to-strong deco | arXiv: 2506.07434
- WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermark | arXiv: 2409.04459
- what are the essential factors in crafting effective long context multi-hop inst | arXiv: 2409.01893
- what do you call a dog that is incontrovertibly true dogma testing llm generaliz
- what happened in llms layers when trained for fast vs slow thinking a gradient p | arXiv: 2410.23743
- what is stigma attributed to a theory-grounded expert-annotated interview corpus | arXiv: 2505.12727
- What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | arXiv: 2502.08279
- What Makes a Good Natural Language Prompt? | arXiv: 2506.06950
- what matters in evaluating book-length stories a systematic study of long story | arXiv: 2512.12839
- What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs | arXiv: 2505.19773
- whats the difference supporting users in identifying the effects of prompt and m
- when backdoors speak understanding llm backdoor attacks through model-generated | arXiv: 2411.12701
- when claims evolve evaluating and enhancing the robustness of embedding models a | arXiv: 2503.03417
- when gpt spills the tea comprehensive assessment of knowledge file leakage in gp
- when harry meets superman the role of the interlocutor in persona-based dialogue | arXiv: 2505.24613
- when large language models meet speech a survey on integration approaches | arXiv: 2502.19548
- When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models | arXiv: 2502.13246
- when should dense retrievers be updated in evolving corpora detecting out-of-dis | arXiv: 2506.01877
- when the lm misunderstood the human chuckled analyzing garden path effects in hu
- When to Speak, When to Abstain: Contrastive Decoding with Abstention | arXiv: 2412.12527
- where are we evaluating llm performance on african languages | arXiv: 2502.19582
- which demographics do llms default to during annotation | arXiv: 2410.08820
- Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | arXiv: 2502.14127
- which retain set matters for llm unlearning a case study on entity unlearning | arXiv: 2502.11441
- whispa semantically and psychologically aligned whisper with self-supervised con | arXiv: 2501.16344
- white men lead black women help benchmarking and mitigating language agency soci
- who can withstand chat-audio attacks an evaluation benchmark for large audio-lan | arXiv: 2411.14842
- who taught you that tracing teachers in model distillation | arXiv: 2502.06659
- Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection | arXiv: 2502.12611
- Whose Boat Does it Float? Improving Personalization in Preference Optimization | arXiv: 2501.11549
- why are positional encodings nonessential for deep autoregressive transformers r | arXiv: 2501.00659
- why not act on what you know unleashing safety potential of llms via self-aware | arXiv: 2505.12060
- why prompt design matters and works a complexity analysis of prompt search space | arXiv: 2503.10084
- why safeguarded ships run aground aligned large language models safety mechanism | arXiv: 2502.13946
- wicked a simple method to make multiple choice benchmarks more challenging | arXiv: 2502.18316
- wikimixqa a multimodal benchmark for question answering over tables and charts | arXiv: 2506.15594
- winspot gui grounding benchmark with multimodal large language models
- wirelessmathbench a mathematical modeling benchmark for llms in wireless communi | arXiv: 2505.14354
- wizard of shopping target-oriented e-commerce dialogue generation with decision | arXiv: 2502.00969
- words of warmth trust and sociability norms for over 26k english words | arXiv: 2506.03993
- world modeling makes a better planner dual preference optimization for embodied | arXiv: 2503.10480
- Writing Like the Best: Exemplar-Based Expository Text Generation | arXiv: 2505.18859
- wximpactbench a disruptive weather impact understanding benchmark for evaluating | arXiv: 2505.20249
- X-Turing: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents | arXiv: 2408.09853
- x-webagentbench a multilingual interactive web benchmark for evaluating global a | arXiv: 2505.15372
- xdac xai-driven detection and attribution of llm-generated news comments in kore
- yes my lord guiding language model extraction with locality reinforced distillat
- YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering | arXiv: 2505.14279
- you need to mimic to get fame solving meeting transcript scarcity with a multi-a | arXiv: 2502.13001
- your model is overconfident and other lies we tell ourselves | arXiv: 2503.01235
- yulan-mini pushing the limits of open data-efficient language model
- zero-shot belief a hard problem for llms | arXiv: 2502.08777
- zero-shot conversational stance detection dataset and approaches | arXiv: 2506.17693
- zero-shot text-to-speech for vietnamese | arXiv: 2506.01322
- zipa a family of efficient models for multilingual phone recognition | arXiv: 2505.23170
- zjuklab at semeval-2025 task 4 unlearning via model merging | arXiv: 2503.21088
- δ-stance a large-scale real-world dataset of stances in legal argumentation
- crafting privacy-preserving adversarial examples a defense against membership inf
- an empirical study on detecting ai-generated text in financial reports
- cognitive framework for detecting ai-generated fiction
- haco-det fine-grained detection under human-ai coauthoring | arXiv: 2506.02959
- mcp zero-shot mgt detection via conformal prediction | arXiv: 2505.05084
- atri mitigating multilingual audio-text retrieval inconsistencies | arXiv: 2502.14627
- audio token consistency | arXiv: 2409.19283
- contextual biasing with the knowledgeable external language model for end-to-end
- coa-reasoning explorations on counterfactual analysis in physical reasoning of l
- counterfactual explanations for aspect-based sentiment analysis
- benchmarking long-context language models on long code understanding | arXiv: 2503.04359
- beyond sequences two-dimensional representation and dependency encoding for code
- coco-bench a comprehensive code benchmark for multi-task large language model ev | arXiv: 2504.20673
- codedpo code alignment | arXiv: 2410.05605
- codeif benchmarking the instruction-following capabilities of large language mod | arXiv: 2502.19166
- codereviewqa the code review comprehension assessment for large language models | arXiv: 2503.16167
- compileagent automated real-world repo-level compilation with tool-integrated ll | arXiv: 2505.04254
- coret improved retriever for code editing | arXiv: 2505.24715
- dars dynamic action re-sampling to enhance coding agent performance by adaptive | arXiv: 2503.14269
- dynacode a dynamic complexity-aware code benchmark for evaluating large language | arXiv: 2503.10452
- etf an entity tracing framework for hallucination detection in code summaries | arXiv: 2410.14748
- exploracoder advancing code generation for multiple unseen apis via planning and | arXiv: 2412.05366
- feabench repo code gen | arXiv: 2503.06680
- galla graph aligned large language models | arXiv: 2409.04183
- gift gibbs fine tuning code gen | arXiv: 2502.11466
- mldebugging towards benchmarking code debugging across multi-library scenarios | arXiv: 2506.13824
- oasis order-augmented strategy for improved code search | arXiv: 2503.08161
- personality guided code gen | arXiv: 2411.00006
- program synthesis benchmark for visual programming in xlogoonline environment | arXiv: 2406.11334
- reflectioncoder learning from reflection sequence for enhanced one-off code gene | arXiv: 2405.17057
- rethinking repetition problems of llms in code generation | arXiv: 2505.10402
- revisit self-debugging with self-generated tests for code generation
- scenegenagent precise industrial scene generation with coding agent | arXiv: 2410.21909
- texpert a multi-level benchmark for evaluating latex code generation by llms | arXiv: 2506.16990
- tree-of-code a tree-structured exploring framework for end-to-end code generatio | arXiv: 2412.15305
- tree of evolution code gen
- utboost rigorous evaluation of coding agents on swe-bench | arXiv: 2506.09289
- contradiction detection in rag-based chatbots
- dialogue systems for emotional support via value reinforcement | arXiv: 2501.17182
- enabling chatbots with eyes and ears an immersive multimodal conversation system | arXiv: 2506.00421
- enhancing goal-oriented proactive dialogue systems via consistency reflection an | arXiv: 2506.13366
- enstom enhancing dialogue systems with entropy-scaled steering vectors for topic | arXiv: 2505.16526
- know you first and be you better modeling human-like user simulators via implici | arXiv: 2502.18968
- know your mistakes towards preventing overreliance on task-oriented conversation | arXiv: 2501.10316
- kokorochat a japanese psychological counseling dialogue | arXiv: 2506.01357
- persona sentiment dialogue | arXiv: 2502.11423
- personalens a benchmark for personalization evaluation in conversational ai assi | arXiv: 2506.09902
- reflectdiffu empathetic response | arXiv: 2409.10289
- single- vs dual-prompt dialogue generation with llms for job interviews in human | arXiv: 2502.18650
- uniconv retrieval response gen | arXiv: 2507.07030
- when harry meets superman the role of the interlocutor in persona-based dialogue | arXiv: 2505.24613
- wizard of shopping target-oriented e-commerce dialogue generation with decision | arXiv: 2502.00969
- agent steerable search for knowledge graph question answering
- a reality check on context utilisation for retrieval-augmented generation | arXiv: 2412.17031
- a text is worth several tokens text embedding from llms secretly aligns well wit | arXiv: 2406.17378
- accelerating adaptive retrieval augmented generation via instruction-driven repr | arXiv: 2505.12731
- air-bench automated heterogeneous information retrieval benchmark | arXiv: 2412.13102
- any information is just worth one single screenshot unifying search with visuali | arXiv: 2502.11431
- are llms effective psychological assessors leveraging adaptive rag for interpret | arXiv: 2501.00982
- atomic llm a fine-grained information retrieval evaluation benchmark for languag
- automatic benchmark generation from scientific papers via retrieval-augmented ll
- beyond true or false retrieval-augmented hierarchical analysis of nuanced claims | arXiv: 2506.10728
- CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling | arXiv: 2406.17507
- coir a comprehensive benchmark for code information retrieval models | arXiv: 2407.02883
- collapse dense retrievers | arXiv: 2503.05037
- comrag retrieval-augmented generation with dynamic vector stores for real-time c | arXiv: 2506.21098
- core mmrag knowledge reconciliation | arXiv: 2506.02544
- cross-lingual relevance transfer for document retrieval
- divide then align rag knowledge boundary | arXiv: 2505.20871
- dont reinvent the wheel efficient instruction-following text embedding based on | arXiv: 2505.24754
- drag distilling rag slm | arXiv: 2506.01954
- drama diverse augmentation from large language models to smaller dense retriever | arXiv: 2502.18460
- empaths at semeval-2025 task 11 retrieval-augmented approach to perceived emotio | arXiv: 2506.04409
- enhancing lexicon-based text embeddings with large language models | arXiv: 2501.09749
- evaluation of attribution bias in generator-aware retrieval-augmented large lang | arXiv: 2410.12380
- exit context-aware extractive compression for enhancing retrieval-augmented gene | arXiv: 2412.12559
- faithfulrag fact level conflict | arXiv: 2506.08938
- flashback efficient retrieval-augmented language modeling for long context infere | arXiv: 2405.04065
- flexrag a flexible and comprehensive framework for retrieval-augmented generatio | arXiv: 2506.12494
- from ambiguity to accuracy the transformative effect of coreference resolution o | arXiv: 2507.07847
- gainrag preference alignment | arXiv: 2505.18710
- garage a benchmark with grounding annotations for rag evaluation | arXiv: 2506.07671
- genie worksheets tod agent | arXiv: 2407.05674
- gor rag long context summary | arXiv: 2410.11001
- graf graph retrieval augmented by facts for romanian legal multi-choice question | arXiv: 2412.04119
- gumbel reranking | arXiv: 2502.11116
- health-llm personalized retrieval-augmented disease prediction system | arXiv: 2402.00746
- helios harmonizing early fusion late fusion and llm reasoning for multi-granular | arXiv: 2603.02248
- hierarchical document refinement for long-context retrieval-augmented generation | arXiv: 2505.10413
- hoh a dynamic benchmark for evaluating the impact of outdated information on ret | arXiv: 2503.04800
- hybgrag hybrid rag skb | arXiv: 2412.16311
- hypothetical documents or knowledge leakage rethinking llm-based query expansion | arXiv: 2504.14175
- investigating language preference of multilingual rag systems | arXiv: 2502.11175
- investigating the robustness of retrieval-augmented generation at the query leve | arXiv: 2507.06956
- knowshiftqa rag knowledge shifts | arXiv: 2412.08985
- ldir low-dimensional dense and interpretable text embeddings with relative repre | arXiv: 2505.10354
- llm reranking harmful content | arXiv: 2501.13977
- logical consistency is vital neural-symbolic information retrieval for negative- | arXiv: 2505.22299
- main-rag multi-agent filtering retrieval-augmented generation | arXiv: 2501.00332
- maximal matching matters preventing representation collapse for robust cross-mod | arXiv: 2506.21538
- memerag a multilingual end-to-end meta-evaluation benchmark for retrieval augmen | arXiv: 2502.17163
- mitigating lost-in-retrieval problems in retrieval augmented multi-hop question | arXiv: 2502.14245
- moc mixtures of text chunking learners for retrieval-augmented generation system | arXiv: 2503.09600
- mt-raig novel benchmark and evaluation framework for retrieval-augmented insight | arXiv: 2502.11735
- multilingual retrieval augmented generation for culturally-sensitive tasks a ben | arXiv: 2410.01171
- on synthetic data strategies for domain-specific generative retrieval | arXiv: 2502.17957
- optimized text embedding models and benchmarks for amharic passage retrieval | arXiv: 2505.19356
- pandora box rag noise | arXiv: 2408.13533
- parenting optimizing knowledge selection of retrieval-augmented | arXiv: 2410.10360
- prism political bias embeddings | arXiv: 2505.24646
- psycholinguistic visual semantic | arXiv: 2505.23029
- raemollm retrieval augmented llms for cross-domain misinformation detection usin | arXiv: 2406.11093
- rageval scenario specific rag evaluation dataset generation framework | arXiv: 2408.01262
- rare retrieval augmented reasoning | arXiv: 2412.02830
- redundancy isotropy and intrinsic dimensionality of prompt-based text embeddings | arXiv: 2506.01435
- refind at semeval-2025 task 3 retrieval-augmented factuality hallucination detec | arXiv: 2502.13622
- removal of hallucination on hallucination debate-augmented rag | arXiv: 2505.18581
- reranking-based generation for unbiased perspective summarization | arXiv: 2506.15925
- saferag benchmarking security in retrieval-augmented generation of large languag | arXiv: 2501.18636
- seakr self-aware knowledge retrieval for adaptive retrieval augmented generation | arXiv: 2406.19215
- seal scaling to emphasize attention for long-context retrieval | arXiv: 2501.15225
- semantic outlier removal with embedding models and llms | arXiv: 2506.16644
- setr set selection rag | arXiv: 2507.06838
- sgic a self-guided iterative calibration framework for rag | arXiv: 2506.16172
- sticking to the mean detecting sticky tokens in text embedding models | arXiv: 2507.18171
- the distracting effect understanding irrelevant passages in rag | arXiv: 2505.06914
- toward structured knowledge reasoning contrastive retrieval-augmented generation | arXiv: 2506.00842
- towards adaptive memory-based optimization for enhanced retrieval-augmented gene | arXiv: 2504.05312
- towards storage-efficient visual document retrieval an empirical study on reduci | arXiv: 2506.04997
- typed-rag type-aware decomposition of non-factoid questions for retrieval-augmen | arXiv: 2503.15879
- unanswerability evaluation for retrieval augmented generation | arXiv: 2412.12300
- visa retrieval augmented generation with visual source attribution | arXiv: 2412.14457
- voxrag a step toward transcription-free rag systems in spoken question answering | arXiv: 2505.17326
- when claims evolve evaluating and enhancing the robustness of embedding models a | arXiv: 2503.03417
- when should dense retrievers be updated in evolving corpora detecting out-of-dis | arXiv: 2506.01877
- a dual-perspective nlg meta-evaluation framework with automatic benchmark and be | arXiv: 2502.12052
- an empirical study of mechanistic interpretability approaches for factual recall
- around the world in 24 hours probing llm knowledge of time and place | arXiv: 2506.03984
- bias attribution in filipino language models extending a bias interpretability m | arXiv: 2506.07249
- cleme2 gec evaluation | arXiv: 2407.00934
- cracking factual knowledge a comprehensive analysis of degenerate knowledge neur | arXiv: 2402.13731
- expert an explainable image captioning evaluation metric with structured explana | arXiv: 2506.24016
- irt router multi llm | arXiv: 2506.01048
- language agnostic concepts | arXiv: 2411.08745
- llama see llama do entrainment | arXiv: 2505.09338
- mechanistic interpretability of emotion inference in large language models | arXiv: 2502.05489
- normalized aopc faithfulness metrics | arXiv: 2408.08137
- output centric interpretability | arXiv: 2501.08319
- position-aware automatic circuit discovery | arXiv: 2502.04577
- probing subphonemes in morphology models | arXiv: 2505.11297
- probing the geometry of truth consistency and generalization of truth directions | arXiv: 2506.00823
- reasoning circuits in language models a mechanistic interpretation of syllogisti | arXiv: 2408.08590
- retrieve to explain drug target identification | arXiv: 2402.04068
- safety is not only about refusal reasoning-enhanced fine-tuning for interpretabl | arXiv: 2503.05021
- separating tongue from thought activation patching reveals language-agnostic con | arXiv: 2411.08745
- shortcut neuron eval | arXiv: 2506.04142
- the anatomy of evidence an investigation into explainable icd coding | arXiv: 2507.01802
- towards explainable temporal reasoning in large language models a structure-awar | arXiv: 2505.15245
- a general knowledge injection framework for icd coding | arXiv: 2505.18708
- adaptive detoxification safeguarding general capabilities of llms through toxici | arXiv: 2505.22298
- bmike-53 investigating cross-lingual knowledge editing with in-context learning | arXiv: 2406.17764
- chainedit propagating ripple effects in llm | arXiv: 2507.08427
- cknowedit chinese knowledge editing dataset llms | arXiv: 2409.05806
- compke complex question answering under knowledge editing | arXiv: 2506.00829
- context-robust knowledge editing for language models | arXiv: 2505.23026
- docmedit towards document-level model editing | arXiv: 2505.19572
- efficient knowledge editing | arXiv: 2506.04226
- megen generative backdoor into large language models via model editing | arXiv: 2408.10722
- memorizing is not enough deep knowledge injection through reasoning | arXiv: 2504.00472
- mitigating negative interference in multilingual sequential knowledge editing th | arXiv: 2506.10800
- neuron-level sequential editing for large language models | arXiv: 2410.04045
- revealing the deceptiveness of knowledge editing a mechanistic analysis of super | arXiv: 2505.12636
- sake steering activations for knowledge editing | arXiv: 2503.01751
- scedit script-based assessment of knowledge editing | arXiv: 2505.23291
- structure-aware domain knowledge injection for large language models | arXiv: 2407.16724
- the mirage of model editing revisiting evaluation in the wild | arXiv: 2502.11177
- towards a principled evaluation of knowledge editors | arXiv: 2507.05937
- agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals | arXiv: 2502.19328
- an empirical study on llm-based agents for automated bug fixing | arXiv: 2411.10213
- bookworld from novels to interactive agent societies for story creation | arXiv: 2504.14538
- crowdcounter llm-agent-based scalable framework for web information gathering
- repro-bench can agentic ai systems assess the reproducibility of research claims
- aligning to what limits to rlhf based alignment | arXiv: 2503.09025
- constitutional classifiers defending against universal jailbreaks across thousan | arXiv: 2501.18837
- intuitive fine tuning simplifying alignment into single process | arXiv: 2405.11870
- accelerating speculative decoding via efficient context-aware draft generation
- consistency-preserving contrastive decoding for faithful document-grounded dial
- coprus consistency preserving utterance synthesis towards more realistic benchma
- fuel unveiling environmental impacts of llm serving | arXiv: 2502.11256
- a conformal risk control framework for granular word assessment and uncertainty | arXiv: 2504.01225
- a mismatched benchmark for scientific natural language inference | arXiv: 2506.04603
- abgen evaluating large language models in | arXiv: 2507.13300
- access denied inc the first benchmark environment for sensitivity awareness | arXiv: 2506.00964
- ad-hoc concept forming in the game codenames as a means for evaluating large lan | arXiv: 2502.11707
- ad-llm benchmarking large language models for anomaly detection | arXiv: 2412.11142
- androidlab autonomous agent | arXiv: 2410.24024
- antileakbench preventing data contamination by automatically constructing benchm | arXiv: 2412.13670
- atomic calibration of llms in long-form generations | arXiv: 2410.13246
- batayan a filipino nlp benchmark for evaluating large language models | arXiv: 2502.14911
- belarusian glue
- benchmarking llms and llm-based agents in practical vulnerability detection for | arXiv: 2503.03586
- benchmarking uncertainty quantification methods for large language models with l | arXiv: 2406.15627
- besstie a benchmark for sentiment and sarcasm classification for varieties of en | arXiv: 2412.04726
- beyond one-size-fits-all tailored benchmarks for efficient evaluation | arXiv: 2502.13576
- browsing lost unformed recollections a benchmark for tip-of-the-tongue search an | arXiv: 2503.19193
- calibraeval calibrating prediction distribution to mitigate selection bias in ll | arXiv: 2410.15393
- calibration confidence text gen | arXiv: 2506.00637
- can external validation tools improve annotation quality for llm-as-a-judge | arXiv: 2507.17015
- cfbench a comprehensive constraints-following benchmark for llms | arXiv: 2408.01122
- chatbench from static benchmarks to human-ai evaluation | arXiv: 2504.07114
- codemenv benchmarking large language models on code migration | arXiv: 2506.00894
- com2 causal commonsense | arXiv: 2506.07064
- cov eval evaluating llms from code security perspective | arXiv: 2505.10494
- culemo cultural lenses on emotion - benchmarking llms for cross-cultural emotion | arXiv: 2503.10688
- culturalbench a robust diverse and challenging cultural benchmark by human-ai cu | arXiv: 2410.02677
- ecomscriptbench | arXiv: 2505.15196
- editinspector a benchmark for evaluation of text-guided image edits | arXiv: 2506.09988
- educationq evaluating llms teaching capabilities through multi-agent dialogue fr | arXiv: 2504.14928
- elaboration competitive programming | arXiv: 2505.16667
- evowiki evaluating llms on evolving knowledge | arXiv: 2412.13582
- exposing numeracy gaps a benchmark to evaluate fundamental numerical abilities i | arXiv: 2502.11075
- financereasoning benchmarking financial numerical reasoning more | arXiv: 2506.05828
- from tools to teammates evaluating llms in multi-session coding interactions | arXiv: 2502.13791
- grace a granular benchmark for evaluating model calibration against human calibr | arXiv: 2502.19684
- guessarena guess who i am a | arXiv: 2505.22661
- hallulens llm hallucination benchmark | arXiv: 2504.17550
- hellaswag-pro a large-scale bilingual benchmark for evaluating the robustness of | arXiv: 2502.11393
- help write story feedback | arXiv: 2507.16007
- homebench evaluating llms in smart homes with valid and invalid instructions acr | arXiv: 2505.19628
- how far are llms from being our digital twins a benchmark for persona-based beha | arXiv: 2502.14642
- hpss heuristic prompting strategy search for llm evaluators | arXiv: 2502.13031
- influences on llm calibration a study of response agreement loss functions and p | arXiv: 2501.03991
- justrank llm judge system ranking | arXiv: 2412.09569
- kitab-bench a comprehensive multi-domain benchmark for arabic ocr and document u | arXiv: 2502.14949
- kristeva close reading as a novel task for benchmarking interpretive reasoning | arXiv: 2505.09825
- la leaderboard spanish | arXiv: 2507.00999
- language complexity measurement as a noisy zero-shot proxy for evaluating llm pe | arXiv: 2502.11578
- language model probabilities are not calibrated in numeric contexts | arXiv: 2410.16007
- mars benchmarking the metaphysical reasoning abilities of language models with a | arXiv: 2406.02106
- mcbe a multi-task chinese bias evaluation benchmark for large language models | arXiv: 2507.02088
- mdbench a synthetic multi-document reasoning benchmark generated with knowledge | arXiv: 2506.14927
- mis-prompt benchmarking large language models for proactive error handling | arXiv: 2506.00064
- mmlu-cf a contamination-free multi-task language understanding benchmark | arXiv: 2412.15194
- movie101v2 improved movie narration benchmark | arXiv: 2404.13370
- navigating rifts in human-llm grounding study and benchmark | arXiv: 2503.13975
- noreval a norwegian language understanding and generation evaluation benchmark | arXiv: 2504.07749
- onebench to test them all sample-level benchmarking over open-ended capabilities | arXiv: 2412.06745
- pap2pat benchmarking outline-guided long-text patent generation with patent-pape | arXiv: 2410.07009
- papersplease a benchmark for evaluating motivational values of large language mo | arXiv: 2506.21961
- patch psychometrics-assisted benchmarking of large language models against human | arXiv: 2404.01799
- physreason a comprehensive benchmark towards physics-based reasoning | arXiv: 2502.12054
- readoc a unified benchmark for realistic document structured extraction | arXiv: 2409.05137
- realhitbench a comprehensive realistic hierarchical table benchmark for evaluati | arXiv: 2506.13405
- retrieval models arent tool-savvy benchmarking tool retrieval for large language | arXiv: 2503.01763
- revisiting 3d llm benchmarks are we really testing 3d capabilities | arXiv: 2502.08503
- right answer wrong score uncovering the inconsistencies of llm evaluation in mul | arXiv: 2503.14996
- rulearena rule guided reasoning | arXiv: 2412.08972
- sanskriti a comprehensive benchmark for evaluating language models knowledge of | arXiv: 2506.15355
- seedbench a multi-task benchmark for evaluating large language models in seed sc | arXiv: 2505.13220
- sklep a slovak general language understanding benchmark | arXiv: 2506.21508
- somethings fishy in the data lake a critical re-evaluation of table union search | arXiv: 2505.21329
- structext eval | arXiv: 2406.10621
- structflowbench a structured flow benchmark for multi-turn instruction following | arXiv: 2502.14494
- swiltra-bench the swiss legal translation benchmark | arXiv: 2503.01372
- tic-lm a web-scale benchmark for time-continual llm pretraining | arXiv: 2504.02107
- towards dynamic theory of mind evaluating llm adaptation to temporal evolution o | arXiv: 2505.17663
- towards objective fine-tuning how llms prior knowledge causes potential poor cal | arXiv: 2505.20903
- tripcraft a benchmark for spatio-temporally fine grained travel planning | arXiv: 2502.20508
- triptailor a real-world benchmark for personalized travel planning | arXiv: 2508.01432
- tumlu a unified and native language understanding benchmark for turkic languages | arXiv: 2502.11020
- vital pluralistic alignment healthcare | arXiv: 2502.13775
- voxeval benchmarking the knowledge understanding capabilities of end-to-end spok | arXiv: 2501.04962
- webwalker benchmarking llms in web traversal | arXiv: 2501.07572
- where are we evaluating llm performance on african languages | arXiv: 2502.19582
- wicked a simple method to make multiple choice benchmarks more challenging | arXiv: 2502.18316
- wximpactbench a disruptive weather impact understanding benchmark for evaluating | arXiv: 2505.20249
- yescieval llm judge science | arXiv: 2505.14279
- agentdropout-dynamic-agent-elimination-for-multi-agent-collaboration | arXiv: 2503.18891
- ai as a novel ethical agent exploring moral judgments by large language models
- an empirical study of large language models for automated review generation
- analyzing the rapid generalization of sft via the perspective of attention head | arXiv: 2409.15820
- argument mining in the age of large language models
- arm alignment retrieval | arXiv: 2501.18539
- assessing and enhancing the causal reasoning abilities of language models via fai
- assessing the vulnerability of llms to cognitive biases in scientific research
- autoexp automatic experiment design and execution by llms
- beyond dialogue roleplay | arXiv: 2408.10903
- bfs-prover-scalable-best-first-tree-search-for-llm-based-automatic-theorem-proving | arXiv: 2502.03438
- can llms interpret leverage amrs | arXiv: 2504.04745
- catching shortcuts a framework for evaluating shortcuts in large language models
- cheaper and better diffusion language model via task-specific training
- clue guided re-assessment to improve reasoning in large language models
- collaborative performance prediction for large language models | arXiv: 2407.01300
- comparing large language models in extracting subjective information from politi
- comparing linguistic acceptability judgments of autoregressive language models
- concreteness versus abstractness a selectivity analysis in llms
- cross-modal alignment for llm-enhanced spoken language understanding
- epistemic-markers-in-confidence-estimation | arXiv: 2505.24778
- limitgen-llms-identify-research-limitations | arXiv: 2507.02694
- llm mapreduce simplified long sequence processing | arXiv: 2410.09342
- llms-comprehend-temporal-meaning-in-narratives | arXiv: 2507.14307
- neuronxa-cross-lingual-alignment-via-neurons | arXiv: 2507.14900
- rethinking-sorting-in-llm-pairwise-ranking | arXiv: 2505.24643
- rhio retrieval heads faithfulness | arXiv: 2501.13573
- seed stepwise reasoning disruption attack | arXiv: 2412.11934
- toolcoder code empowered tool learning | arXiv: 2502.11404
- adversarial tokenization | arXiv: 2503.02174
- asynclm efficient and adaptive async pre-training of language models
- autonomous data selection with zero-shot generative classifiers for mathematical | arXiv: 2402.07625
- between circuits chomsky | arXiv: 2502.19249
- chinese grammatical error correction with pre-trained models and linguistic clue
- critiq mining data quality criteria from human preferences | arXiv: 2502.19279
- data-constrained synthesis of training data for de-identification | arXiv: 2502.14677
- data caricatures on the representation of african american language in pretraini | arXiv: 2503.10789
- data whisperer data selection | arXiv: 2505.12212
- davir data selection via implicit reward for large language models | arXiv: 2310.13008
- diversity explains inference scaling laws through a case study of minimum bayes | arXiv: 2410.15021
- dual stage curriculum learning sequence labeling | arXiv: 2402.13534
- emergent abilities continued pt | arXiv: 2506.00288
- fr spec speculative sampling | arXiv: 2502.14856
- how do llms acquire new knowledge a knowledge circuits perspective on continual | arXiv: 2502.11196
- improving continual pre-training through seamless data packing | arXiv: 2505.22018
- inconsistent tokenizations cause language models to be perplexed by japanese gra | arXiv: 2505.19599
- incorporating domain knowledge into materials tokenization | arXiv: 2506.11115
- inserter speech instruction | arXiv: 2503.02769
- large vocabulary size improves large language models | arXiv: 2406.16508
- leancode understanding models better for code simplification of pre-trained larg | arXiv: 2505.14759
- making llms better many-to-many speech-to-text translators with curriculum learn | arXiv: 2409.19510
- metarater a multidimensional data selection method | arXiv: 2504.14194
- model performance-guided evaluation data selection for effective prompt optimiza | arXiv: 2505.10736
- nemotron cc pretraining data | arXiv: 2412.02595
- optimizing pre-training data mixtures with mixtures of data expert models | arXiv: 2502.15950
- pre-training curriculum for multi-token prediction in language models | arXiv: 2505.22757
- retrofitting large language models with dynamic tokenization | arXiv: 2411.18553
- scar style consistency data selection | arXiv: 2406.10882
- second language arabic acquisition of llms via progressive vocabulary expansion | arXiv: 2412.12310
- splintering nonconcatenative languages for better tokenization | arXiv: 2503.14433
- stealing training data from large language models in decentralized training thro | arXiv: 2502.16086
- synthesizing post-training data for llms through multi-agent simulation | arXiv: 2410.14251
- tokalign vocab adaptation | arXiv: 2506.03523
- tokenization is sensitive to language variation | arXiv: 2502.15343
- towards effective and efficient continual pre-training of large language models | arXiv: 2407.18743
- training dynamics underlying language model scaling laws loss deceleration and z | arXiv: 2506.05447
- unsupervised morphological tree tokenizer | arXiv: 2406.15245
- velocitune a velocity-based dynamic domain reweighting method for continual pre- | arXiv: 2411.14318
- beyond the answer advancing multi-hop qa with fine-grained graph reasoning and e
- commonsense abductive reasoning using knowledge from multiple sources
- complex reasoning with natural language contexts and background knowledge
- epicprm-efficient-precise-training-data-for-process-reward-model | arXiv: 2503.02382
- agrail a lifelong agent guardrail with effective and adaptive safety detection | arXiv: 2502.11448
- aligning large language models to follow instructions and hallucinate less via e | arXiv: 2502.07340
- alleviating hallucinations from knowledge misalignment in large language models
- answer when needed forget when not language models pretend to forget via in-cont | arXiv: 2410.00382
- are the hidden states hiding something testing the limits of factuality-encoding | arXiv: 2505.16520
- arghitz at archehr-qa 2025 a two-step divide and conquer approach to patient que | arXiv: 2506.12886
- automated explanation generation and hallucination detection for heritage image
- chinese simpleqa a chinese factuality evaluation for large language models | arXiv: 2411.07140
- cliperase efficient unlearning of visual-textual associations in clip | arXiv: 2410.23330
- comparisonqa evaluating factuality robustness of llms through knowledge frequenc | arXiv: 2412.20251
- core robust factual precision with informative sub-claim identification
- defense prompt injection | arXiv: 2411.00459
- exploring forgetting in large language model pre-training | arXiv: 2410.17018
- factual knowledge in language models robustness and anomalies under simple tempo | arXiv: 2502.01220
- faithful and robust llm-driven theorem proving for nli explanations | arXiv: 2505.24264
- from misleading queries to accurate answers a three-stage fine-tuning method for | arXiv: 2504.11277
- hallucination detox send | arXiv: 2410.15460
- halogen hallucinations
- hd-ndes neural differential equations for hallucination detection in llms | arXiv: 2506.00088
- how does response length affect long-form factuality | arXiv: 2505.23295
- improving factuality with explicit working memory | arXiv: 2412.18069
- improving model factuality with fine-grained critique-based evaluator | arXiv: 2410.18359
- indirect prompt injection detection | arXiv: 2502.16580
- intent hallucination eval | arXiv: 2506.06539
- language models can subtly deceive without lying a case study on strategic phras | arXiv: 2405.04325
- learning auxiliary tasks improves reference-free hallucination detection in open | arXiv: 2505.12265
- localizing and mitigating errors in long-form question answering | arXiv: 2407.11930
- mamba knockout for unraveling factual information flow | arXiv: 2505.24244
- monitoring decoding mitigating hallucination via evaluating the factuality of pa | arXiv: 2503.03106
- odysseus dynamic focus decoding | arXiv: 2503.08057
- on-policy self-alignment with fine-grained knowledge feedback for hallucination | arXiv: 2406.12221
- opt-out investigating entity-level unlearning for large language models via opti | arXiv: 2406.12329
- real-time factuality assessment from adversarial feedback | arXiv: 2410.14651
- relearn unlearning via learning for large language models | arXiv: 2502.11190
- revs unlearning sensitive information in language models via rank editing in the | arXiv: 2406.09325
- saferoute adaptive model selection for efficient and accurate safety guardrails | arXiv: 2502.12464
- seuf is unlearning one expert enough for mixture-of-experts llms | arXiv: 2411.18797
- stochastic chameleons irrelevant context hallucinations reveal class-based misge | arXiv: 2505.22630
- towards context-robust llms a gated representation fine-tuning approach | arXiv: 2502.14100
- towards effective extraction and evaluation of factual claims | arXiv: 2502.10855
- treecut a synthetic unanswerable math word problem dataset for llm hallucination | arXiv: 2502.13442
- truth knows no language evaluating truthfulness beyond english | arXiv: 2502.09387
- ualign leveraging uncertainty estimations for factuality alignment on large lang | arXiv: 2412.11803
- uaqfact evaluating factual knowledge utilization of llms on unanswerable questio | arXiv: 2505.23461
- unveiling and addressing pseudo forgetting in large language models | arXiv: 2411.11932
- which retain set matters for llm unlearning a case study on entity unlearning | arXiv: 2502.11441
- zjuklab at semeval-2025 task 4 unlearning via model merging | arXiv: 2503.21088
- align-pro align protein representations through multi-modal learning
- concept bottleneck language models for protein design
- medbiorag semantic search and retrieval-augmented generation for biomedical lite
- cfsp an efficient structured pruning framework for llms with coarse-to-fine acti
- compact and compressible representations for llms using structured sparse decom
- compression in transformer language models has a surprising relationship with pe
- a case study of cross-lingual zero-shot generalization for classical languages i | arXiv: 2505.13173
- accessible machine translation evaluation for low-resource languages
- alleviating distribution shift in synthetic data for machine translation quality | arXiv: 2502.19941
- an expanded massive multilingual dataset for high-performance language technolog | arXiv: 2503.10267
- are rules meant to be broken understanding multilingual moral reasoning as a com | arXiv: 2502.14083
- askqe question answering as automatic evaluation for machine translation | arXiv: 2504.11582
- assessing agentic large language models in multilingual national bias | arXiv: 2502.17945
- beyond n-grams rethinking evaluation metrics and strategies for multilingual abs | arXiv: 2507.08342
- blessing of multilinguality a systematic analysis of multilingual in-context lea | arXiv: 2502.11364
- bridging the language gaps in large language models with inference-time cross-li | arXiv: 2410.12462
- cc-tuning a cross-lingual connection mechanism for improving joint multilingual | arXiv: 2506.00875
- cchall a novel benchmark for joint cross-lingual and cross-modal hallucinations | arXiv: 2505.19108
- clix cross-lingual explanations of idiomatic expressions | arXiv: 2501.03191
- code-switching curriculum learning for multilingual transfer in llms | arXiv: 2411.02460
- code-switching red-teaming llm evaluation for safety and multilingual understand | arXiv: 2406.15481
- comparative analysis of multilingual hate speech detection
- context augmented token-level post-editing for human interpreting
- cosmmic comment-sensitive multimodal multilingual indian corpus | arXiv: 2506.15372
- cross-lingual auto evaluation for assessing multilingual llms | arXiv: 2410.13394
- cross-lingual optimization for language transfer in large language models | arXiv: 2505.14297
- cross-lingual representation alignment through contrastive image-caption tuning | arXiv: 2505.13628
- cross-lingual transfer of cultural knowledge an asymmetric phenomenon | arXiv: 2506.01675
- cross-lingual transfer of debiasing and detoxification in multilingual llms an e | arXiv: 2412.14050
- cross lingual neurons compression | arXiv: 2506.01629
- crosslingual pitfalls | arXiv: 2505.18673
- cruxeval-x a benchmark for multilingual code reasoning understanding and executi | arXiv: 2408.13001
- culfit a fine-grained cultural-aware llm training paradigm via multilingual crit | arXiv: 2505.19484
- dictionaries to the rescue cross-lingual vocabulary transfer for low-resource la | arXiv: 2506.01535
- disentangle language culture | arXiv: 2505.24635
- edit once update everywhere a simple framework for cross-lingual knowledge synch | arXiv: 2502.14645
- execute a multilingual benchmark for llm token understanding | arXiv: 2505.17784
- exploring in-context example generation for machine translation | arXiv: 2506.00507
- exploring in-image machine translation with real-world background | arXiv: 2505.15282
- flare crosslingual lora | arXiv: 2501.06892
- grammamt improving machine translation with grammar-informed in-context learning | arXiv: 2410.18702
- group then scale dynamic mixture-of-experts multilingual language model | arXiv: 2506.12388
- hierarchical news clustering | arXiv: 2506.00277
- implicit cross-lingual rewarding for efficient multilingual preference alignment | arXiv: 2503.04647
- improving mllms document image machine translation via synchronously self-review | arXiv: 2507.08309
- just go parallel improving the multilingual capabilities of large language model | arXiv: 2506.13044
- knowcoder-x boosting multilingual information extraction via code | arXiv: 2411.04794
- laca crosslingual absa | arXiv: 2508.09515
- langmark a multilingual dataset for automatic post-editing | arXiv: 2511.17153
- langsamp multilingual pretraining | arXiv: 2409.18199
- lemonade a large multilingual expert-annotated abstractive event dataset for the | arXiv: 2506.00980
- less but better efficient multilingual expansion | arXiv: 2505.22582
- lexgen domain-aware multilingual lexicon generation | arXiv: 2405.11200
- llms can achieve high-quality simultaneous machine translation as efficiently as | arXiv: 2504.09570
- lost in multilinguality dissecting cross-lingual factual inconsistency in transf | arXiv: 2504.04264
- low resource translation | arXiv: 2506.01796
- m-mad multidimensional multi-agent debate for advanced machine translation evalu | arXiv: 2412.20127
- m2rc-eval massively multilingual repository-level code completion evaluation | arXiv: 2410.21157
- m3finmeeting a multilingual multi-sector and multi-task financial meeting unders | arXiv: 2506.02510
- m rewardbench | arXiv: 2410.15522
- machine translation models are zero-shot detectors of translation direction | arXiv: 2401.06769
- marco bench multilingual if | arXiv: 2507.11882
- maxife multilingual and cross-lingual instruction following evaluation | arXiv: 2506.01776
- memorization inheritance seqkd | arXiv: 2502.01491
- mid layer crosslingual alignment | arXiv: 2502.14830
- milic-eval benchmarking multilingual llms for chinas minority languages | arXiv: 2503.01150
- modular sentence encoders | arXiv: 2407.14878
- moscar a large-scale multilingual and multimodal document-level corpus | arXiv: 2406.08707
- msqad multilingual ethical bias | arXiv: 2505.19121
- mt eval human parity | arXiv: 2506.19571
- mtvqa benchmarking multilingual text-centric visual question answering | arXiv: 2405.11985
- multi-perspective alignment for increasing naturalness in neural machine transla | arXiv: 2412.08473
- multilingual encoder knows more than you realize shared weights pretraining for | arXiv: 2502.10852
- multilingual llm english accent | arXiv: 2410.15956
- multilingual speech data quality | arXiv: 2506.17525
- nametag 3 a tool and a service for multilingual multitagset ner | arXiv: 2506.05949
- probing llms for multilingual discourse generalization through a unified label s | arXiv: 2503.10515
- registering source tokens to target language spaces in multilingual neural machi | arXiv: 2501.02979
- semantic aware linear transfer by recycling pre-trained language models for cros | arXiv: 2505.10945
- seqpo-simt sequential policy optimization for simultaneous machine translation | arXiv: 2505.20622
- shifcon nondominant language | arXiv: 2410.19453
- sift-50m a large-scale multilingual dataset for speech instruction fine-tuning | arXiv: 2504.09081
- statement-tuning enables efficient cross-lingual generalization in encoder-only | arXiv: 2506.01592
- team ack at semeval-2025 task 2 beyond word-for-word machine translation for eng | arXiv: 2504.20451
- the esethu framework reimagining sustainable dataset governance and curation for | arXiv: 2502.15916
- the hidden space of safety understanding preference-tuned llms in multilingual c | arXiv: 2504.02708
- thor-moe hierarchical task-guided and context-responsive routing for neural mach | arXiv: 2505.14173
- towards global ai inclusivity a large-scale multilingual terminology dataset gis | arXiv: 2412.18367
- trans-zero self-play incentivizes large language models for multilingual transla | arXiv: 2504.14669
- translation and fusion improves cross-lingual information extraction | arXiv: 2305.13582
- translation robustness | arXiv: 2403.03923
- understanding in-context machine translation for low-resource languages a case s | arXiv: 2502.11862
- unveiling the power of source source-based minimum bayes risk decoding for neura | arXiv: 2406.11632
- watching the watchers exposing gender disparities in machine translation quality | arXiv: 2410.10995
- x-webagentbench a multilingual interactive web benchmark for evaluating global a | arXiv: 2505.15372
- zipa a family of efficient models for multilingual phone recognition | arXiv: 2505.23170
- answering complex geographic questions by adaptive reasoning with visual context
- chart-based reasoning transferring capabilities from llms to vlms | arXiv: 2403.12596
- cordial-multimodal-llm-coherence-relationships | arXiv: 2502.11300
- mmboundary reasoning step confidence | arXiv: 2505.23224
- visc-focus-centric-visual-chains-for-multi-image-reasoning | arXiv: 2504.20199
- vlm2-bench-visual-cue-linking | arXiv: 2502.12084
- wemath knowledge reasoning | arXiv: 2407.01284
- abstractive snippet generation
- an empirical study of iterative refinements for non-autoregressive translation
- controlling politeness in multi-turn dialogues through pre-phrase augmentation
- active llms for multi-hop question answering
- attribution methods in nlp navigating a fragmented landscape
- bilingual zero-shot stance detection
- brighter bridging the gap in human-annotated textual emotion recognition dataset
- conversational quality assessment a large-scale corpus and comprehensive study
- deja vu decoding repeated reading from eye movements | arXiv: 2502.11061
- meaning-beyond-truth-conditions-anaphora-accessibility | arXiv: 2502.14119
- variational approach mitigating entity bias relation extraction | arXiv: 2506.11381
- achieving certification-by-design through model-driven development
- adaptive feature-based low rank plus sparse decomposition for subspace clusterin
- cooperating and competing through natural language
- sightation-blv-aligned-diagram-descriptions | arXiv: 2503.13369
- a survey on proactive defense strategies against misinformation in large languag | arXiv: 2507.05288
- banstereoset a dataset to measure stereotypical social biases in llms for bangla | arXiv: 2409.11638
- beyond negative stereotypes -- non-negative abusive utterances about identity gr
- biasguard a reasoning-enhanced bias detection tool for large language models | arXiv: 2504.21299
- can community notes replace professional fact-checkers | arXiv: 2502.14132
- conspiracy theories and where to find them on tiktok | arXiv: 2407.12545
- culture matters in toxic language detection in persian | arXiv: 2506.03458
- detection of human and machine-authored fake news in urdu | arXiv: 2410.19517
- explicit vs implicit investigating social bias in large language models through | arXiv: 2501.02295
- exploring gender bias in large language models an in-depth dive into the german | arXiv: 2507.16557
- exploring multimodal challenges in toxic chinese detection taxonomy benchmark an | arXiv: 2505.24341
- exploring the impact of instruction-tuning on llms susceptibility to misinformat | arXiv: 2507.18203
- fairsteer inference time debiasing for llms with dynamic activation steering | arXiv: 2504.14492
- gg-bbq german gender bias benchmark for question answering | arXiv: 2507.16410
- hateday global hate speech | arXiv: 2411.15462
- how does misinformation affect large language | arXiv: 2505.21608
- implihatevid video hate | arXiv: 2508.06570
- is llm an overconfident judge unveiling the capabilities of llms in detecting of | arXiv: 2502.06207
- kda automated data generation pipeline for detoxifying implicitly offensive lang | arXiv: 2506.13513
- llm label propagation | arXiv: 2506.00488
- llm personalized disinformation | arXiv: 2412.13666
- mdit-bench evaluating the dual-implicit toxicity in large multimodal models | arXiv: 2505.17144
- measuring social biases in masked language models by proxy of prediction quality | arXiv: 2402.13954
- silencing empowerment allowing bigotry auditing the moderation of hate speech on | arXiv: 2506.07667
- state toxicn a benchmark for span-level target-aware toxicity extraction in chin | arXiv: 2501.15451
- taz2024full analysing german newspapers for gender bias and discrimination acros | arXiv: 2506.05388
- translate with care addressing gender bias neutrality and reasoning in large lan | arXiv: 2506.00748
- context aware sentiment forecasting agents | arXiv: 2505.24331
- q2e query-to-event decomposition for zero-shot multilingual text-to-video retrie | arXiv: 2506.10202
- vidcapbench a comprehensive benchmark of video captioning for controllable text- | arXiv: 2502.12782
- a thousand words paint a picture multimodal goal tracking for grounded social in
- attention-seeker dynamic self-attention scoring for unsupervised key-frame extra
- bold selection bias | arXiv: 2410.14248