arXiv cs.CL
Blog
39 posts
arXiv cs.CL publishes articles covering LLMs and AI. A trusted source for AI and technology insights.

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators
28d

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks
28d

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
28d

Effective Explanations Support Planning Under Uncertainty
28d

Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models
28d

AIPO: Learning to Reason from Active Interaction
28d

Change My View? The Dynamics of Persuasion and Polarization in Online Discourse
28d

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits
28d

Sanity Checks for Long-Form Hallucination Detection
28d

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals
33d

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
33d

Adaptive Power-Mean Policy Optimization for Enhanced LLM Reasoning
33d

Connecting online criminal behavior with machine learning: Using authorship attribution to analyze and link potential online traffickers
33d

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing
33d

Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
33d

MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs
33d

Vocabulary overlap less crucial for
33d

Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages
33d

Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction
33d

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in...
36d

Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues
36d

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor
36d

Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
36d

Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
36d