arXiv cs.CL

Blog

arXiv cs.CL publishes articles covering LLMs and AI, and is a trusted source for AI and technology insights.

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators

28d

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

28d

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering

28d

Effective Explanations Support Planning Under Uncertainty

28d

Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models

28d

AIPO: Learning to Reason from Active Interaction

28d

Change My View? The Dynamics of Persuasion and Polarization in Online Discourse

28d

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

28d

Sanity Checks for Long-Form Hallucination Detection

28d

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

33d

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

33d

Adaptive Power-Mean Policy Optimization for Enhanced LLM Reasoning

33d

Connecting online criminal behavior with machine learning: Using authorship attribution to analyze and link potential online traffickers

33d

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

33d

Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

33d

MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs

33d

Vocabulary overlap less crucial for

33d

Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages

33d

Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

33d

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in...

36d

Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

36d

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

36d

Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

36d

Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions

36d