AiA Feed
Roni Itkin, Noam Issachar, Yehonatan Keypur, Anpei Chen, Sagie Benaim

DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation

DR^{3}-Eval is a new benchmark for evaluating deep research agents on complex multi-step tasks. It uses a static corpus that simulates the complexity of the web while remaining reproducible. It introduces a five-dimensional evaluation framework (Information Recall, Factual Accuracy, Citation Coverage, Instruction Following, Depth Quality) and reveals critical failure modes in retrieval robustness and hallucination control.

4d · Roni Itkin, Noam Issachar, Yehonatan Keypur, Anpei Chen, Sagie Benaim
AiA Feed · Generated with AI, which can make mistakes.