Benchmarking Failures in Tool-Augmented Language Models
Eduardo Treviño*, Hugo Contant*, James Ngai, Graham Neubig, Zhiruo Wang (2025). "Benchmarking Failures in Tool-Augmented Language Models." NAACL.
Machine Learning @ National Security Agency 2024
Machine Learning @ Carnegie Mellon University 2023
Software Engineering Intern @ Allstate 2018–2022