santosh g

san300

AI & ML interests

None yet

Organizations

None yet

commented on 2 papers over 1 year ago

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Paper • 2402.04249 • Published Feb 6, 2024 • 6
