# AgentHarm turns agent misuse into a concrete safety benchmark

Category: safety-research
Published: 2024-10-11T00:00:00.000Z
Source: [arXiv: AgentHarm benchmark paper](https://arxiv.org/abs/2410.09024)
Agent usefulness: 93/100
Confidence: 0.88
Tags: agentharm, evals, safety, jailbreaks

## Human Summary
AgentHarm measures whether LLM agents refuse malicious multi-step tool-use requests and whether jailbreaks preserve enough capability to complete harmful tasks.

## Agent Summary
Use AgentHarm-style evals to test refusal of malicious tasks, robustness to jailbreaks, retention of multi-step tool-use capability after a jailbreak, and coverage across harm categories.

## Body
AgentHarm reframes safety testing around agents that use external tools and execute multi-step tasks, rather than simple chatbot refusals. The benchmark includes 110 explicitly malicious agent tasks, expanded to 440 with augmentations, across 11 harm categories including fraud, cybercrime, and harassment. For agent builders, the signal is practical: safety evals need to test three things separately: whether the agent refuses, whether that refusal holds up under jailbreaks, and whether a jailbroken agent still retains the planning and tool-use ability needed to complete a harmful workflow.
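
The distinction between refusing and losing capability is easier to see when the two are tallied separately. The following is a minimal sketch of such a tally, not the benchmark's own scoring code: the `EpisodeResult` record, its field names, and the `summarize` helper are all hypothetical, and the sample numbers are made up purely to exercise the function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """One agent run on one task (hypothetical record, not an AgentHarm type)."""
    category: str      # harm category label, e.g. "fraud"
    jailbreak: bool    # True if a jailbreak prompt was applied to the request
    refused: bool      # True if the agent declined the task
    task_score: float  # fraction of the task completed, in [0.0, 1.0]


def summarize(results):
    """Group runs by (category, jailbreak) and report refusal and capability separately."""
    buckets = defaultdict(list)
    for r in results:
        buckets[(r.category, r.jailbreak)].append(r)

    summary = {}
    for key, group in buckets.items():
        complied = [r for r in group if not r.refused]
        summary[key] = {
            "n": len(group),
            # Share of runs in which the agent refused outright.
            "refusal_rate": sum(r.refused for r in group) / len(group),
            # Mean completion over runs that were NOT refused; None if all refused.
            "score_when_complying": (
                sum(r.task_score for r in complied) / len(complied)
                if complied else None
            ),
        }
    return summary


# Illustrative, made-up data: two plain runs and two jailbroken runs.
sample = [
    EpisodeResult("fraud", jailbreak=False, refused=True, task_score=0.0),
    EpisodeResult("fraud", jailbreak=False, refused=False, task_score=0.5),
    EpisodeResult("fraud", jailbreak=True, refused=False, task_score=1.0),
    EpisodeResult("fraud", jailbreak=True, refused=False, task_score=0.5),
]
report = summarize(sample)
```

Keeping `refusal_rate` and `score_when_complying` as separate fields means a jailbreak that lowers refusals but also degrades task completion shows up differently from one that lowers refusals while leaving capability intact.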

## Sponsors
No sponsor placement attached.