Tag
#benchmark
6 posts tagged benchmark.
- methodology
Measuring Prompt-Injection Robustness in Tool-Using Agents
Prompt-injection robustness for an agent is not a single number: it is utility under attack weighed against targeted attack success.
- methodology
Comparing LLM Safety Benchmarks: AdvBench, HarmBench, JailbreakBench
AdvBench, HarmBench, and JailbreakBench are not interchangeable, and treating them as one undermines every comparison built on top of them.
- methodology
Benchmarking LLM Jailbreak Resistance: Attack Success Rate Done Right
Attack success rate is the headline metric for jailbreak resistance, and almost everyone computes it in a way that isn't comparable across runs.
- methodology
Reproducible LLM Scanner Benchmarks: What Everyone Forgets to Pin
An LLM security scanner benchmark that isn't pinned to a model version, a seed, and a corpus hash isn't reproducible.
- methodology
Benchmarking Jailbreak Classifiers: The Asymmetry Nobody Reports
Jailbreak classifiers are graded on attack recall and almost never on the cost of being wrong. That asymmetry is the whole story. Here's how to measure it.
- methodology
How to Benchmark a Prompt-Injection Detector Honestly
Most prompt-injection detector benchmarks are broken before the first request. Here is a test design that produces a number you can actually trust.