Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #methodology 7
- #benchmark 6
- #prompt-injection 5
- #evaluation 4
- #llm-security 4
- #jailbreak 3
- #red-teaming 3
- #attack-success-rate 2
- #red-team 2
- #reproducibility 2
- #advbench 1
- #agents 1
- #ai-agents 1
- #ai-guardrails 1
- #ai-security 1
- #benchmarks 1
- #classifier 1
- #content-safety 1
- #detection 1
- #eval 1
- #eval-harness 1
- #false-positive-rate 1
- #garak 1
- #guardrails 1
- #harmbench 1
- #jailbreak-detection 1
- #jailbreakbench 1
- #llm-quality 1
- #llm-scanner 1
- #observability 1
- #production-llm 1
- #pyrit 1
- #refusal-rate 1
- #robustness 1
- #safety 1
- #security-testing 1
- #tools 1
Categories
methodology 8 posts
- Designing a Reproducible AI-Security Eval Harness. A reproducible AI-security evaluation is an engineering artifact, not a notebook. Here's the harness design: separation of corpus, target, judge, and
- Measuring Prompt-Injection Robustness in Tool-Using Agents. Prompt-injection robustness for an agent is not a single number; it is utility-under-attack weighed against targeted attack success.
- Comparing LLM Safety Benchmarks: AdvBench, HarmBench, JailbreakBench. AdvBench, HarmBench, and JailbreakBench are not interchangeable, and treating them as one undermines every comparison built on top.
- Red-Team Eval Methodology: Pairing Attack Success Rate With Refusal Rate. An LLM red-team evaluation that reports attack success rate without reporting refusal rate is half a measurement.
- Benchmarking LLM Jailbreak Resistance: Attack Success Rate Done Right. Attack success rate is the headline metric for jailbreak resistance, and almost everyone computes it in a way that isn't comparable across runs.
- Reproducible LLM Scanner Benchmarks: What Everyone Forgets to Pin. An LLM security scanner benchmark that isn't pinned to a model version, a seed, and a corpus hash isn't reproducible.
AI Security 2 posts
- The AI Security Tools Directory: 40+ Tools Compared (2026). A maintained 2026 directory of 40+ AI and LLM security tools, comparing scanners, runtime guardrails, injection detection, and observability.
- Best AI Guardrail Tools Review: Lakera, NeMo, Bedrock, and Beyond. A practitioner's comparison of the leading AI guardrail tools in 2026 (Lakera Guard, NVIDIA NeMo, AWS Bedrock Guardrails, and Guardrails AI), covering