AI Sec Bench
Directory-style comparison grid of AI and LLM security tools across scanners, guardrails, and observability categories
AI Security

The AI Security Tools Directory: 40+ Tools Compared (2026)

A maintained 2026 directory of 40+ AI and LLM security tools, comparing scanners, runtime guardrails, injection detection, and observability.

By AISecBench Editorial · ·Updated June 20, 2026 · 7 min read

This is a maintained directory of the AI and LLM security tooling landscape as of 2026. It covers more than 40 tools across four working categories: red-team and vulnerability scanners, runtime guardrails and safety filters, prompt-injection and jailbreak detection, and LLM observability with security monitoring. Each entry is compiled from public project documentation, repositories, and vendor pages, and is tagged by type, license posture, and maturity so you can shortlist before you trial.

The intent is reference, not ranking. Tool fit depends on your threat model, your deployment surface, and whether you can self-host. Use the master table to scan the whole field, then read the per-category notes for the tradeoffs that do not fit in a table cell. This page is updated as projects ship, get acquired, or go dormant.

Master comparison table

ToolCategoryTypeOpen SourceWhat it doesMaturityLink
garakScanners and red-teamLLM vulnerability scanner (CLI)Yesnmap-style scanner that probes an LLM for prompt injection, jailbreaks, data and PII leakage, toxicity, and hallucinationActive (NVIDIA, Apache-2.0)repo
PyRITScanners and red-teamGenAI red-teaming automation frameworkYesAutomates adversarial red-teaming, adapting attack prompts across OpenAI, Azure, Anthropic, Google, HuggingFace, and custom endpointsActive (Microsoft, MIT)repo
PromptfooScanners and red-teamLLM eval and red-team CLI / libraryYesDeclarative eval and red-team tool scanning 50+ vulnerability classes with CI/CD integrationActive (now part of OpenAI, MIT)repo
GiskardScanners and red-teamAI/LLM testing and scan libraryYesLLM Scan auto-generates adversarial test suites for OWASP-LLM-Top-10 issues from a plain-language app descriptionActive (v3 beta)repo
DeepEvalScanners and red-teamLLM evaluation framework (Pytest-style)YesPytest-like framework that unit-tests LLM apps with metrics like G-Eval, faithfulness, and hallucinationActive (Confident AI, Apache-2.0)repo
DeepTeamScanners and red-teamLLM/agent red-teaming frameworkYesBuilt on DeepEval, dynamically generates adversarial attacks aligned to OWASP LLM Top-10 without a prepared datasetActive (Confident AI)repo
FuzzyAIInjection and jailbreak detectionAutomated LLM fuzzer / jailbreak testerYesMutates and escalates attack prompts using 18+ techniques (genetic, DAN, crescendo, PAIR, many-shot, ASCII smuggling)Active (CyberArk, Apache-2.0)repo
promptmap (promptmap2)Injection and jailbreak detectionPrompt-injection scanner for custom appsYesTests custom LLM apps for prompt injection via white-box or black-box HTTP modes across rule categoriesActive (rewritten 2025, GPL-3.0)repo
Agentic SecurityScanners and red-teamAgentic/LLM vulnerability scannerYesStress-tests LLM and agent workflows with jailbreaks, API fuzzing, and multimodal text/image/audio attacksActive (Apache-2.0)repo
LLMFuzzerInjection and jailbreak detectionFuzzing framework for LLM API integrationsYesFirst open-source fuzzing framework built for LLM API integrations, with connectors, proxy support, and HTML reportsUnmaintained (dormant since ~2023)repo
VigilInjection and jailbreak detectionLLM input/prompt security scannerYesLibrary and REST API scanning prompts and responses with vector similarity, heuristics, transformers, and canary tokensDormant / alpha (last release Dec 2023)repo
RebuffInjection and jailbreak detectionPrompt-injection detector / guardrailYesSelf-hardening injection detector combining heuristics, an LLM detector, a vector DB of attacks, and canary tokensArchived (Protect AI, prototype)repo
Meta Prompt Guard 2Injection and jailbreak detectionOpen-weights injection/jailbreak classifierYesmDeBERTa-based classifier (86M and 22M sizes) that labels a prompt benign or malicious to flag direct jailbreaks and injection attemptsActive (Meta, Llama Community License)model
ProtectAI deberta-v3 prompt-injectionInjection and jailbreak detectionOpen-weights prompt-injection classifierYesDeBERTa-v3-base model fine-tuned to classify English text as benign or injection, with v2 reporting roughly 95 percent accuracy on held-out dataActive (Protect AI, Apache-2.0)model
deepset injection classifierInjection and jailbreak detectionOpen-weights prompt-injection classifierYesDeBERTa-v3-base model fine-tuned on the deepset prompt-injections dataset to label text as legitimate or injectionActive (deepset, MIT)model
NVIDIA NemoGuard JailbreakDetectInjection and jailbreak detectionOpen-weights jailbreak-detection modelYesRandom-forest classifier over Snowflake Arctic embeddings that scores whether an input is a jailbreak attempt, wired into NeMo Guardrails input railsActive (NVIDIA Open Model License)model
LlamaFirewallInjection and jailbreak detectionAgent guardrail framework (injection focus)YesPolicy engine that orchestrates PromptGuard 2 for injection scanning and an AlignmentCheck module that audits agent reasoning for goal hijacking and indirect injectionActive (Meta, MIT framework)repo
Llama Guard (Purple Llama)Runtime guardrailsOpen-weights safety classifierYesClassifies prompts and responses against a hazard taxonomy; Llama Guard 4 is a 12B multimodal text and image modelActive (Meta)repo
NVIDIA NeMo GuardrailsRuntime guardrailsProgrammable guardrails toolkitYesAdds programmable input, dialog, retrieval, execution, and output rails via the Colang modeling languageActive (Apache-2.0)repo
Lakera GuardRuntime guardrailsCommercial AI security API (SaaS)NoReal-time API blocking prompt injection, jailbreaks, system-prompt extraction, and PII/secrets leakage; acquired by Check PointActivesite
LLM GuardRuntime guardrailsOpen-source LLM security toolkitYesInput and output scanners that detect, redact, and sanitize injection, PII, toxicity, and banned topics offlineActive (Protect AI, MIT)repo
Guardrails AIRuntime guardrailsValidation framework and hubYesWraps LLM calls with composable input/output Guards built from a Hub of validators (toxicity, PII, bias, more)Active (Apache-2.0)repo
OpenAI Moderation APIRuntime guardrailsHosted moderation/classification APINoFree hosted endpoint classifying text and image inputs across harm categories without generating a responseActivedocs
OpenAI GuardrailsRuntime guardrailsOpen-source guardrails libraryYesWraps the OpenAI client with configurable moderation, PII, jailbreak, hallucination, and URL checks plus a tripwireActive (MIT, Dec 2025)repo
Protect AI (Guardian / Recon / ModelScan)Runtime guardrailsCommercial AI security platformNoModel scanning, AI asset discovery, and red teaming; acquired by Palo Alto Networks, ModelScan stays open sourceActivesite
Azure AI Content Safety (Prompt Shields)Runtime guardrailsCloud content-moderation serviceNoFilters harmful content and, via Prompt Shields, blocks user and document-embedded (indirect) injection in real timeActivedocs
Amazon Bedrock GuardrailsRuntime guardrailsManaged cloud guardrails serviceNoApplies content filters, denied topics, word filters, PII redaction, and contextual-grounding checks to LLM I/OActivesite
Google ShieldGemmaRuntime guardrailsOpen-weights safety classifierYesGemma-based classifiers judging whether text (2B/9B/27B) or images (4B) violate safety policies across harm typesActivedocs
IBM Granite GuardianRuntime guardrailsSafety/hallucination detector modelYesGranite models detecting prompt and response risks plus RAG hallucination and relevance checksActive (Apache-2.0)model
Arize PhoenixObservabilityLLM observability and eval platformYesSelf-hostable OpenTelemetry/OpenInference platform for tracing LLM and agent calls and LLM-as-a-judge evalsActiverepo
LangfuseObservabilityLLM engineering / observability platformYesSelf-hostable tracing, evals, prompt management, and datasets; integrates with OTel, LangChain, and the OpenAI SDKActive (acquired by ClickHouse)repo
HeliconeObservabilityLLM observability platform and gatewayYesOne-line, self-hostable platform that monitors, evaluates, and routes requests across 100+ modelsActive (Apache-2.0)repo
LangSmithObservabilityCommercial LLM observability platformNoFramework-agnostic tracing, evaluation, and prompt management for LLM and agent runs in productionActivesite
TruLensObservabilityLLM evaluation and tracing libraryYesOpenTelemetry-based library using programmatic feedback functions to evaluate I/O quality and track experimentsActive (Snowflake, MIT)repo
OpenLLMetry (Traceloop)ObservabilityOpenTelemetry LLM instrumentation toolkitYesOTel extensions and SDK that auto-instrument LLM providers and vector DBs and export to any backendActive (Apache-2.0)repo
WhyLabs PlatformObservabilityCommercial AI/ML observability platformNoMonitors data quality, drift, and model health and guardrails LLMs using statistical profiles, not raw dataActivesite
whylogsObservabilityData-logging / profiling libraryYesSummarizes datasets into compact statistical profiles to monitor data quality and detect drift, including LLM dataActive (Apache-2.0)repo
LangKitObservabilityLLM monitoring / text-metrics toolkitYesBuilt on whylogs, extracts safety and quality signals (relevance, sentiment, jailbreak/PII) from prompts and responsesMaintenance (last release Nov 2024)repo
Fiddler AIObservabilityCommercial AI observability platformNoLLM and ML monitoring with trust-and-safety metrics and low-latency guardrails against hallucination and injectionActivesite
Datadog LLM ObservabilityObservabilityCommercial LLM observability productNoAdds LLM and agent tracing to Datadog APM with built-in evals, sensitive-data scanning, and cost monitoringActive (GA)site

Scanners and red-team frameworks

This is the most crowded and fastest-moving category, and consolidation is now visible at the top: NVIDIA backs garak, Microsoft backs PyRIT, and Promptfoo is part of OpenAI, yet all three remain open source. The practical split is between scanners that ship adversarial probe catalogs out of the box (garak, Giskard, Agentic Security) and frameworks that automate attack generation and orchestration (PyRIT, DeepTeam, Promptfoo). Eval-first tools like DeepEval blur the line by treating security findings as failing unit tests, which is why they pair naturally with their red-team siblings. For deeper methodology on running these, see our notes on how to test AI agent security and the field guide to the best LLM red-teaming tools for 2026.

Runtime guardrails and safety filters

Guardrails sit in the request path and enforce policy on input, output, or both, and the category splits cleanly into hosted services and self-hostable models or libraries. Cloud-native options (Lakera Guard, Azure Prompt Shields, Amazon Bedrock Guardrails, OpenAI Moderation) trade control for low operational overhead, while open-weights classifiers (Llama Guard, ShieldGemma, Granite Guardian) and toolkits (NeMo Guardrails, LLM Guard, Guardrails AI) let you keep data in your own boundary. The acquisition trend is unmistakable here, with Lakera moving to Check Point and Protect AI folded into Palo Alto Networks, so factor vendor stability into any procurement that is not self-hosted. For a deeper head-to-head, see our best AI guardrail tools review.

Injection and jailbreak detection

This sub-category is where the offensive and defensive sides meet: fuzzers and injection scanners (FuzzyAI, promptmap, LLMFuzzer, Vigil) find the holes, and detectors (Rebuff, and the injection-specific paths in the guardrail tools) try to close them. It is also where tool mortality is highest, with LLMFuzzer dormant, Vigil in long-dormant alpha, and Rebuff archived, so check the last-release date before you build a pipeline around any single project. Active maintenance now concentrates in vendor-backed efforts like CyberArk’s FuzzyAI and the rewritten promptmap2. The detection side has shifted toward small open-weights classifiers you can self-host: Meta Prompt Guard 2 (an mDeBERTa model in 86M and 22M sizes) labels prompts as benign or malicious, Protect AI’s deberta-v3-base-prompt-injection-v2 and the deepset deberta-v3-base-injection model both fine-tune DeBERTa-v3 to flag injection text, and NVIDIA’s NemoGuard JailbreakDetect scores jailbreak attempts and plugs into NeMo Guardrails input rails. For agent-stage defense, Meta’s LlamaFirewall pairs Prompt Guard 2 with an AlignmentCheck module that audits an agent’s chain of thought for goal hijacking and indirect injection, while the commercial Lakera Guard API (now part of Check Point) covers the same ground as a hosted service. These classifiers are narrow by design, with the DeBERTa-based ones limited to specific languages and prone to false positives on system prompts, so they belong behind a fuzzer and alongside, not in place of, the broader guardrail layer. For benchmarking how well these detectors actually hold up, see our work on benchmarking prompt-injection detectors and benchmarking jailbreak resistance with ASR.

LLM observability and security monitoring

Observability is the layer most teams under-invest in, yet it is where you detect abuse, drift, and silent guardrail failures after deployment. The open-source core has matured around OpenTelemetry and OpenInference, with Arize Phoenix, Langfuse, Helicone, TruLens, and OpenLLMetry all self-hostable and trace-first, while commercial platforms (LangSmith, Datadog, Fiddler, WhyLabs) add managed evals, sensitive-data scanning, and enterprise support. Several of these now bundle security signals directly into traces (PII leakage, prompt-injection flags, hallucination scores), which makes the line between observability and guardrails increasingly blurry. For how we measure whether these evaluation signals are trustworthy, see our note on comparing safety benchmarks: HarmBench and JailbreakBench.

Methodology and last updated

This directory is editorially compiled from public sources: project repositories, official documentation, vendor product pages, and license files. Entries are categorized by primary function, with type, open-source status, and maturity recorded as observed at compile time. Maturity labels (active, maintenance, dormant, archived, unmaintained) reflect public release cadence and repository or vendor signals, not a private benchmark. We do not rank tools here and we take no vendor compensation for inclusion. Tools move fast in this space: projects are acquired, renamed, archived, or revived, so verify the current state at each linked source before relying on a label. Last updated June 2026.

Sources

  1. NVIDIA garak LLM vulnerability scanner
  2. Microsoft PyRIT Python Risk Identification Tool for generative AI
  3. OWASP Top 10 for Large Language Model Applications
  4. Meta Purple Llama / Llama Guard
  5. NVIDIA NeMo Guardrails
Subscribe

AI Sec Bench — in your inbox

Benchmarks and evaluations of AI security tools. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments