Building SecureVibes: A Multi-Agent Security System (Part 2/3)

[Image: SecureVibes architecture diagram]

This is Part 2 of a 3-part series on building SecureVibes, a multi-agent security system for vibecoded applications.

Series Navigation: Part 1 | Part 2 | Part 3


From Hypothesis to Implementation

In Part 1, I outlined the hypothesis: multi-agent systems could outperform single-agent scanners by mimicking how human security teams work—understanding architecture first, then modeling threats, then validating them in code.

I've been pondering what an AI-native security scanner could look like for securing vibecoded apps, and whether I could build one with an agentic framework like the Claude Agent SDK. Building a code security scanner isn't trivial to begin with; add AI on top and you need to know what you're doing, or the project becomes a time sink with no real ROI. There has to be some method to this madness.

I realized I could programmatically invoke Claude Code and build my own workflow of multiple subagents, with the orchestration completely abstracted away by the SDK.

SecureVibes consists of four specialized agents working in sequence. Here's how each one works, what I learned building them, and why certain design decisions matter.


The Multi-Agent Architecture Overview

The architecture follows a simple pipeline:

Codebase → Agent 1 (SECURITY.md) → Agent 2 (THREAT_MODEL.json) → Agent 3 (VULNERABILITIES.json) → Agent 4 (scan_results.json)

Each agent produces a file artifact that becomes input for the next stage. This file-based communication pattern (more on this later) proved to be one of the best design decisions.

Claude SDK Orchestration

Unlike traditional multi-agent systems that require custom orchestration code, SecureVibes leverages the Claude Agent SDK's built-in orchestration. Claude itself coordinates the agents:

  1. Receives the high-level goal: "Scan this repo for vulnerabilities"
  2. Intelligently decides to run Phase 1 → generates SECURITY.md
  3. Passes SECURITY.md to Phase 2 → generates THREAT_MODEL.json
  4. Passes both artifacts to Phase 3 → generates VULNERABILITIES.json
  5. Runs Phase 4 → generates final scan_results.json
  6. Tracks costs and timing across all agents

Why this matters: The SDK handles agent coordination, tool access control, file management, and error recovery. This means less code to maintain and more reliable execution. The Scanner class simply provides the high-level prompt and agent definitions—Claude figures out the rest.
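To make this concrete, here's a minimal sketch of what handing orchestration to the SDK can look like. It assumes the Python claude-agent-sdk's query(), ClaudeAgentOptions, and AgentDefinition interfaces; the agent names, prompts, and tool lists below are illustrative stand-ins, not SecureVibes' real definitions.

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

# Illustrative subagent definitions; the real SecureVibes prompts are much longer.
AGENTS = {
    "assessment": AgentDefinition(
        description="Documents architecture, data flows, and entry points",
        prompt="You are a software architect. Explore the codebase and write SECURITY.md...",
        tools=["Read", "Grep", "Glob"],  # read-only: explore, never modify
    ),
    "threat-modeling": AgentDefinition(
        description="Performs STRIDE analysis over SECURITY.md",
        prompt="Read SECURITY.md and write THREAT_MODEL.json with specific threats...",
        tools=["Read", "Write"],
    ),
    "code-review": AgentDefinition(
        description="Validates threats against the actual code",
        prompt="Confirm or reject each threat in THREAT_MODEL.json with evidence...",
        tools=["Read", "Grep", "Glob", "Write"],
    ),
    "report-generator": AgentDefinition(
        description="Compiles VULNERABILITIES.json into scan_results.json",
        prompt="Standardize the findings and add scan metadata...",
        tools=["Read", "Write"],
    ),
}

async def scan(repo_path: str) -> None:
    options = ClaudeAgentOptions(cwd=repo_path, agents=AGENTS)
    # One high-level goal; Claude decides when to delegate to each subagent.
    async for message in query(
        prompt="Scan this repo for vulnerabilities, running the four phases in order.",
        options=options,
    ):
        print(message)  # stream progress, tool use, costs, and the final result

asyncio.run(scan("."))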


Phase 1: Assessment Agent (The Architect)

The Assessment Agent acts as a software architect, analyzing your codebase to create comprehensive security documentation. It explores your code using Read, Grep, Glob, and LS tools, and generates a structured SECURITY.md document.

What it documents:

  • Overall architecture and component structure
  • Data flow between components
  • Authentication and authorization mechanisms
  • External dependencies and APIs
  • Sensitive data paths (credentials, PII, etc.)
  • Entry points (APIs, forms, CLI commands)
  • Technology stack and frameworks
  • Existing security controls

Think of this as the reconnaissance phase. The agent is learning: "What does this application do? How is it built? What does it handle?"

Here's an example of what the output looks like:

## Authentication Mechanism
- JWT tokens stored in localStorage
- Refresh tokens in httpOnly cookies
- Token validation in middleware/auth.py:45-67
- Session management using Redis store

Key design decision: I gave this agent only read-only tools. It can explore but not modify. This ensures it focuses purely on understanding, not changing anything.

Prompt engineering insight: I gave it an exact template. First attempts produced walls of unstructured text. The template ensures consistency—every SECURITY.md follows the same structure, making it reliable input for Phase 2.
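As an illustration (not the exact wording shipped in SecureVibes), the template portion of that prompt might look something like this, with section names mirroring the "What it documents" list above:

# Hypothetical excerpt of the template embedded in the Assessment Agent's prompt.
SECURITY_MD_TEMPLATE = """\
# Security Assessment
## Architecture Overview
## Data Flows
## Authentication & Authorization
## External Dependencies & APIs
## Sensitive Data Paths
## Entry Points
## Technology Stack
## Existing Security Controls
"""

ASSESSMENT_PROMPT = (
    "You are a software architect. Explore the codebase with Read, Grep, Glob, and LS, "
    "then write SECURITY.md following this template EXACTLY, filling in every section:\n\n"
    + SECURITY_MD_TEMPLATE
)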


Phase 2: Threat Modeling Agent (The Strategist)

The Threat Modeling Agent takes the SECURITY.md from Phase 1 and performs STRIDE-based threat analysis.

STRIDE stands for:

  • Spoofing - Identity verification issues
  • Tampering - Data integrity issues
  • Repudiation - Audit and logging issues
  • Information Disclosure - Confidentiality issues
  • Denial of Service - Availability issues
  • Elevation of Privilege - Authorization issues

For each identified threat, it generates:

  • Specific threat title and description
  • STRIDE category
  • Severity level (critical, high, medium, low)
  • Affected components
  • Attack scenario
  • Potential vulnerability types (with CWE IDs)
  • Mitigation strategies

The output is a structured THREAT_MODEL.json with all identified threats.

Here's what the output looks like:

{ "threat_id": "T-001", "title": "SQL Injection in User Login", "stride_category": "Tampering", "severity": "critical", "affected_components": ["auth.py", "/api/v1/login"], "attack_scenario": "Attacker crafts malicious SQL in username field...", "cwe_id": "CWE-89", "mitigation": "Use parameterized queries or ORM" }

Key design decision: I constrained the agent to output structured JSON rather than free-form text. This makes the output machine-readable and eliminates parsing ambiguity. File-based communication between agents is way more reliable than trying to parse natural language.
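Here's a small sketch of why that pays off downstream, assuming THREAT_MODEL.json holds a list of threat objects shaped like the example above: any later phase, or any tooling built around the scanner, can load and sanity-check it with plain json, no natural-language parsing required.

import json

STRIDE = {"Spoofing", "Tampering", "Repudiation",
          "Information Disclosure", "Denial of Service", "Elevation of Privilege"}
SEVERITIES = {"critical", "high", "medium", "low"}

def load_threat_model(path: str = "THREAT_MODEL.json") -> list[dict]:
    """Load Phase 2's output and fail fast if a threat doesn't fit the schema."""
    with open(path) as f:
        threats = json.load(f)
    for t in threats:
        assert t["stride_category"] in STRIDE, f"{t['threat_id']}: unknown STRIDE category"
        assert t["severity"] in SEVERITIES, f"{t['threat_id']}: unknown severity"
    return threats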

Why STRIDE still matters: It forces comprehensive coverage. Without it, the agent fixates on injection attacks and ignores authorization issues. STRIDE gives the Threat Modeling Agent a structured way to think about threats, ensuring coverage across all categories.


Phase 3: Code Review Agent (The Validator)

The Code Review Agent takes both SECURITY.md (context) and THREAT_MODEL.json (threats to validate) and searches the actual codebase to confirm which threats are real vulnerabilities.

For each confirmed vulnerability, it provides:

  • Exact file path and line number
  • Code snippet showing the vulnerability
  • Detailed explanation of how it's exploitable
  • CWE ID
  • Severity level
  • Specific remediation recommendation
  • Evidence of exploitability

The output is VULNERABILITIES.json with only confirmed, validated vulnerabilities—no theoretical risks.

Here's an example:

{ "threat_id": "VULN-001", "title": "SQL Injection in User Authentication", "description": "The user_id from request.args is concatenated directly into SQL query without sanitization. An attacker can inject SQL commands to bypass authentication or extract database contents.", "severity": "critical", "file_path": "api/auth.py", "line_number": 157, "code_snippet": "query = f\"SELECT * FROM users WHERE id = {user_id}\"", "cwe_id": "CWE-89", "recommendation": "Use parameterized queries: cursor.execute(\"SELECT * FROM users WHERE id = ?\", (user_id,))", "evidence": "The variable user_id is read from request.args at line 155 without any validation. It's then directly interpolated into the SQL string at line 157. Testing with user_id='1 OR 1=1--' would bypass the WHERE clause. Exploitability: HIGH." }

NOTE: These vulnerabilities still need to be confirmed as exploitable dynamically. Since we don't have that feature/agent (yet!), this agent will only be able to confirm by statically analyzing the codebase. As SecureVibes continues to evolve over time, my hope is that we will be able to combine both static and dynamic analysis at some point to be 100% sure. Stay tuned!

Key design decision: The prompt explicitly instructs the agent to distinguish between real vulnerabilities and false positives. It must provide concrete evidence, not just flag suspicious patterns. This dramatically reduces false positive rates.

This is where the multi-agent approach shines. Phase 3 isn't guessing—it's validating specific hypotheses from Phase 2 with architectural context from Phase 1. The agent knows what to look for, where to look, and how to interpret what it finds.
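One nice side effect of demanding exact locations and real code, sketched below (this check isn't part of SecureVibes itself): findings shaped like the example above can be verified mechanically against the repo before a human ever reads the report.

import json
from pathlib import Path

def snippet_matches(repo: Path, finding: dict) -> bool:
    """Return True if the reported snippet really appears at the reported line."""
    lines = (repo / finding["file_path"]).read_text().splitlines()
    reported_line = lines[finding["line_number"] - 1]  # line numbers are 1-based
    return finding["code_snippet"].strip() in reported_line

findings = json.loads(Path("VULNERABILITIES.json").read_text())
for finding in findings:
    status = "verified" if snippet_matches(Path("."), finding) else "check manually"
    print(finding["threat_id"], status)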


Phase 4: Report Generator (The Compiler)

The Report Generator Agent takes the raw vulnerability data from Phase 3 and compiles it into a final, structured report.

What it does:

  • Reads VULNERABILITIES.json
  • Standardizes the format across all findings
  • Adds metadata (scan time, file count, costs)
  • Generates scan_results.json with consistent schema
  • Calculates severity distribution stats

Key design decision: This separate formatting step ensures the final output is always consistent, even if Phase 3's output varies slightly. It also makes it easier to add new output formats (Markdown, HTML, SARIF, etc.) in the future.
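Here's a minimal sketch of that compilation step, assuming VULNERABILITIES.json holds a list of findings like the example above (the metadata field names are illustrative, not SecureVibes' exact schema):

import json
import time
from collections import Counter
from pathlib import Path

def compile_report(vulns_path: str = "VULNERABILITIES.json",
                   out_path: str = "scan_results.json") -> dict:
    findings = json.loads(Path(vulns_path).read_text())
    report = {
        "metadata": {
            "generated_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "total_findings": len(findings),
            # e.g. {"critical": 2, "high": 3, "medium": 1}
            "severity_distribution": dict(Counter(f["severity"] for f in findings)),
        },
        "findings": findings,
    }
    Path(out_path).write_text(json.dumps(report, indent=2))
    return report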


Building It: The Journey

Initial Approach (What Didn't Work)

I've been playing with terminal-based coding agents like Claude Code and Codex for a while now. When I first tried to build a security scanner, I took the obvious approach: give Claude access to the entire codebase and ask it to find vulnerabilities.

The results were... underwhelming.

The agent would either:

  1. Get overwhelmed by the context and produce generic findings
  2. Focus too narrowly on one file and miss the bigger picture
  3. Report patterns that looked suspicious but weren't actually vulnerable

I realized the problem: I was asking one agent to be an architect, a threat modeler, AND a security auditor simultaneously.

Human security teams don't work that way. Why should AI?

When a security professional reviews an application, they don't just grep for "SQL injection". They follow a structured process. And this realization led me to a different approach.

The Multi-Agent Breakthrough

I've been following the Claude Agent SDK's development pretty closely, and it was recently revamped, so I decided to give it a try.

The breakthrough came when I structured it as a pipeline:

  1. Assessment creates context
  2. Threat Modeling uses that context to hypothesize threats
  3. Code Review validates those specific threats in code

Each agent has a narrow, focused task. And each agent's output becomes the next agent's input. This creates a progressive refinement of analysis.

The results were dramatically better. False positives dropped significantly because Phase 3 is validating specific threats, not randomly pattern-matching. The findings were more detailed because each agent had the right context for its specific task.

My first multi-agent version tried to pass data in-memory between agents. Debugging was a nightmare. I couldn't see what Phase 1 actually sent to Phase 2. When something went wrong in Phase 3, I had no visibility into whether the issue was bad input from Phase 2 or a problem with Phase 3's logic.

Switching to file-based communication (.md and .json files) made the system so much easier to understand, debug, and extend. I can inspect any phase's output, replay phases, and even manually edit artifacts to test edge cases.
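Here's a quick sketch of what that workflow looks like in practice; run_code_review() is a hypothetical stand-in for however you kick off Phase 3 on its own:

import json
import shutil
from pathlib import Path

# 1. Inspect exactly what Phase 2 handed to Phase 3.
threats = json.loads(Path("THREAT_MODEL.json").read_text())
print(f"Phase 2 produced {len(threats)} threats")

# 2. Keep a copy, tweak a threat to probe an edge case, and replay Phase 3 alone.
shutil.copy("THREAT_MODEL.json", "THREAT_MODEL.orig.json")
threats[0]["severity"] = "low"
Path("THREAT_MODEL.json").write_text(json.dumps(threats, indent=2))
# run_code_review("SECURITY.md", "THREAT_MODEL.json")  # hypothetical Phase 3 entry point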

Prompt Engineering Hell (and Heaven)

Getting the prompts right was the hardest part. Here's what I learned:

For the Assessment Agent:

  • Initially, it would produce walls of text with no structure
  • Solution: Provide an exact template in the prompt with section headers
  • Result: Consistent, well-structured SECURITY.md every time

For the Threat Modeling Agent:

  • First attempts produced generic threats like "SQL injection might exist"
  • Solution: Explicitly instruct it to be specific based on the actual architecture
  • Added: "Focus on SPECIFIC threats based on the ACTUAL architecture, not generic security advice"
  • Result: Specific threats like "SQL injection in user login endpoint at /api/v1/login due to string concatenation in auth.py line 157"

For the Code Review Agent:

  • It would sometimes return line ranges instead of exact line numbers
  • Solution: Prompt emphasized "Provide actual line numbers, not ranges" and "Include actual vulnerable code, not pseudocode"
  • Result: Precise findings with exact locations

The system prompt is where you define the agent's expertise. The user prompt is where you give it the task and constraints. Getting both right is critical.
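Roughly, the split looks like this (illustrative wording, folding in the constraints from the bullets above):

# System prompt: who the agent is.
SYSTEM_PROMPT = (
    "You are a senior application security engineer reviewing code for exploitable "
    "vulnerabilities. You value concrete evidence over suspicion."
)

# User prompt: the task, plus the hard-won constraints.
USER_PROMPT = (
    "Validate each threat in THREAT_MODEL.json against the codebase described in SECURITY.md.\n"
    "- Focus on SPECIFIC threats based on the ACTUAL architecture, not generic security advice.\n"
    "- Provide actual line numbers, not ranges.\n"
    "- Include actual vulnerable code, not pseudocode.\n"
    "- Report only findings you can back with concrete evidence of exploitability."
)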

At some point down the line, I'd like to get rid of manual prompt engineering entirely by adopting something like DSPy.


What's Next?

The architecture is elegant. The prompts are refined. The agents work in harmony. But here's the real question...

Does this multi-agent approach actually find more vulnerabilities than single-agent systems? Than traditional SAST tools? Than using Claude Code directly?

I ran SecureVibes on its own codebase to find out. The results surprised me.

In Part 3, I share the full comparative analysis (SecureVibes vs Semgrep, Bandit, Claude Code, Codex, and more), model comparison (Haiku vs Sonnet vs Opus), key learnings, what's next for SecureVibes, and how you can contribute.

Continue to Part 3: Results & What's Next


Series Navigation: Part 1 | Part 2 | Part 3

Follow along:


If you like the content and don't want to miss out on new posts, enter your email and hit the Subscribe button below. I promise I won't spam. Only premium content!