The Vibecoding Security Crisis: Why Current Scanners Fail (Part 1/3)

[Figure: SecureVibes phases]

This is Part 1 of a 3-part series on building SecureVibes, a multi-agent security system for vibecoded applications.

Series Navigation: Part 1 | Part 2 | Part 3


A few months ago, a friend reached out and asked if I could help assess the security of his vibecoded application. He had built it with Replit and wasn't sure about its security posture. He was about to onboard real customers, and before he did, he wanted a sanity check on his application's codebase from a security professional.

This conversation made me realize something fundamental about security in the vibecoding era: we're building applications faster than ever, but our security tools haven't caught up.


The Vibecoding Security Gap

Before diving into the solution, it's worth considering who actually needs this. The target user base for a tool like this isn't enterprise security teams with six-figure SAST licenses.

It's folks who are not super technical, but realize the power of vibecoding and want to build cool stuff with AI. They don't know (or don't care much about) security, but they do care about their users' data.

These folks are not security experts, nor do they have budgets like enterprise security teams. They want a quick, holistic assessment of their vibecoded applications before they release them to the world. They also don't want to deal with a lot of noise from traditional code security scanners that don't really provide value, but add more work.

Platforms like Replit, Bolt, Lovable, and v0 are all the rage these days. But the focus of these platforms is not necessarily on security, so the code they generate must be reviewed from a security perspective. These platforms are still evolving as we speak, so the security of these vibecoded applications is highly questionable. If you don't believe me, check out these articles!

Now, vibecoders could technically use AI-powered code editors like Cursor or terminal-based coding agents like Claude Code to scan their codebase for security vulnerabilities, prompting the same way they did to build the app in the first place. But this requires them to know what to prompt, what context to provide, and so on. Also, based on my interactions with a few of these vibecoders who are not technical, they don't feel comfortable operating these more developer-focused tools. They prefer a low-code/no-code platform or a drag-and-drop UI, where building the app is as simple as prompting and watching their vision come to reality.

Their goal is to build something valuable, assess the codebase holistically, get a good understanding of the practical risks that could manifest as high-impact vulnerabilities, and address the ones they think are absolutely critical before launching their product.

The gap is clear: vibecoded apps need security scanning, but existing tools aren't built for them.


The Traditional SAST Problem

Traditional Static Application Security Testing (SAST) tools have been around for decades, and they all share the same fundamental limitations:

  1. Pattern-based detection - They look for specific code patterns and report them as vulnerabilities. High false positive rates are common.
  2. No context awareness - They don't understand your application's architecture or how components interact.
  3. Single-pass analysis - One scan, one report. No iterative refinement.
  4. Generic reports - Findings often lack the nuanced context needed to prioritize or remediate effectively.
  5. Language-specific - They need different rules for the same vulnerability class in different programming languages. For example, a SAST rule to detect SQLi might look different for Python and Ruby. Managing and maintaining these rules over time is not trivial.

Here's a concrete example: A traditional SAST tool sees SELECT * FROM users WHERE id = ${userId} and flags it as a SQL injection vulnerability. But it doesn't understand that userId is already validated by Zod three functions upstream. The context is lost, leading to false positives that waste developer time.
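
To make that concrete, here's an illustrative sketch of the kind of code that trips up pattern-based scanners. The schema and function names (ParamsSchema, getUserId, buildUserQuery) are made up for this example; the point is that by the time the query is built, the interpolated value has already been constrained to an integer by validation upstream.

```typescript
import { z } from "zod";

// Illustrative names only — not from any real codebase.
const ParamsSchema = z.object({
  userId: z.coerce.number().int().positive(), // only positive integers survive validation
});

// Validation happens here, a few call sites upstream of the query.
function getUserId(rawParams: unknown): number {
  return ParamsSchema.parse(rawParams).userId;
}

// A pattern-based scanner sees string interpolation into SQL and flags SQL injection,
// with no awareness that userId can only ever be a positive integer at this point.
function buildUserQuery(userId: number): string {
  return `SELECT * FROM users WHERE id = ${userId}`;
}
```

To be clear, parameterized queries would still be the better pattern here; the point is that a pattern matcher can't tell the difference between this and a genuinely exploitable sink, while a reviewer with the full data flow in view can.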

With LLMs, we have an opportunity to do better. But simply throwing ChatGPT or Claude at your codebase isn't the answer either.


The Multi-Agent Hypothesis

So here's the question I started with: What if we mimicked how human security teams actually work?

When a security team reviews an application for security vulnerabilities, they don't just grep for "SQL injection". They follow a structured, three-phase approach:

  1. First, understand the architecture and data flows - What does this application do? How is it built? Where does data come from and where does it go?

  2. Then model potential threats based on that architecture - Given what we know about this system, what could go wrong? Where are the trust boundaries? What are the attack surfaces?

  3. Finally validate which threats manifest as real vulnerabilities in the code - Of all the potential threats we identified, which ones actually exist in the codebase? Can we prove they're exploitable?

This isn't just theory. I've seen security teams do exactly this during application assessments. It's a proven methodology. The question became: Could we encode this workflow into multiple specialized AI agents?

Each agent would have a narrow focus:

  • One agent to understand the architecture
  • One agent to hypothesize threats
  • One agent to validate those threats in code
  • One agent to compile the results

This three-phase approach is exactly what traditional SAST tools miss, and what single-agent AI systems struggle with due to context overload.


Building SecureVibes

So I built SecureVibes to test this hypothesis.

Four specialized agents, each with a narrow focus, working in sequence:

  1. The Assessment Agent understands the architecture
  2. The Threat Modeling Agent hypothesizes risks
  3. The Code Review Agent validates them
  4. The Report Generator compiles it all

Each agent's output becomes the next agent's input, creating a progressive refinement of analysis. The Assessment Agent creates context. The Threat Modeling Agent uses that context to hypothesize threats. The Code Review Agent validates those specific threats in code.
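
As a rough sketch of what that chaining looks like in code (this is an illustration of the pattern, not SecureVibes' actual implementation — that's covered in Part 2):

```typescript
// Minimal sketch of sequential agents passing refined context forward.
// The Agent interface and string-based context are simplifications for illustration.
interface Agent {
  name: string;
  run(context: string): Promise<string>;
}

async function runPipeline(agents: Agent[], codebaseSummary: string): Promise<string> {
  let context = codebaseSummary;
  for (const agent of agents) {
    // Each agent works only from the previous phase's output,
    // keeping its context window narrow and focused on one job.
    context = await agent.run(context);
  }
  return context; // the final compiled report
}

// Usage: assessment -> threat modeling -> code review -> report generation
// const report = await runPipeline([assessmentAgent, threatAgent, reviewAgent, reportAgent], repoSummary);
```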

In Part 2, I walk you through the architecture in detail, the design decisions for each agent, and the prompt engineering hell (and heaven) that made it work.

But here's the real question: does a multi-agent approach actually find more vulnerabilities?

I ran SecureVibes on its own codebase to find out. The results surprised me. More on that in Part 3.

Continue to Part 2: Building the Multi-Agent Architecture


Series Navigation: Part 1 | Part 2 | Part 3
