Behind my IAM security assistant built on Amazon Bedrock

The full step-by-step lives in the repository, and the project first appeared as an article on the AWS blog in Spanish. This post doesn’t repeat that walkthrough. It goes after the problem underneath (why least privilege matters, why it’s so hard to get right) and how the project changed.

Enterprise AI rarely stalls on the model. It stalls waiting for permissions.

Almost every conversation about enterprise AI revolves around the model: which one is smarter, which is cheaper, which has the largest context window. After five years helping some of the largest companies in my region adopt the cloud (and lately generative AI), I can tell you the bottleneck is almost never how smart the model is. It’s something far less glamorous: the permissions it needs to do anything at all.

I saw it so many times that I ended up building a tool to attack it. This is the story of that problem, and of what I learned building it.

The problem I kept seeing

An application team needs AWS permissions for their workload to run. They ask the security team. Security reviews the IAM policy, finds it asks for more than necessary, and rejects it. The team adjusts, asks again, and the cycle repeats. Two weeks later, the project is still waiting on a permission.

Interaction flow between application owners and the security team to obtain AWS access.

Nobody does anything wrong in that story. The developer isn’t a least-privilege expert; the security engineer doesn’t know the application in detail. It’s a two-way knowledge gap, and it costs weeks. This isn’t a regulation problem or a culture problem. It’s a translation problem.

The question I asked myself: what if a tool did the first pass, so the request reaches security already close to least privilege?

Why this isn’t a minor detail

Broad permissions aren’t an administrative nuisance: they’re the front door for a large share of cloud incidents. In Tenable and the Cloud Security Alliance’s 2025 State of Cloud and AI Security report, the leading causes of breaches are identity failures: excessive permissions (31%), inconsistent access controls (27%), and weak identity hygiene (27%). The problem gets worse with non-human identities: Tenable’s Cloud and AI Security Risk Report 2026 found that 52% of organizations have non-human identities with critical excessive permissions (versus 37% for human ones), right as AI agents multiply that kind of identity.

The pattern shows up again and again in incidents:

Long-lived access keys that never get rotated, sometimes with admin permissions, sitting on a server or in an environment variable for years.
Secrets and credentials leaked in repositories (in git history, a config file, a notebook). A broadly-scoped key leaked this way is direct access.
Lateral movement from third-party apps: a role that over-trusts an external provider and becomes the springboard into the rest of the account.
Inherited roles and policies that accumulate “just in case” permissions across migrations, and that nobody trims for fear of breaking something.
Defaults left too open: the * wildcard that stayed from day one because “we’ll tighten it later,” and never got tightened.

The common denominator isn’t a genius attacker. It’s a permission that wasn’t needed. When an identity can only do what it actually needs, a leaked credential stops being a catastrophe and becomes a contained incident.

And still, it’s rarely a priority

What I find hardest to understand isn’t that least privilege is difficult. It’s that, knowing the risk, many companies don’t prioritize it until something blows up. Security gets seen as a cost that slows delivery, when it’s the opposite: a breach hits business goals directly (fines, lost trust, weeks of a team putting out fires instead of building). Protecting data doesn’t compete with the business; it sustains it.

This isn’t abstract for me. Between 2025 and 2026, as a user of their services, I found flaws in two different companies in a highly regulated sector that handled confidential personal records: some of the most sensitive data there is, exposed more than anyone would want. I reported them directly, and both fixed the flaws after the alert. Neither case involved bad intent: there was haste, technical debt, and the assumption that “someone must have already checked this.” That assumption is the problem: security rarely fails because someone chose to ignore it; it fails by omission, while everyone looks toward the next release.

Least privilege is one of the cheapest defenses against that omission. It won’t stop a credential from leaking, but it decides whether that leak is a scare or a headline.

Why it’s so hard to get right

And here’s the trap: even when everyone agrees least privilege is the right call, defining it well is hard, and it’s hard on any cloud. Assume the best case, that the application was designed to do exactly what it should, no more and no less. Even then, translating that into a correct permissions policy runs into several walls:

Granularity. Fine-grained access control is powerful, but that same power demands precision: there are a great many possible actions, and the one you need isn’t always named the way you’d expect. A single console operation can fire several API calls, each with its own permission. Knowing the exact set means reading docs and testing.
Resources and conditions. The action isn’t enough; you have to scope the resource and add conditions (a region, a tag, a prefix). That fine work is the first thing skipped under pressure, and the thing that separates a policy “that works” from one “of least privilege.”
Platforms evolve. New services and capabilities appear all the time. A policy that’s perfect today can be incomplete (or excessive) months later, without anyone touching it.
The bias toward broad. A too-narrow policy breaks the app and files a ticket; a too-broad one works silently. The daily incentive pushes toward asking for more, and that’s exactly what piles up as risk.
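
To make the granularity and scoping points concrete, here is a minimal Python sketch that contrasts a broad policy with a scoped one. It is not the project's code: the lint_policy helper, its three rules, and the example-bucket ARN are hypothetical, chosen only for illustration. A real review needs far more than a wildcard check.

```python
def as_list(value):
    """IAM accepts a string or a list for Action/Resource; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def lint_policy(policy):
    """Return warnings for Allow statements that look broader than necessary.

    Hypothetical, deliberately naive rules: wildcard actions, a bare '*'
    resource, and a missing Condition block.
    """
    warnings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                warnings.append(f"statement {index}: wildcard action {action!r}")
        for resource in as_list(stmt.get("Resource")):
            if resource == "*":
                warnings.append(f"statement {index}: resource is '*'")
        if "Condition" not in stmt:
            warnings.append(f"statement {index}: no Condition block")
    return warnings


# The policy "that works": every S3 action on every resource.
BROAD = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

# Closer to least privilege: one action, one prefix, one condition.
# The bucket name is a placeholder.
SCOPED = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/reports/*",
        "Condition": {"Bool": {"aws:SecureTransport": "true"}},
    }],
}

if __name__ == "__main__":
    for name, policy in (("BROAD", BROAD), ("SCOPED", SCOPED)):
        print(name, lint_policy(policy))
```

Both policies would let the app read its reports. Only the second limits what a leaked credential can reach.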

And yet we still ask people to hand-write least-privilege policies for platforms that evolve constantly. It doesn’t scale.

Where generative AI helps, and where it doesn’t

A language model turns out to be a genuinely good translator between two languages: “what the application needs to do,” in plain English, and “what that looks like as an IAM policy,” in JSON. That’s real value, and it’s the part of the problem where generative AI earns its place.

But the model on its own isn’t enough. Ask an LLM for an IAM policy and it will happily hand back something that looks right and is quietly, dangerously wrong (an action that doesn’t exist, a grant wider than you asked for), stated with complete confidence. The fix isn’t a cleverer prompt. It’s pairing the generative step with something the model can’t argue its way past: a deterministic validator anchored to how the platform actually works. Left to itself, a model always sounds certain; the validator is what makes that certainty worth trusting.
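
The pairing can be sketched as a small loop: the generator drafts, the validator returns findings, and the findings go back to the generator until the draft is clean or the rounds run out. This is a minimal sketch, not the project's implementation. The draft_with_verifier function, the fake_model and fake_validator stand-ins, and the two-entry KNOWN_ACTIONS set are all hypothetical; in practice the generator would be a model call and the validator a real policy checker.

```python
import json


def draft_with_verifier(request, generate, validate, max_rounds=3):
    """Ask `generate` for a policy until `validate` reports no findings.

    generate(request, feedback) -> policy JSON string
    validate(policy_json)       -> list of blocking findings (empty means OK)
    """
    feedback = []
    for _ in range(max_rounds):
        policy = generate(request, feedback)
        findings = validate(policy)
        if not findings:
            return policy
        feedback = findings  # the next draft sees what the validator rejected
    raise ValueError(f"no valid policy after {max_rounds} rounds: {feedback}")


# --- Stand-ins for illustration only -------------------------------------

KNOWN_ACTIONS = {"s3:GetObject", "s3:PutObject"}  # tiny stand-in for the platform's rules


def fake_model(request, feedback):
    """Pretend model: invents an action first, corrects it once it gets feedback."""
    action = "s3:GetObject" if feedback else "s3:ReadObject"
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [action],
            "Resource": "arn:aws:s3:::example-bucket/*",  # placeholder bucket
        }],
    })


def fake_validator(policy_json):
    """Deterministic check: every action must be in the known set."""
    findings = []
    for stmt in json.loads(policy_json)["Statement"]:
        for action in stmt["Action"]:
            if action not in KNOWN_ACTIONS:
                findings.append(f"unknown action: {action}")
    return findings


if __name__ == "__main__":
    print(draft_with_verifier("read objects from example-bucket",
                              fake_model, fake_validator))
```

The design choice that matters is that the loop never returns a draft the validator has not accepted. If the rounds run out, it fails loudly.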

That’s the bet behind the tool I built.

What I built

A self-service web portal where you paste an IAM policy and get an analysis: syntax validation, a review of how well it meets least privilege, and a score from 1 to 10 with the points to improve. The idea: let the back-and-forth with security start from a better place, not from zero.

Demo of the self-service portal: the user pastes an IAM policy, receives the analysis with its compliance score, and refines the policy by chatting with the assistant.

It ended up published on the official AWS blog in Spanish and as part of aws-samples on GitHub. That part I didn’t plan: it started as a proof of concept to learn Bedrock and kept growing.

How the architecture changed

The version I described in 2023 and the one in the repository today share the intent and almost nothing else. They’re worth contrasting, because the change says a lot about how generative AI on AWS evolved.

In 2023 the solution was simple on purpose: CloudFront served a form from S3, API Gateway invoked a Lambda, and that Lambda called Bedrock with the Claude of the day. Deployment was a CloudFormation template, you had to enable model access by hand, and you had to package a Lambda layer with a recent boto3 so the SDK knew the Bedrock API. It did one thing: analyze a policy.

Original 2023 architecture: CloudFront distributes a form hosted on S3, which calls API Gateway, which invokes a Lambda function that queries Amazon Bedrock.

Today the project is more ambitious:

It generates as well as analyzes. You describe in plain language what your application needs and it returns a least-privilege policy. If the request is too broad (“full access to EC2”), it asks for more detail instead of handing back something unsafe.
It converses. After the first analysis you can ask it to restrict a resource, add a condition, or fix a specific finding, and it refines the policy with you.
It leans on a source of truth. The model drafts, but IAM Access Analyzer validates the policy against the platform’s real rules before returning it.
The engine changed, and the MCP server came with it. In 2023 a Lambda called Bedrock directly. Today the agents run on the Strands Agents SDK over Amazon Bedrock AgentCore, with Claude Haiku 4.5, and the Lambdas became thin adapters. That same redesign exposed the assistant as an MCP server, so it can be used from tools like Kiro or Claude Desktop. AgentCore fits this better than the old Lambda for three reasons. It’s built for agents with long, isolated sessions instead of Lambda’s short request-response pattern and its 15-minute ceiling. Billing pauses while the agent waits on the model, which is most of each interaction. And the agent’s logic lives in a managed runtime that sends traces and metrics to CloudWatch, which matters when the behavior is non-deterministic and you need to understand why it answered what it did.
It’s deployed with CDK instead of a hand-written CloudFormation template, and it gained the pieces a demo doesn’t need but something close to production does: WAF in front of CloudFront and API Gateway, an origin-verification secret, and an audit trail in DynamoDB.
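
For the "source of truth" item above, this is roughly what a call to IAM Access Analyzer's ValidatePolicy API looks like from Python with boto3. It is a hedged sketch, not the repository's code. The blocking_findings helper and the choice to treat only ERROR and SECURITY_WARNING findings as blocking are my assumptions. It also reads only the first page of results and ignores pagination.

```python
import json

# Assumption: only these finding types should stop a policy from being returned.
# Access Analyzer also reports WARNING and SUGGESTION findings.
BLOCKING = {"ERROR", "SECURITY_WARNING"}


def blocking_findings(client, policy, policy_type="IDENTITY_POLICY"):
    """Validate a policy dict with IAM Access Analyzer; return blocking findings.

    `client` is a boto3 Access Analyzer client, or any object exposing the
    same validate_policy method, which keeps the function easy to test.
    """
    response = client.validate_policy(
        policyDocument=json.dumps(policy),
        policyType=policy_type,
    )
    return [
        f"{finding['findingType']} {finding['issueCode']}: {finding['findingDetails']}"
        for finding in response["findings"]
        if finding["findingType"] in BLOCKING
    ]


# Usage (requires AWS credentials with permission to call ValidatePolicy):
#   import boto3
#   client = boto3.client("accessanalyzer")
#   for line in blocking_findings(client, my_policy_dict):
#       print(line)
```

Passing the client in as a parameter means the same function can sit behind an agent tool or a unit test with a stub.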

The diagram today speaks for itself: where there used to be four boxes in a line, there are now agents, validation, edge protection, and auditing. More capability, and also more surface to understand and maintain.

Current architecture: CloudFront is the single entry point and serves the frontend and API calls; API Gateway invokes Lambda adapters that delegate to two Amazon Bedrock AgentCore runtimes; each agent uses Amazon Bedrock with Claude Haiku 4.5 and IAM Access Analyzer to validate policies; everything is audited in DynamoDB, and both CloudFront and API Gateway are protected by WAF.

What’s worth taking away

Beyond this particular tool, three ideas serve any project that touches permissions.

First: check the permission boundary before the model. If an AI initiative or a migration stalls, the bottleneck is almost never the model; it’s the translation gap between the people who build and the people who protect. That’s where the weeks go.

Second: the generator needs a verifier. This was the project’s most important lesson, and it’s why IAM Access Analyzer sits at the core of the tool: the model proposes, the validator checks against the platform’s real rules. The pattern isn’t AWS-specific; on any cloud the question that matters is what you can anchor the model to so it can’t hand you a confident, broken policy.

Third, practical: this is a demonstration, not a replacement for human judgment. The automated analysis and generation are a suggestion that shortens the path, not one that signs off for you. Before applying any policy, validate it with someone in security.

If you want to set it up, the AWS article and the repository have the detail. Least privilege isn’t the glamorous part of cloud security, but it decides how much it hurts when something goes wrong. And the real fix isn’t only technical: it’s getting the people who build and the people who protect to understand each other a little better. The model was never the hard part.

Opinions here are my own and do not represent my employer.