AI & Engineering · 3 min read

Building LLM-Powered Infrastructure Discovery at AWS Scale

How we designed and deployed an AI agent using AWS Bedrock to automate EC2 instance selection, achieving 95%+ recommendation accuracy across thousands of instance type, region, and pricing combinations.

As AWS launched hundreds of new EC2 instance types over a decade, the question of which instance to use became genuinely hard. This complexity did not just affect customers; it broke the very systems trying to help them. This is the story of how we built an LLM-powered discovery agent to solve that problem.

The Problem: Too Much Choice

The AWS EC2 catalogue is enormous, encompassing thousands of instance type, region, and pricing combinations. A customer migrating from on-premises hardware to the cloud faces a combinatorial explosion of choices. Existing tooling like AWS Compute Optimizer or static instance type comparison tables certainly helps, but it requires users to already understand the exact dimensions they are optimizing for.

We needed a system that could ingest a natural-language description, such as a request for an architecture handling 10K concurrent WebSocket connections with low latency and predictable performance, and return a ranked, reasoned recommendation.

Architecture: Retrieval-Augmented Generation on Instance Metadata

The core engineering insight was treating EC2 instance selection as a retrieval problem first and a generation problem second.

Phase 1: Structured Knowledge Base

We built a three-stage pipeline. First, we ingested all raw EC2 instance metadata, including vCPUs, memory, network bandwidth, EBS throughput, GPU specs, and pricing, directly into a vector store. Second, we enriched each entry with human-readable characteristics generated from the structured data, for example noting that an instance is a memory-optimized option whose memory-to-vCPU ratio suits in-memory databases. Third, we tagged instances with workload archetypes derived from official AWS documentation and historical customer usage patterns.
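The enrichment stage can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the InstanceSpec fields, the describe function, the ratio thresholds, and the placeholder instance name are hypothetical and do not reflect the production pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Hypothetical subset of the structured metadata for one instance type."""
    name: str
    vcpus: int
    memory_gib: float


def describe(spec: InstanceSpec) -> str:
    """Turn structured fields into a sentence that can be embedded and retrieved."""
    ratio = spec.memory_gib / spec.vcpus
    # Thresholds below are illustrative assumptions, not AWS definitions.
    if ratio >= 8:
        family, hint = "memory-optimized", "suited to in-memory databases"
    elif ratio <= 2:
        family, hint = "compute-optimized", "suited to CPU-bound workloads"
    else:
        family, hint = "general-purpose", "suited to balanced workloads"
    return (
        f"{spec.name} is a {family} option with {spec.vcpus} vCPUs and "
        f"{spec.memory_gib:g} GiB memory ({ratio:g} GiB per vCPU), {hint}."
    )


# "example.large" is a placeholder name, not a real instance type.
text = describe(InstanceSpec(name="example.large", vcpus=2, memory_gib=16.0))
```

Generating the description from the structured fields, rather than writing it by hand, keeps the retrievable text consistent with the underlying numbers.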

Phase 2: Intent Classification

Before hitting the retrieval layer, we classified the incoming query into workload dimensions. We evaluated whether the workload was compute-bound, memory-bound, or IO-bound, and weighed latency-sensitivity against throughput optimization. We also determined whether cost or performance took priority, and analyzed whether the load was burstable or sustained. This classification, which is an LLM task itself, dramatically improved retrieval precision.
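One way to make the classification result usable downstream is to validate the model's reply against a fixed set of dimensions. The sketch below assumes the classifier returns a JSON object; the field names, allowed values, and the parse_intent helper are hypothetical and chosen only to mirror the four dimensions described above.

```python
import json
from dataclasses import dataclass

# Allowed values per dimension; the names are assumptions for illustration.
ALLOWED = {
    "bottleneck": {"compute", "memory", "io"},
    "optimize_for": {"latency", "throughput"},
    "priority": {"cost", "performance"},
    "load_shape": {"burstable", "sustained"},
}


@dataclass(frozen=True)
class WorkloadIntent:
    bottleneck: str
    optimize_for: str
    priority: str
    load_shape: str


def parse_intent(raw: str) -> WorkloadIntent:
    """Parse the classifier's JSON reply and reject out-of-vocabulary values."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, allowed in ALLOWED.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{field}={value!r} not in {sorted(allowed)}")
    return WorkloadIntent(**{field: data[field] for field in ALLOWED})


intent = parse_intent(
    '{"bottleneck": "io", "optimize_for": "latency", '
    '"priority": "performance", "load_shape": "sustained"}'
)
```

Rejecting unexpected values at this boundary means the retrieval layer only ever sees a closed vocabulary it can filter on.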

Phase 3: Generation with Constraint Satisfaction

The final step used Claude via Bedrock with a highly structured prompt. The prompt included the top-k retrieved instance candidates alongside their enriched metadata, the classified workload intent, hard constraints like region availability and pricing limits, and a strict chain-of-thought reasoning requirement. The model returned ranked recommendations with explicit reasoning, giving customers a transparent logic trail they could actually audit and argue with.
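As a rough sketch of how such a prompt might be assembled, the code below filters candidates on hard constraints and then lays out the prompt sections. The Candidate type, the function names, and the prompt wording are all assumptions for illustration; the call to the model through Bedrock is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    description: str      # enriched, human-readable metadata
    regions: frozenset    # regions where the instance is offered
    hourly_usd: float     # price used for the budget check


def apply_hard_constraints(candidates, region, max_hourly_usd):
    """Drop candidates that violate region or price limits before prompting."""
    return [
        c for c in candidates
        if region in c.regions and c.hourly_usd <= max_hourly_usd
    ]


def build_prompt(query, intent, candidates):
    """Assemble a structured prompt; the section layout is illustrative."""
    lines = [
        "You are recommending EC2 instance types.",
        "Only recommend instances from the CANDIDATES list.",
        "",
        f"WORKLOAD: {query}",
        "INTENT: " + ", ".join(f"{k}={v}" for k, v in sorted(intent.items())),
        "CANDIDATES:",
    ]
    for c in candidates:
        lines.append(f"- {c.name} (${c.hourly_usd:.3f}/h): {c.description}")
    lines.append("")
    lines.append(
        "Reason step by step, then return a ranked list "
        "with a justification for each entry."
    )
    return "\n".join(lines)
```

Enforcing the hard constraints in code, before the prompt is built, means the model never sees an instance it is not allowed to recommend.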

Results

After a three-month production trial across internal AWS tooling, the architecture delivered measurable improvements. The pipeline achieved over 95% accuracy on known-answer benchmark workloads and drove a 40% reduction in support tickets related to instance selection. Furthermore, the median (P50) response time was 1.2 seconds, fast enough for real-time interactive use.

Lessons Learned

First, do not skip the retrieval layer. Pure generation, in which we prompted the LLM with all instance metadata, failed because of context window limits and hallucinations: the model would confidently describe instance specifications that did not exist.

Second, classification before retrieval compounds gains. Each layer of precision improvement multiplies the efficiency of the next, and getting the workload intent right first was well worth the slight latency overhead.
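A toy illustration of why classification helps retrieval: if the classified intent is used to narrow the candidate pool before similarity ranking, a near-duplicate embedding from the wrong workload family cannot crowd out relevant results. The index layout, tag names, and retrieve function are hypothetical, and a real system would query a vector store rather than sort a list in memory.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, required_tag, k=3):
    """Filter by the classified archetype tag, then rank by similarity."""
    pool = [entry for entry in index if required_tag in entry["tags"]]
    pool.sort(key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return [entry["name"] for entry in pool[:k]]


# Placeholder names and two-dimensional vectors keep the example readable.
index = [
    {"name": "a", "tags": {"memory"}, "vector": [1.0, 0.0]},
    {"name": "b", "tags": {"compute"}, "vector": [1.0, 0.0]},
    {"name": "c", "tags": {"memory"}, "vector": [0.0, 1.0]},
]
top = retrieve([1.0, 0.1], index, required_tag="memory", k=2)
```

Here entry "b" is as similar to the query as "a", but it is excluded because its tag does not match the classified intent.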

Third, chain-of-thought is non-negotiable for trust. Customers did not just want raw recommendations; they wanted to understand the underlying mechanics of the decision. An answer without reasoning was treated with the same skepticism as a magic eight-ball.

Finally, structured output parsing beats free-form text. We eventually moved to JSON-mode outputs with a strictly defined schema, which made downstream processing and UI rendering far more reliable.
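A minimal sketch of what schema-checked parsing can look like, using only the standard library. The schema here (a "recommendations" list whose items carry "instance", "rank", and "reasoning") and the parse_recommendations helper are assumptions for illustration, not the schema we shipped.

```python
import json


def parse_recommendations(raw, known_instances):
    """Validate a JSON reply and return (instance, reasoning) pairs by rank.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a subclass
    of ValueError), on missing or mistyped fields, and on instance names
    that are not in the known catalogue.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    recs = data.get("recommendations")
    if not isinstance(recs, list) or not recs:
        raise ValueError("'recommendations' must be a non-empty list")
    for rec in recs:
        if not isinstance(rec, dict):
            raise ValueError("each recommendation must be an object")
        name, rank, why = rec.get("instance"), rec.get("rank"), rec.get("reasoning")
        if not isinstance(name, str) or name not in known_instances:
            raise ValueError(f"unknown instance type: {name!r}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(rank, bool) or not isinstance(rank, int) or rank < 1:
            raise ValueError(f"invalid rank for {name}: {rank!r}")
        if not isinstance(why, str) or not why.strip():
            raise ValueError(f"missing reasoning for {name}")
    ordered = sorted(recs, key=lambda r: r["rank"])
    return [(r["instance"], r["reasoning"]) for r in ordered]
```

Checking each name against the known catalogue also gives a cheap guard against the hallucinated instance types described in the first lesson.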


This pattern of classifying, retrieving, generating, and reasoning has become a standard approach in our internal AI tooling. I will write more about applying this exact framework to other complex AWS infrastructure problems in future posts.
