AI & Engineering · 3 min read
Building LLM-Powered Infrastructure Discovery at AWS Scale
How we designed and deployed an AI agent using AWS Bedrock to automate EC2 instance selection, achieving 95%+ recommendation accuracy across thousands of instance type, region, and pricing combinations.
As AWS launched hundreds of new EC2 instance types over the past decade, the question of which instance to use became genuinely hard. This complexity did not just affect customers; it also overwhelmed the very systems built to help them. This is the story of how we built an LLM-powered discovery agent to solve that problem.
The Problem: Too Much Choice
The AWS EC2 catalogue is enormous, encompassing thousands of instance type, region, and pricing combinations. A customer migrating from on-premises hardware to the cloud faces a combinatorial explosion of choices. Existing tooling like AWS Compute Optimizer or static instance type comparison tables helps, but it requires users to already understand the exact dimensions they are optimizing for.
We needed a system that could ingest a natural-language description, such as a request for an architecture handling 10K concurrent WebSocket connections with low latency and predictable performance, and return a ranked, reasoned recommendation.
Architecture: Retrieval-Augmented Generation on Instance Metadata
The core engineering insight was treating EC2 instance selection as a retrieval problem first and a generation problem second.
Phase 1: Structured Knowledge Base
We built the knowledge base in three stages. First, we ingested all raw EC2 instance metadata, including vCPUs, memory, network bandwidth, EBS throughput, GPU specs, and pricing, into a vector store. Second, we enriched each entry with human-readable characteristics generated from the structured data, for example noting that an instance is a memory-optimized option whose memory-to-vCPU ratio suits in-memory databases. Third, we tagged instances with workload archetypes derived from official AWS documentation and historical customer usage patterns.
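The enrichment stage can be sketched as a function that turns structured fields into an embeddable sentence. This is a minimal illustration, not the production pipeline: the `InstanceSpec` shape, the `describe` helper, and the ratio thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    # Hypothetical record shape; the real metadata schema is not given here.
    name: str
    vcpus: int
    memory_gib: float
    network_gbps: float
    hourly_usd: float


def describe(spec: InstanceSpec) -> str:
    """Turn structured fields into a sentence suitable for embedding."""
    ratio = spec.memory_gib / spec.vcpus
    # Illustrative thresholds only; they are not taken from AWS documentation.
    if ratio >= 8:
        family = "memory-optimized option suited for in-memory databases"
    elif ratio <= 2:
        family = "compute-optimized option suited for CPU-bound services"
    else:
        family = "general-purpose option"
    return (
        f"{spec.name} is a {family} with {spec.vcpus} vCPUs, "
        f"{spec.memory_gib:g} GiB of memory ({ratio:g} GiB per vCPU), "
        f"up to {spec.network_gbps:g} Gbps of network bandwidth, "
        f"at ${spec.hourly_usd:.3f} per hour."
    )
```

Generating the text from the structured record, rather than writing it by hand, keeps the description in step with the numbers it summarizes.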
Phase 2: Intent Classification
Before hitting the retrieval layer, we classified the incoming query along several workload dimensions. We evaluated whether the workload was compute-bound, memory-bound, or IO-bound, and weighed latency sensitivity against throughput optimization. We also determined whether cost or performance took priority, and whether the load was burstable or sustained. This classification, itself an LLM task, dramatically improved retrieval precision.
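One way classification can sharpen retrieval is by narrowing the candidate pool before similarity ranking. The sketch below assumes the classifier's output has already been parsed into a `WorkloadIntent`, and that each indexed instance carries archetype tags and a precomputed embedding. The type names, tag vocabulary, and plain-Python cosine ranking are illustrative stand-ins for whatever the vector store provides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadIntent:
    bound: str          # "compute", "memory", or "io"
    optimize_for: str   # "latency" or "throughput"
    priority: str       # "cost" or "performance"
    load_shape: str     # "burstable" or "sustained"


@dataclass(frozen=True)
class IndexedInstance:
    # Hypothetical index entry: archetype tags plus a precomputed embedding.
    name: str
    tags: frozenset[str]
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    intent: WorkloadIntent,
    query_embedding: tuple[float, ...],
    index: list[IndexedInstance],
    k: int = 3,
) -> list[IndexedInstance]:
    """Filter by intent-derived tags, then rank the survivors by similarity."""
    wanted = {f"{intent.bound}-bound", intent.load_shape}
    pool = [item for item in index if wanted <= item.tags]
    pool.sort(key=lambda item: cosine(query_embedding, item.embedding), reverse=True)
    return pool[:k]
```

Filtering first means the similarity search only has to discriminate among instances that already match the workload archetype.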
Phase 3: Generation with Constraint Satisfaction
The final step used Claude via Bedrock with a highly structured prompt. The prompt included the top-k retrieved instance candidates alongside their enriched metadata, the classified workload intent, hard constraints like region availability and pricing limits, and a strict chain-of-thought reasoning requirement. The model returned ranked recommendations with explicit reasoning, giving customers a transparent logic trail they could actually audit and argue with.
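A rough sketch of how the prompt inputs could be assembled follows. Hard constraints are applied in code before the model sees anything, so the model only ranks candidates that are actually eligible. The `Candidate` type, the helper names, and the prompt wording are assumptions for illustration; the actual prompt and the Bedrock call are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical shape for a retrieved instance and its enriched summary.
    name: str
    summary: str
    regions: frozenset[str]
    hourly_usd: float


def apply_hard_constraints(
    candidates: list[Candidate], region: str, max_hourly_usd: float
) -> list[Candidate]:
    """Drop candidates that violate region availability or the price limit."""
    return [
        c for c in candidates
        if region in c.regions and c.hourly_usd <= max_hourly_usd
    ]


def build_prompt(query: str, intent: dict[str, str], candidates: list[Candidate]) -> str:
    """Assemble the structured prompt from query, intent, and candidates."""
    intent_text = ", ".join(f"{key}={value}" for key, value in sorted(intent.items()))
    candidate_lines = [
        f"- {c.name}: {c.summary} (${c.hourly_usd:.3f}/hour)" for c in candidates
    ]
    return "\n".join(
        [
            "You are recommending EC2 instance types.",
            f"Customer request: {query}",
            f"Classified workload intent: {intent_text}",
            "Choose only from these candidates:",
            *candidate_lines,
            "Reason step by step about each candidate before ranking them.",
            "Return a ranked list with the reasoning for each choice.",
        ]
    )
```

Enforcing region and price in code rather than in the prompt means a constraint violation cannot slip through because the model overlooked an instruction.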
Results
After a three-month production trial across internal AWS tooling, the architecture delivered measurable improvements. The pipeline achieved over 95% accuracy on known-answer benchmark workloads and drove a 40% reduction in support tickets related to instance selection. The P50 response time was 1.2 seconds, fast enough for real-time interactive use.
Lessons Learned
First, do not skip the retrieval layer. Pure generation, in which we prompted the LLM with all instance metadata, failed for two reasons: the metadata exceeded the context window, and the LLM hallucinated, confidently describing instance specifications that did not exist.
Second, classification before retrieval compounds gains. Each layer of precision improvement multiplies the efficiency of the next, and getting the workload intent right first was well worth the slight latency overhead.
Third, chain-of-thought is non-negotiable for trust. Customers did not just want raw recommendations; they wanted to understand the underlying mechanics of the decision. An answer without reasoning was treated with the same skepticism as a magic eight-ball.
Finally, structured output parsing beats free-form text. We eventually moved to JSON-mode outputs with a strictly defined schema, which made downstream processing and UI rendering far more reliable.
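As an illustration of strict output parsing, the sketch below validates a JSON reply against a small schema and rejects any instance type that was not among the retrieved candidates. The field names (`recommendations`, `rank`, `instance_type`, `reasoning`) are assumed for the example and are not the actual production schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    rank: int
    instance_type: str
    reasoning: str


def parse_recommendations(raw: str, allowed: set[str]) -> list[Recommendation]:
    """Parse the model's JSON reply, raising ValueError on any schema violation."""
    payload = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    items = payload.get("recommendations") if isinstance(payload, dict) else None
    if not isinstance(items, list) or not items:
        raise ValueError("expected a non-empty 'recommendations' list")

    results = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each recommendation must be an object")
        rank = item.get("rank")
        name = item.get("instance_type")
        why = item.get("reasoning")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(rank, int) or isinstance(rank, bool) or rank < 1:
            raise ValueError(f"invalid rank: {rank!r}")
        # Reject instance types that were never retrieved (possible hallucination).
        if not isinstance(name, str) or name not in allowed:
            raise ValueError(f"unknown instance type: {name!r}")
        if not isinstance(why, str) or not why.strip():
            raise ValueError(f"missing reasoning for {name}")
        results.append(Recommendation(rank, name, why.strip()))
    return sorted(results, key=lambda r: r.rank)
```

Checking `instance_type` against the retrieved set is a cheap guard against the hallucinated specifications described in the first lesson.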
This pattern of classifying, retrieving, generating, and reasoning has become a standard approach in our internal AI tooling. I will write more about applying this exact framework to other complex AWS infrastructure problems in future posts.