
AWS re:Invent 2025

A deep dive into the biggest cloud conference of the year — frontier agents, custom silicon, the Nova model family, and the rise of the renaissance developer.

Las Vegas, Nevada · November 30 – December 4, 2025

The scale of re:Invent

AWS re:Invent 2025 ran from November 30 through December 4 across multiple venues in Las Vegas — the Venetian, the Wynn, and surrounding convention spaces that take the better part of a day to navigate on foot. Over 63,000 people attended in person. More than two million watched the livestreams. There were 1,900 sessions delivered by 3,500 speakers, and AWS announced over 500 new products and service updates across the week.

The numbers are absurd, and they underscore something real: AWS is not just a cloud provider anymore. It is the infrastructure layer beneath a significant portion of the global economy, now running at $132 billion in annual revenue with 20% year-over-year growth. That is $22 billion added in a single year. When a platform at that scale hosts a developer conference, the announcements ripple through the entire industry.

This year, the ripple had a clear center: AI agents. Not AI as a feature. Not AI as a chatbot. AI agents as autonomous systems that can work for hours or days without human intervention, make decisions, call tools, and complete complex workflows. That was the thesis across all three keynotes, and the announcements backed it up.

Matt Garman's keynote — developers as the heart of AWS

Matt Garman, who took over as AWS CEO from Adam Selipsky in 2024, delivered the opening keynote on December 2. His central argument was that developers are "the heart of AWS" and that AI agents represent the next inflection point where material business returns from AI investments start to materialize.

The framing was deliberate. After two years of companies pouring money into AI infrastructure and fine-tuning models, the question on everyone's mind was "where is the ROI?" Garman's answer was agents — systems that do not just generate text or code but actually perform tasks, automate processes, and operate autonomously within defined boundaries.

He announced three frontier agents: Kiro, a virtual developer that acts as an autonomous software engineering agent; AWS Security Agent, which handles security testing and vulnerability analysis; and AWS DevOps Agent, which resolves incidents and manages operational tasks. These are not research demos. They are production systems designed to work for hours or days without intervention, making decisions and taking actions within policy guardrails.

The infrastructure announcements were equally significant. Garman unveiled EC2 P6e-GB300 UltraServers with NVIDIA GB300 processors, generally available immediately. He announced AWS AI Factories — dedicated on-premises AI infrastructure that customers can deploy in their own data centers for compliance and data sovereignty requirements. And he gave a sneak peek at Trainium4, the next-generation custom AI chip, promising 6x the FP4 compute performance of Trainium3.

The throughline was clear: AWS is not just providing tools for building AI. It is building the full stack — from custom silicon to autonomous agents — and positioning itself as the platform where agents run at scale.

The Nova model family expands

One of the most substantive announcements was the expansion of Amazon's Nova model family. Nova was introduced at re:Invent 2024 as AWS's homegrown foundation model. This year, it grew into a fleet.

Nova 2 Lite is a fast, cost-effective reasoning model designed for high-throughput tasks where latency and cost matter more than peak capability. It is the model you use when you need millions of API calls per day and the task is well-defined.

Nova 2 Pro targets complex tasks like code generation, multi-step reasoning, and technical analysis. It sits in the capability tier where most production workloads land — powerful enough for real problems, efficient enough to run at scale.

Nova 2 Sonic is a speech-to-speech model with multilingual support. This is not text-to-speech bolted onto a language model. It is a natively multimodal system that processes and generates speech directly, enabling conversational AI applications that feel qualitatively different from transcription-based approaches.

Nova 2 Omni, announced in preview, is the multimodal reasoning model — processing images, text, video, and speech in a unified architecture. The direction is clear: AWS is building toward models that understand the world the way humans do, across multiple modalities simultaneously.

The most interesting announcement in the Nova family, though, was not a model. It was a service.

Nova Forge — open training for custom frontier models

Nova Forge is AWS's answer to the question every enterprise has been asking: "How do I build a model that knows about my business without starting from scratch?"

The traditional approach — fine-tuning a foundation model on proprietary data — works but has limitations. You are constrained by what the base model knows. The training process can cause catastrophic forgetting, where the model loses general capabilities as it learns domain-specific ones. And building a truly custom frontier model from scratch costs tens of millions of dollars.

Nova Forge takes a different approach. It provides access to pre-trained, mid-trained, or post-trained model checkpoints from the Nova training pipeline. Customers inject their proprietary data at the optimal stage — blending it with Amazon's curated datasets in a way that prevents catastrophic forgetting while incorporating domain-specific knowledge deeply into the model's weights.

The pricing is approximately $100,000 annually. That is orders of magnitude cheaper than training a frontier model from scratch and accessible to a much broader set of organizations.

The early results are compelling. Reddit used Nova Forge to improve content moderation precision by 26 percentage points and reduce missed threats by 25%. Nimbus Therapeutics accelerated drug discovery workflows. Sony Group improved legal research accuracy. These are not toy benchmarks — they are production deployments solving real business problems.

AWS is calling this "open training," and it represents a meaningful shift in how enterprises will build AI capabilities. Instead of choosing between a generic foundation model and a prohibitively expensive custom training run, Nova Forge offers a middle path: deep customization at a fraction of the cost.

Bedrock AgentCore — the production layer for agents

If Nova provides the models and Nova Forge provides the customization, Amazon Bedrock AgentCore provides the production infrastructure. This is the platform for building, deploying, and governing AI agents at scale.

The new capabilities announced at re:Invent center on trust and control — the two things that have kept most enterprises from deploying agents beyond proof-of-concept.

Policy Controls let you define boundaries on agent actions using natural language. Instead of writing code to restrict what an agent can do, you describe the boundaries in plain English and the system enforces them in real time with millisecond response times. An agent assigned to handle customer refunds can be constrained to approve refunds under $500 without escalation, for example, and the policy engine enforces that boundary on every action.
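To make the refund example concrete, here is a minimal sketch of what a compiled policy boundary could look like. This is not the AgentCore Policy Controls API — `RefundPolicy` and `check()` are invented names — it only illustrates the idea of a natural-language rule becoming a fast, deterministic check on every agent action.

```python
from dataclasses import dataclass

# Hypothetical sketch only: the real AgentCore Policy Controls API is not
# shown here. RefundPolicy and check() are illustrative stand-ins for a
# policy compiled from "approve refunds under $500 without escalation".

@dataclass
class RefundPolicy:
    auto_approve_limit: float = 500.0

    def check(self, action: str, amount: float) -> str:
        if action != "refund":
            return "deny"       # the policy covers refunds only
        if amount < self.auto_approve_limit:
            return "allow"      # within the agent's boundary
        return "escalate"       # over the limit: requires a human

policy = RefundPolicy()
print(policy.check("refund", 120.0))   # allow
print(policy.check("refund", 900.0))   # escalate
```

The point of the sketch is the shape of the enforcement: a pure function evaluated on every proposed action, cheap enough to run inline, which is what makes millisecond-scale policy decisions plausible.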

AgentCore Evaluations provide continuous inspection of agent quality and real-world performance. This is the observability layer — understanding not just what an agent did, but whether it did it well, where it failed, and how performance changes over time.

Enhanced Memory gives agents episodic functionality. They learn from experiences, retain context across interactions, and improve their decision-making over time. This is the difference between an agent that treats every task as a blank slate and one that accumulates institutional knowledge.
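The episodic idea can be sketched in a few lines. This is an invented illustration, not the AgentCore Memory API: the agent records outcomes keyed by task type and consults that history before acting on similar tasks later.

```python
from collections import defaultdict

# Illustrative sketch of episodic agent memory; class and method names are
# invented and do not correspond to the real AgentCore Memory interface.

class EpisodicMemory:
    def __init__(self):
        self.episodes = defaultdict(list)   # task kind -> past outcomes

    def record(self, task: str, outcome: str) -> None:
        self.episodes[task].append(outcome)

    def recall(self, task: str) -> list[str]:
        """Prior outcomes for this kind of task, oldest first."""
        return self.episodes[task]

memory = EpisodicMemory()
memory.record("refund", "approved under limit")
memory.record("refund", "escalated: over limit")
print(memory.recall("refund"))   # context the next refund decision can use
```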

The Strands Agents SDK, which underpins much of the agent development ecosystem, has already crossed five million downloads. AWS is clearly winning the distribution game for agent frameworks, and AgentCore is the production layer that turns SDK experiments into deployed systems.

Nova Act — agents that use computers

Nova Act deserves its own section because it represents something genuinely new. Most AI agents today work through APIs — they call functions, query databases, and interact with structured interfaces. Nova Act builds agents that interact with graphical user interfaces the way humans do.

The system achieves 90% reliability for browser-based UI automation workflows. That number sounds incremental until you consider what it means: an AI agent that can navigate a web application, fill out forms, click buttons, read results, and complete multi-step workflows across arbitrary websites with the same reliability as a competent human operator.

Nova Act uses a custom Nova 2 Lite model trained with reinforcement learning in synthetic environments. It is not just a language model that can describe what it sees on a screen — it is a unified system combining model, SDK, orchestrator, and browser controllers into an integrated platform.

The implications for enterprise automation are significant. Every company has internal tools with web interfaces that lack APIs. Every company has workflows that require navigating third-party websites. Nova Act can automate these workflows without requiring the target application to provide a programmatic interface.

Hertz, one of the early adopters, reported accelerating development velocity by 5x using Nova Act for testing and automation workflows. The no-code playground and human-in-the-loop oversight features make it accessible to teams that do not have deep ML expertise.
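The human-in-the-loop pattern mentioned above is worth sketching, because it is what makes 90%-reliable automation deployable: the agent runs autonomously until it hits a step marked for oversight, then pauses for a human decision. This is a plain-Python illustration, not the Nova Act SDK — `Step`, `run_workflow`, and the approval hook are all invented names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of human-in-the-loop UI automation; this is NOT the
# Nova Act SDK. Step and run_workflow are illustrative stand-ins.

@dataclass
class Step:
    description: str
    needs_approval: bool = False

def run_workflow(steps: list[Step], approve: Callable[[str], bool]) -> list[str]:
    completed = []
    for step in steps:
        if step.needs_approval and not approve(step.description):
            break                       # human rejected: the agent stops here
        completed.append(step.description)
    return completed

steps = [
    Step("open rental form"),
    Step("fill customer details"),
    Step("submit booking", needs_approval=True),  # oversight checkpoint
]
done = run_workflow(steps, approve=lambda desc: True)
print(done)
```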

Graviton5 — the silicon advantage deepens

AWS's custom silicon strategy is now in its fifth generation with Graviton5. The numbers: 192 Arm Neoverse V3 cores built on TSMC's 3nm process, a 5x larger L3 cache (192MB) compared to Graviton4, and up to 25% higher performance than the previous generation.

The single-socket design deserves attention. By putting all 192 cores on a single die, AWS reduces inter-core latency by roughly a third compared to multi-socket designs. For workloads that are latency-sensitive — which includes most web services, databases, and real-time applications — this is not a marginal improvement. It changes which workloads are viable on ARM architecture.

M9g instances powered by Graviton5 are available in preview, with C9g (compute-optimized) and R9g (memory-optimized) variants planned for 2026.

The customer list tells the story: Airbnb, Atlassian, Adobe, Epic Games. These are not early adopters experimenting with ARM. They are running production workloads at scale on Graviton because the price-performance ratio is better than the x86 alternatives. With Graviton5, that gap widens.

For our work on W0rktree, Graviton is directly relevant. The bgprocess is CPU-intensive — file watching, content-addressable hashing with Blake3, snapshot creation, and diff computation. A 25% performance improvement with better power efficiency means the background process can do more work while consuming fewer resources on the developer's machine. We will be benchmarking W0rktree on M9g instances as soon as they are available.
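The content-addressable hashing at the core of the bgprocess can be sketched briefly. W0rktree uses Blake3; here `hashlib.blake2b` stands in so the example needs only the standard library, and the function names are illustrative, not W0rktree's actual internals.

```python
import hashlib

# Sketch of content-addressable snapshotting. Blake3 is the real hash in
# W0rktree; hashlib.blake2b is a stdlib stand-in. Names are illustrative.

def content_address(data: bytes) -> str:
    """Identical contents always map to the same address."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()

def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the address of its contents. Unchanged files
    deduplicate for free, because their address is stable across snapshots."""
    return {path: content_address(data) for path, data in files.items()}

snap1 = snapshot({"main.rs": b"fn main() {}", "lib.rs": b"pub mod diff;"})
snap2 = snapshot({"main.rs": b"fn main() {}", "lib.rs": b"pub mod sync;"})
changed = [p for p in snap1 if snap1[p] != snap2[p]]
print(changed)  # only lib.rs differs between the two snapshots
```

This is also why core count matters more than clock speed for this workload: each file hashes independently, so the work parallelizes almost perfectly across Graviton5's 192 cores.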

Trainium3 UltraServers — custom AI silicon at scale

Trainium3 is AWS's first 3nm AI chip, delivering 2.52 petaflops of FP8 compute per chip. The Trn3 UltraServers scale up to 144 Trainium3 chips in a single configuration, delivering 362 petaflops total with 144GB of HBM3e memory and 4.9 TB/s of memory bandwidth per chip.

The performance improvements over Trainium2 are substantial: 4.4x higher compute, 4x greater energy efficiency, and 4x more memory bandwidth. AWS claims the UltraServers deliver 5x more AI tokens per megawatt of power compared to Trainium2.

The efficiency metric is worth lingering on. AI training and inference are increasingly constrained by power and cooling, not compute availability. A 5x improvement in tokens-per-megawatt means organizations can either train 5x more for the same energy cost or achieve the same output at a fifth of the power consumption.
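The arithmetic behind that tradeoff is simple enough to write down. The baseline figure below is made up; only the 5x ratio comes from the AWS claim.

```python
# Illustrative arithmetic for the tokens-per-megawatt claim. The baseline
# number is invented; only the 5x improvement ratio is from the keynote.

baseline_tokens_per_mw = 1.0e9          # hypothetical Trainium2 baseline
improved_tokens_per_mw = 5 * baseline_tokens_per_mw

target_tokens = 10e9                    # fixed training budget in tokens
power_before = target_tokens / baseline_tokens_per_mw
power_after = target_tokens / improved_tokens_per_mw
print(power_after / power_before)       # same output at a fifth of the power
```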

Customers are reporting a 50% reduction in training and inference costs, with some achieving 4x faster inference at half the cost of comparable GPU configurations. Amazon Bedrock is already serving production workloads on Trainium3 inference, which validates the silicon for real-world deployment, not just benchmark numbers.

The Nitro 6 platform and formal verification

One announcement that did not get enough attention was the Nitro Isolation Engine, debuting with Nitro 6 alongside Graviton5.

Nitro has been AWS's hypervisor and security platform since 2017. With Nitro 6, AWS introduced formal verification — mathematical proof that workload isolation guarantees hold under all conditions. This is not testing. It is not code review. It is a mathematical guarantee that one customer's workload cannot access another customer's data, and that even AWS operators cannot access customer data on the Nitro hardware.

Formal verification is standard practice in safety-critical systems — avionics, medical devices, nuclear reactor control. Applying it to cloud infrastructure is ambitious and signals that AWS is treating security as a provable property of the system, not a best-effort outcome.

For anyone building multi-tenant systems — which includes W0rktree's server architecture — this sets a standard. If the cloud provider can mathematically prove tenant isolation, the applications running on that infrastructure should aspire to the same rigor.

S3 Vectors — native vector storage at object-store scale

Amazon S3 Vectors went generally available at re:Invent, and the scale numbers are striking. During the preview, it supported 50 million vectors per index. At GA, that jumped to 2 billion — a 40x increase. A single vector bucket can hold up to 20 trillion vectors across 10,000 indexes.

The performance characteristics are practical: sub-second response times for infrequent queries, around 100ms for frequent queries, and write throughput of 1,000 vectors per second for streaming updates. It is fully serverless — no infrastructure provisioning, no capacity planning.

The cost claim is 90% lower than specialized vector databases. If that holds at scale, it reshapes the economics of RAG (retrieval-augmented generation) applications. Instead of running a dedicated Pinecone or Weaviate cluster, you use S3 Vectors as the storage layer and pay only for what you query.

The integrations with Bedrock Knowledge Base and Amazon OpenSearch are available at GA, which means the common pattern — embed documents, store vectors, retrieve relevant context for LLM prompts — works out of the box without stitching together separate systems.
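The retrieve step of that pattern is just a similarity search over stored vectors. The sketch below does it in pure Python with cosine similarity; S3 Vectors performs this server-side at billion-vector scale, so the code illustrates only the operation, not the service's API.

```python
import math

# Minimal sketch of the RAG retrieve step: rank stored document vectors by
# cosine similarity to a query vector. Not the S3 Vectors API; the tiny
# 3-dimensional vectors here stand in for real embedding vectors.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """index maps doc id -> embedding. Returns the k most similar doc ids."""
    return sorted(index, key=lambda d: cosine(query, index[d]), reverse=True)[:k]

index = {
    "billing-faq":   [0.9, 0.1, 0.0],
    "refund-policy": [0.8, 0.3, 0.1],
    "api-reference": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], index, k=2))  # billing docs rank first
```

The retrieved document ids would then be used to fetch the underlying text and assemble the LLM prompt — the piece that the Bedrock Knowledge Base integration handles end to end.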

Lambda durable functions

Lambda durable functions may be the most practically useful announcement for application developers. The concept: Lambda functions that can automatically checkpoint progress, suspend execution for up to one year, and recover from failures without custom state management.

The primitives are clean. "Steps" provide automatic retries and checkpointing — if a multi-step workflow fails at step 3, it resumes from step 3 instead of starting over. "Waits" pause execution without compute charges — the function sleeps until a timer expires or a callback arrives. Operations like parallel execution and external approval callbacks handle the coordination patterns that normally require an orchestration service like Step Functions.

Durable functions launched with support for Python and Node.js runtimes. For teams building agentic workflows — where an AI agent kicks off a process, waits for human approval, calls external APIs, and assembles results over minutes or hours — this eliminates the need for a separate orchestration layer.
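The "steps" primitive is easiest to understand as replay from a checkpoint store. The sketch below is not the Lambda durable functions API — the `step` helper and the in-memory `checkpoints` dict are invented stand-ins for durable storage — but it shows the resume-from-step-3 behavior described above.

```python
# Hypothetical sketch of checkpointed steps; NOT the Lambda durable
# functions API. checkpoints stands in for durable storage.

checkpoints: dict[str, object] = {}   # saved step results
executions: list[str] = []            # records which steps actually ran

def step(name, fn):
    """Run fn once; on retries, replay the saved result instead."""
    if name not in checkpoints:
        executions.append(name)
        checkpoints[name] = fn()
    return checkpoints[name]

def workflow():
    order = step("create_order", lambda: {"id": 42})
    step("charge_card", lambda: {"order": order["id"], "ok": True})
    return step("send_receipt", lambda: f"receipt for order {order['id']}")

workflow()               # first invocation runs all three steps
result = workflow()      # a retry replays checkpoints; nothing re-executes
print(executions)        # each step executed exactly once
print(result)
```

A "wait" fits the same model: the function records where it stopped, releases its compute, and a later invocation replays the checkpoints and continues from the wait point.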

Werner Vogels' final keynote — the renaissance developer

The emotional anchor of the week was Werner Vogels' final re:Invent keynote after 14 years as Amazon's CTO. He chose to spend it on the question every developer is asking: "Will AI take my job?"

His answer was nuanced. "Maybe," he said, acknowledging that some tasks will automate and some skills will become obsolete. But he reframed the real question: "Will AI make me obsolete? Absolutely not — if you evolve."

Vogels drew parallels to previous technological transitions: assembly language to compilers, structured programming to object-oriented design, monoliths to microservices. Each transition eliminated some tasks and created new ones. Each time, developers who adapted emerged stronger.

He outlined five qualities of what he called the "renaissance developer":

Curiosity and learning. Continuous experimentation is not optional. The pace of change in AI tooling means the skills you have today may be insufficient in six months.

Social learning. Attending events, joining communities, learning from real-world experiences rather than just documentation. The nuance of how systems behave in production is not captured in tutorials.

Systems thinking. Understanding how your work fits into larger systems. Vogels invoked Leonardo da Vinci — the polymath who understood art, engineering, anatomy, and architecture as interconnected disciplines. The renaissance developer understands frontend, backend, infrastructure, and product as interconnected concerns.

Clear communication. As AI tools generate more code, the ability to specify requirements precisely becomes critical. "If you put garbage in, you get really convincing garbage out." Spec-driven development helps clarify thinking before code is written.

Ownership. Developers must own what they ship, including the code AI generates. Code review becomes a "control point to restore balance" — the moment where human judgment evaluates machine output. Blindly accepting AI-generated code is not development. It is delegation without oversight.

The message was pointed. AI will change what developers do, but it will not replace the judgment, taste, and systems understanding that good engineering requires. The developers who thrive will be the ones who treat AI as a powerful tool and themselves as the accountable owners of the systems they build.

It was a fitting exit for a CTO who spent 14 years arguing that builders are the most important people in the room.

The Swami keynote — agentic AI goes production

Dr. Swami Sivasubramanian's keynote on December 3 was the technical deep dive. Where Garman set the strategic vision and Vogels addressed the human element, Swami walked through the engineering of agentic AI at production scale.

The Strands Agents SDK — already at five million downloads — formed the foundation. Swami showed how the SDK connects to Bedrock AgentCore for production deployment, with the new policy controls and evaluation capabilities providing the governance layer.

The frontier agents got the most demo time. Kiro, the autonomous developer agent, was shown navigating a codebase, understanding requirements from specifications, generating implementation code, writing tests, and iterating on feedback — all without human intervention over an extended period. The demo was carefully scoped to show real capability, not aspirational visions.

The broader thesis was democratization. Tasks that took years now take weeks. Tasks that took weeks now take days. The barrier to building software is dropping, and the bottleneck is shifting from implementation to specification — from "can we build this?" to "should we build this, and exactly what should it do?"

Security announcements

The security announcements were quieter but consequential. AWS launched AI-enhanced security innovations including extended threat detection for EC2 and ECS workloads, malware protection integrated into AWS Backup, and near real-time analytics in Security Hub (now generally available).

AWS Clean Rooms introduced privacy-enhancing synthetic dataset generation for ML model training. The idea: generate synthetic data that preserves the statistical properties of your real data without exposing actual customer information. For teams building AI models on sensitive data — healthcare, financial services, government — this addresses one of the hardest compliance challenges.

The Nitro 6 formal verification announcement, covered earlier, is arguably the most significant security innovation, but it was positioned as an infrastructure feature rather than a security announcement. The fact that AWS can mathematically prove tenant isolation is the kind of foundational security improvement that makes everything built on top more trustworthy.

What it means for the industry

re:Invent 2025 crystallized several trends that will shape cloud computing and software development over the next few years.

Agents are the new application model. The shift from AI-as-feature to AI-as-agent is real and happening faster than most teams are prepared for. The infrastructure — models, orchestration, governance, deployment — is maturing rapidly. Companies that are not experimenting with agentic workflows are falling behind.

Custom silicon is a competitive moat. AWS, Google, and Microsoft are all investing heavily in custom chips. The Graviton and Trainium families give AWS price-performance advantages that commodity hardware cannot match. For compute-intensive workloads, the cloud provider's silicon choices increasingly determine the economics.

Open training changes the AI landscape. Nova Forge's approach — providing access to training checkpoints and letting customers inject proprietary data — is a middle path between using off-the-shelf models and training from scratch. If this model scales, it makes frontier-quality AI accessible to organizations that could never afford to train their own.

Developer experience is the battleground. Lambda durable functions, Kiro, AgentCore — the common thread is reducing the operational complexity of building and running software. AWS is betting that the developer who can build and ship faster on AWS than anywhere else will stay on AWS.

Formal verification is coming to infrastructure. The Nitro 6 announcement is a leading indicator. As multi-tenant systems become more complex and the consequences of isolation failures become more severe, mathematical proof of security properties will shift from nice-to-have to expected.

Personal takeaways

I went to re:Invent primarily to understand where cloud infrastructure is heading and how that informs the W0rktree server architecture. The multi-tenant design, the access control enforcement, the sync protocol — all of these run on cloud infrastructure, and the choices AWS makes about security, isolation, and performance directly affect what we can build.

Three things stood out:

The formal verification work in Nitro 6 validates the approach we are taking with W0rktree's server-enforced access control. If the infrastructure layer can prove isolation mathematically, the application layer should aspire to the same standard. Our ceiling model for access permissions is simpler than formal verification, but the principle is the same: security should be a provable property, not a best-effort one.

Graviton5's single-socket 192-core design is directly relevant to the bgprocess. W0rktree's background process is CPU-bound — hashing files, computing diffs, creating snapshots. More cores with lower inter-core latency means we can parallelize more aggressively without coordination overhead eating the gains.

Werner Vogels' renaissance developer keynote resonated with the design philosophy behind W0rktree. His emphasis on ownership — understanding what your tools build, not blindly accepting output — maps directly to why we made history append-only. If the developer owns the code, the version control system should preserve the true history of how that code came to be. No rewriting. No narrative editing. The record is the record.

re:Invent is overwhelming by design. The value is not in seeing every session — that is physically impossible — but in absorbing the direction of the platform and the problems the cloud provider thinks are worth solving. In 2025, the answer was clear: autonomous agents, custom silicon, and the developer experience that ties them together.