If your company does AI or machine learning on AWS, you probably already know the pain: the GPU bill. The accelerated instances that run your models are expensive, and their availability is unpredictable. In May 2026, AWS expanded a capability that tackles both problems at once: SageMaker’s Flexible Training Plans, now accessible directly from SageMaker Studio.
Here’s what it is, what it can save you, and why any leader watching their cloud costs spiral because of AI should pay attention.
Training a model or running inference requires GPUs. On AWS, these resources are rented by the hour, and that’s where the trouble begins.
First, the price. GPU instances are among the most expensive in the AWS catalog. To give a concrete sense of scale, a p5.48xlarge instance costs around 55 dollars per hour at standard on-demand rates. Multiply that by the hours of a serious training run, and the bill climbs fast. A team launching an ML project without a cost strategy watches its spend take off, often with no clear visibility into what’s consuming what.
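To put a number on it, here is a minimal sketch of the arithmetic. The 55 dollar rate is the approximate figure quoted above; the instance count and run length are illustrative assumptions, not AWS figures.

```python
# Rough on-demand cost of a GPU training run.
# ASSUMPTIONS: the rate is the approximate p5.48xlarge figure quoted in the
# article; the instance count and duration are made-up illustration values.

P5_48XLARGE_HOURLY_USD = 55.0


def on_demand_cost(hourly_rate: float, instances: int, hours: float) -> float:
    """Return the total on-demand cost in dollars."""
    return hourly_rate * instances * hours


if __name__ == "__main__":
    # Hypothetical run: 4 instances for two weeks (14 days * 24 hours).
    cost = on_demand_cost(P5_48XLARGE_HOURLY_USD, instances=4, hours=14 * 24)
    print(f"Two-week run on 4 instances: ${cost:,.0f}")
```

Under these assumptions, a single two-week run already lands in the tens of thousands of dollars, which is why the hourly rate alone understates the problem.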
Then, availability. In on-demand mode, you take GPU capacity when it’s available. But that capacity depends on regional supply and current demand, and it can change very quickly. If you stop an instance, nothing guarantees you’ll get the same capacity back later. For a team on a deadline, that’s a real risk: a deployment that fails or slips because the GPU is no longer available. Industry-wide, GPU demand has outpaced supply, making it a scarce resource.
SageMaker’s Flexible Training Plans address both problems at once. The idea is simple: instead of grabbing GPU on an ad-hoc basis and hoping it’s available, you reserve in advance the exact capacity you need, for a defined period.
In exchange, you pay far less. According to AWS’s official documentation, SageMaker training plans are priced 70 to 75% below on-demand rates. On a cost center as heavy as GPU, the savings are considerable: potentially tens of thousands of dollars a year for a company doing AI seriously.
The capability covers the full range of ML workloads in SageMaker’s managed environment: training jobs, HyperPod clusters, and now inference endpoints as well, meaning the deployment of models into production. You get access to a wide range of accelerated computing options, including the latest NVIDIA GPUs and AWS Trainium accelerators, without managing the underlying infrastructure.
The system stays flexible, which is what the “Flexible” in the name refers to. You specify your needs (instance type, count, start date, duration), and the system offers available plans with their total price shown upfront. You pay the full plan amount at the time of reservation; the plan then moves to a scheduled state and becomes active on the planned start date.
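For teams that script this step instead of clicking through the console, the four inputs map naturally onto a request payload. The sketch below only builds that payload. The helper name `build_offering_search` is hypothetical, and the parameter names follow our reading of the boto3 SageMaker client's `search_training_plan_offerings` call, so check them against the current API reference before relying on them.

```python
# Build the search criteria for a training plan offering.
# ASSUMPTIONS: build_offering_search is a hypothetical helper; the key names
# mirror our understanding of boto3's search_training_plan_offerings and
# should be verified against the current SageMaker API reference.
from datetime import datetime, timezone


def build_offering_search(instance_type, instance_count, start_after,
                          duration_hours, target="training-job"):
    """Return keyword arguments describing the capacity you want to reserve."""
    if instance_count < 1 or duration_hours < 1:
        raise ValueError("instance_count and duration_hours must be positive")
    return {
        "InstanceType": instance_type,    # e.g. "ml.p5.48xlarge"
        "InstanceCount": instance_count,
        "StartTimeAfter": start_after,    # earliest acceptable start
        "DurationHours": duration_hours,
        "TargetResources": [target],      # workload type the plan will serve
    }


if __name__ == "__main__":
    params = build_offering_search(
        "ml.p5.48xlarge", 2,
        datetime(2026, 6, 1, tzinfo=timezone.utc),
        duration_hours=14 * 24,
    )
    print(params)
    # With credentials configured, the payload would be passed along as:
    #   import boto3
    #   boto3.client("sagemaker").search_training_plan_offerings(**params)
```

Keeping the payload construction separate from the API call makes it easy to review exactly what is about to be reserved before any money is committed.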
One important point to know: you pay the rate in effect when you reserve, even if the plan starts later and prices have changed in the meantime. AWS regularly adjusts its rates based on supply and demand.
The May 2026 update takes a step forward: SageMaker Studio now supports GPU capacity reservation directly through Flexible Training Plans. Before, reserving often meant going through separate APIs or tools. Now it’s accessible from the environment where your data scientists already work. Less friction, and a better chance the reservation is actually used rather than bypassed.
This is where it gets serious for a decision-maker. SageMaker training plans cannot be canceled after purchase. Once the reservation is paid, it expires automatically at the end of the reserved period, whether you used the GPU or not.
Direct consequence: if your instances don’t run continuously throughout the reserved period, the total cost of the reservation can exceed what you’d have paid on-demand. AWS itself states this in its documentation. In other words, the 70 to 75% saving is only real if the reservation is properly sized. Reserve too much, and you pay for unused GPU capacity, with no possibility of a refund.
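The break-even point can be worked out from the discount alone. The sketch below assumes the discount applies uniformly to every reserved hour and that the alternative is paying on-demand only for hours actually used. The function names and the sample hours are illustrative.

```python
# Break-even utilization for a prepaid, non-refundable reservation.
# ASSUMPTIONS: the discount applies uniformly to every reserved hour, and
# the alternative is paying on-demand only for the hours actually used.


def break_even_utilization(discount: float) -> float:
    """Fraction of reserved hours that must be used for the plan to pay off."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def plan_vs_on_demand(hourly_rate, reserved_hours, used_hours, discount):
    """Return (plan_cost, on_demand_cost) in dollars for one instance."""
    plan_cost = hourly_rate * (1.0 - discount) * reserved_hours  # paid upfront
    on_demand = hourly_rate * used_hours  # pay only for what actually runs
    return plan_cost, on_demand


if __name__ == "__main__":
    for discount in (0.70, 0.75):
        pct = break_even_utilization(discount) * 100
        print(f"{discount:.0%} discount: break-even near {pct:.0f}% utilization")

    # Hypothetical: 336 hours reserved at 70% off, only 80 hours used.
    plan, od = plan_vs_on_demand(55.0, 336, 80, 0.70)
    print(f"plan ${plan:,.0f} vs on-demand ${od:,.0f}")
```

Under these assumptions, a plan at 70 to 75% off needs the capacity to be busy for roughly 25 to 30% of the reserved hours just to match on-demand. In the hypothetical case above, usage falls short of that threshold and the reservation ends up costing more.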
Flexible Training Plans are just one lever among several. Depending on your workload, other approaches may be more relevant:
Spot instances can reduce GPU costs by up to 90%, but they’re interruptible: AWS can reclaim them at any time. They suit workloads that tolerate interruption, like training with regular checkpoints.
EC2 Capacity Blocks for ML offer self-service reservations at rates 40 to 50% below on-demand, for workloads that run directly on EC2 (where you manage the OS, networking, and orchestration yourself).
Flexible Training Plans are built for SageMaker-managed workloads, when you want AWS to handle provisioning and lifecycle while still securing your capacity.
The right choice depends on three factors: your need for guaranteed availability, your cost model, and your environment (direct EC2 or managed SageMaker). There’s no one-size-fits-all answer.
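As a rough illustration of how those factors interact, here is a small decision helper. The rule is a deliberate simplification for discussion, not AWS guidance: the `Workload` fields and `suggest_option` function are hypothetical, and the discount ranges are simply the figures quoted above.

```python
# Illustrative decision helper for the three options described above.
# ASSUMPTIONS: the decision rule is a simplification, not AWS guidance;
# the discount ranges in the labels are the figures quoted in the article.
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    needs_guaranteed_capacity: bool
    tolerates_interruption: bool
    environment: str  # "sagemaker" (managed) or "ec2" (self-managed)


def suggest_option(w: Workload) -> str:
    """Suggest a starting point; real decisions need usage data as well."""
    if w.environment not in ("sagemaker", "ec2"):
        raise ValueError("environment must be 'sagemaker' or 'ec2'")
    # Interruptible work with no hard capacity requirement: cheapest first.
    if w.tolerates_interruption and not w.needs_guaranteed_capacity:
        return "Spot instances (up to 90% off, interruptible)"
    # Otherwise the environment decides which reservation product applies.
    if w.environment == "ec2":
        return "EC2 Capacity Blocks for ML (40 to 50% off)"
    return "SageMaker Flexible Training Plans (70 to 75% off)"


if __name__ == "__main__":
    deadline_job = Workload(True, False, "sagemaker")
    checkpointed_job = Workload(False, True, "ec2")
    print(suggest_option(deadline_job))
    print(suggest_option(checkpointed_job))
```

A helper like this is only a conversation starter: it ignores utilization, which decides whether a non-cancelable reservation pays off at all.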
Here’s the business takeaway.
If your GPU workloads are predictable and you still pay on-demand, you’re leaving money on the table. A team that knows it’ll train models for two weeks next month, or run stable inference in production, has every reason to reserve rather than pay full price. A 70 to 75% gap is too large to ignore.
But the savings are never automatic. Reserving GPU capacity is a financial decision in its own right, and an irreversible one. Reserve too much, and you waste. Reserve too little, and you fall back into unpredictability. Pick the wrong instance type, and you’re stuck. The real savings depend entirely on how accurately the reservation is sized, which rests on a fine understanding of your actual compute needs.
It’s a continuous trade-off, not a one-time setting. Needs evolve, AWS prices move with supply and demand, and today’s best plan isn’t the one you’ll want in six months. Optimizing the GPU bill is permanent FinOps work, not a box to tick.
This announcement fits a clear trend: AWS keeps multiplying cost-optimization levers, especially around AI, because AI has become the fastest-growing cost center for its customers. Between GPU reservations, automatic scaling, and the various savings plans, the FinOps toolbox keeps expanding.
But a toolbox doesn’t do the work. All these levers assume someone in your organization knows they exist, understands your workloads, and makes the right calls at the right time, especially when those decisions are irreversible. That’s precisely what separates a company that pays full price without knowing it from one that controls its cloud bill.
Optimizing GPU costs is exactly the kind of work a dedicated FinOps practice builds into its daily routine: identifying predictable workloads, sizing reservations as tightly as possible, monitoring actual usage, and arbitrating between the different options. AWS’s capability makes the savings possible. You still have to go and capture them, without getting the sizing wrong.
SageMaker’s Flexible Training Plans let you reserve GPU capacity in advance for your training and inference workloads, with savings of 70 to 75% versus on-demand rates according to AWS documentation. The May 2026 update makes them accessible directly from SageMaker Studio.
For companies doing AI on AWS, it’s a real opportunity to regain control of an often-uncontrolled cost center. But with one major caveat: these plans aren’t cancelable. The savings only show up if the reservation is properly sized, which requires knowing your true compute needs. The technology opens the door; expertise is what delivers the savings.