
AWS Bedrock Nova Ultra Inference Pricing Reduced 30% – Now $2.80/M Input Tokens

📅 April 2026 · ⚡ Medium impact · 🏷️ pricing

📰 The Announcement

On April 8, 2026, AWS cut Amazon Nova Ultra on-demand inference pricing on Amazon Bedrock by roughly 30% on both input and output tokens. Input pricing dropped from $4.00 to $2.80 per million tokens (exactly 30%), and output pricing fell from $16.00 to $11.00 per million tokens (about 31%). The reduction applies in all standard Bedrock inference regions, including us-east-1, us-west-2, eu-west-1, and ap-southeast-1; provisioned throughput and batch inference pricing were not changed at launch. Nova Ultra remains AWS's highest-capability foundation model on Bedrock, positioned for complex multi-step reasoning, long-context document analysis, and agentic workflows that require deep synthesis. The model supports a 300K-token context window and is accessible through the Bedrock Converse API, the InvokeModel API, and Bedrock Agents orchestration, with no change to the model identifier or API surface.
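As a quick check on the headline figure, this sketch computes the cut on each token dimension from the list prices quoted in the announcement. It is plain arithmetic, not a Bedrock API call. The input cut is exactly 30%, while the output cut is slightly deeper.

```python
# Nova Ultra on-demand list prices from the announcement, USD per million tokens.
OLD_PRICES = {"input": 4.00, "output": 16.00}
NEW_PRICES = {"input": 2.80, "output": 11.00}


def pct_cut(old: float, new: float) -> float:
    """Percentage reduction from the old price to the new price."""
    return (old - new) / old * 100


for dim in ("input", "output"):
    cut = pct_cut(OLD_PRICES[dim], NEW_PRICES[dim])
    print(f"{dim}: {cut:.2f}% cheaper")
# input: 30.00% cheaper
# output: 31.25% cheaper
```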

Placed in competitive context, this update shows a tightening AI inference market. Azure OpenAI's GPT-5 Turbo (as of April 2026) is priced at approximately $3.00 per million input tokens and $12.00 per million output tokens in East US. Google's Gemini 2.5 Pro on Vertex AI is priced at $1.25 per million input tokens (for prompts under 200K tokens) and $10.00 per million output tokens, making it the clear cost leader for input-heavy workloads. Anthropic's Claude 4 Sonnet on Bedrock remains available at $1.50 per million input tokens and $7.50 per million output tokens, significantly undercutting Nova Ultra for general-purpose tasks. Meta's Llama 4 Maverick on Bedrock Marketplace runs at approximately $0.60 per million input tokens for self-managed inference and is the open-weight budget alternative. Nova Ultra now sits mid-market: it undercuts GPT-5 Turbo by 6.7% on input and 8.3% on output, but it remains materially more expensive than Gemini 2.5 Pro on input. Gemini's input price is about 55% lower than Nova Ultra's, which means Nova Ultra carries a roughly 124% premium for input-heavy pipelines, a real consideration at volume.
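The relative gaps in the comparison above can be reproduced from the quoted list prices. This is a minimal sketch that uses only the per-token prices cited in this article (not independently verified) and ignores context-length tiers, caching, and batch discounts. Llama 4 Maverick is left out because only its input price is quoted. A positive number means Nova Ultra is cheaper than the competitor on that dimension.

```python
# On-demand list prices cited in this article, USD per million tokens: (input, output).
PRICES = {
    "Nova Ultra": (2.80, 11.00),
    "GPT-5 Turbo": (3.00, 12.00),
    "Gemini 2.5 Pro": (1.25, 10.00),  # under-200K-token prompt tier
    "Claude 4 Sonnet": (1.50, 7.50),
}


def relative_gap(a: float, b: float) -> float:
    """How much cheaper price a is than price b, as a % of b (negative: a is pricier)."""
    return (b - a) / b * 100


nova_in, nova_out = PRICES["Nova Ultra"]
for model, (p_in, p_out) in PRICES.items():
    if model == "Nova Ultra":
        continue
    print(f"Nova Ultra vs {model}: input {relative_gap(nova_in, p_in):+.1f}%, "
          f"output {relative_gap(nova_out, p_out):+.1f}%")

# The same input-price gap expressed in both directions.
gemini_in = PRICES["Gemini 2.5 Pro"][0]
print(f"Gemini input discount vs Nova: {relative_gap(gemini_in, nova_in):.1f}%")
print(f"Nova input premium over Gemini: {(nova_in / gemini_in - 1) * 100:.0f}%")
```

For GPT-5 Turbo this reproduces the 6.7% input and 8.3% output figures. The last two lines show why the direction matters: a 55% discount one way is a 124% premium the other way.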

This reduction matters most to three customer segments: enterprises running large-scale document intelligence pipelines where context window utilization is high and reasoning depth is critical; financial services and healthcare firms that require AWS-native data residency guarantees and cannot route inference traffic through Google or Azure endpoints; and organizations already embedded in the Bedrock ecosystem through Bedrock Agents, Bedrock Knowledge Bases, or Bedrock Guardrails, where switching inference providers means integration re-engineering costs that offset raw token-price advantages. The competitive pressure is significant: Google and Microsoft sales teams will now have to justify Vertex AI (with Gemini 2.5 Pro and its broader model portfolio) and Azure OpenAI (with its OpenAI exclusivity) against a more cost-competitive AWS flagship. A likely industry follow-on is faster output-token price compression across all providers, since output pricing is the largest absolute cost driver for agentic and chain-of-thought workloads. The primary caveat is vendor lock-in: Nova Ultra is a proprietary AWS model with no portable fine-tuning artifact, so customers who optimize workflows around its specific reasoning behavior cannot easily migrate if AWS raises prices once competition stabilizes.

Customers with significant inference volumes should recalibrate their model selection logic and cost baselines within the next 30 to 60 days. For a workload of 500 million input tokens and 150 million output tokens per month, the new pricing yields $1,400 for input plus $1,650 for output, a total of $3,050, down from $4,400 previously ($2,000 input plus $2,400 output). That is a saving of $1,350 per month, or $16,200 annualized. Teams should re-run their model routing logic now to check whether long-context workloads currently on Claude 4 Sonnet have a lower TCO on Nova Ultra at $2.80 per million input tokens once the engineering cost of chunking documents to fit Sonnet's smaller effective window is counted, since Nova Ultra's 300K window avoids that chunking overhead. Provisioned throughput commitments on Nova Ultra should be evaluated against the new on-demand rates before any 1-month or 6-month commitment is signed, because the on-demand reduction may erode the relative discount of provisioned tiers.
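The monthly bill arithmetic can be checked with a small cost model. This is a minimal sketch under the simplest assumption, pure on-demand token pricing at the quoted list rates, with no provisioned throughput, batch discounts, or taxes; the `Pricing` class is a hypothetical helper, not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical on-demand price sheet for one model."""
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def monthly_cost(self, input_tokens: int, output_tokens: int) -> float:
        """On-demand cost in USD for one month's token volume."""
        return (input_tokens / 1e6) * self.input_per_m + (output_tokens / 1e6) * self.output_per_m


OLD = Pricing(input_per_m=4.00, output_per_m=16.00)  # before April 8, 2026
NEW = Pricing(input_per_m=2.80, output_per_m=11.00)  # after April 8, 2026

input_tokens, output_tokens = 500_000_000, 150_000_000
before = OLD.monthly_cost(input_tokens, output_tokens)
after = NEW.monthly_cost(input_tokens, output_tokens)
saving = before - after

print(f"before ${before:,.2f}  after ${after:,.2f}")
print(f"saving ${saving:,.2f}/month, ${saving * 12:,.2f}/year")
```

For the example workload this gives $4,400 before and $3,050 after, a saving of $1,350 per month ($16,200 per year). Swapping in your own token volumes and other models' price sheets gives a first-pass comparison before any routing change.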

At TCOIQ, our view is that this pricing cut is a defensive move by AWS to prevent enterprise accounts from routing frontier reasoning tasks to Vertex AI's Gemini 2.5 Pro, which still holds a commanding input-token cost advantage. The TCOIQ TCO Calculator at tcoiq.com/tco.html has been updated to reflect the April 8 Nova Ultra pricing and can model multi-model routing scenarios — comparing Nova Ultra, Claude 4 Sonnet, Gemini 2.5 Pro, and GPT-5 Turbo side by side against your actual token distribution. The Inventory Builder at tcoiq.com/inventory.html allows FinOps teams to map existing Bedrock API call volumes by model, region, and application tier, creating the baseline needed to quantify real savings before committing to architectural changes. For organizations evaluating whether to consolidate inference on Bedrock or adopt a multi-cloud inference strategy, TCOIQ's AI Migration Assessment provides a structured 2-week analysis of workload-to-model fit, latency requirements, and total cost including egress and API gateway overhead. The single most impactful next step for any team spending over $5,000 per month on Bedrock inference is to run the TCOIQ TCO Calculator with your current token volume split across models and let the platform surface whether Nova Ultra, Sonnet, or a hybrid routing policy delivers the lowest 12-month cost.

💰 TCOIQ Cost Impact

Customers running 500M input and 150M output tokens monthly on Nova Ultra save $1,350/month ($16,200/year) at the new $2.80/M input and $11.00/M output pricing versus the prior $4.00/$16.00 rates.

📊 Why It Matters · Impact Analysis

The 30% Nova Ultra price reduction primarily benefits enterprises running long-context reasoning pipelines on Bedrock who previously found Nova Ultra cost-prohibitive relative to Claude 4 Sonnet or Gemini 2.5 Pro. Financial services, healthcare, and public sector customers with strict AWS data residency requirements gain the most, as they cannot freely route to Vertex AI or Azure OpenAI without compliance re-certification. The cut increases competitive pressure on Google and Azure to respond with output token reductions, particularly as agentic workloads drive output volume disproportionately higher than input. Key caveats include continued lock-in risk on a proprietary model with no portable fine-tuning, regional availability gaps outside tier-1 AWS regions, and the fact that Gemini 2.5 Pro still undercuts Nova Ultra by 55% on input pricing for volume-heavy, low-output pipelines — making Nova Ultra competitive only when reasoning depth and context window utilization justify the premium.

✅ What You Should Do

  • Re-run your Bedrock model routing logic immediately: for workloads consuming over 200M input tokens per month with context lengths above 100K tokens, model whether Nova Ultra at $2.80/M input now delivers lower TCO than Claude 4 Sonnet with chunking overhead factored in.
  • Recalculate your monthly Bedrock invoice baseline using the April 8 pricing: a 500M input / 150M output monthly workload saves $1,350/month ($16,200/year). Record this as a realized savings event in your FinOps tracking system before Q2 close.
  • Audit any pending Nova Ultra provisioned throughput commitments: with on-demand input tokens now at $2.80/M, verify the effective discount of 1-month or 6-month provisioned tiers still exceeds 15% before signing — the relative value of commitments narrows when on-demand rates drop.
  • Compare Nova Ultra at $2.80/M input against Gemini 2.5 Pro at $1.25/M input for any pipeline where the output-to-input token ratio is below 0.3. Gemini is cheaper per token on both dimensions at any ratio, but below 0.3 its blended token cost is more than 30% lower, enough that Vertex AI typically wins on total cost despite integration overhead and a dual-cloud inference architecture may be warranted.
  • Set a 90-day pricing alert on Claude 4 Sonnet and Gemini 2.5 Pro via your cloud cost management tool — Nova Ultra's cut signals a pricing war is underway and both Google and Anthropic are likely to respond within one to two quarters, potentially shifting the optimal model selection again.
  • If your organization spends over $10,000/month on Bedrock inference across multiple models, initiate a formal model portfolio review within 30 days using token distribution data by application tier to identify which workloads should migrate to Nova Ultra, which should stay on Sonnet, and which warrant evaluation on Vertex AI.
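The numeric thresholds in this checklist (more than 200M input tokens with contexts above 100K tokens, an output-to-input ratio below 0.3, and a provisioned discount above 15%) can be encoded as a simple screening pass over your workloads. This is a minimal sketch: `Workload`, `review_flags`, and `provisioned_discount_ok` are hypothetical names, not part of any AWS or TCOIQ API. Because Bedrock provisioned throughput is typically billed per model unit per hour, the provisioned check assumes you have already converted that charge into a monthly dollar figure for the same token volume.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-workload token profile pulled from your usage data."""
    name: str
    monthly_input_tokens: int
    monthly_output_tokens: int
    typical_context_tokens: int


def review_flags(w: Workload) -> list[str]:
    """Return the reviews a workload needs under the checklist thresholds."""
    flags = []
    # Long-context, high-volume work: Nova Ultra vs Claude 4 Sonnet with chunking.
    if w.monthly_input_tokens > 200_000_000 and w.typical_context_tokens > 100_000:
        flags.append("compare Nova Ultra vs Claude 4 Sonnet incl. chunking overhead")
    # Input-dominant work: Gemini 2.5 Pro's input price gap matters most here.
    ratio = w.monthly_output_tokens / max(w.monthly_input_tokens, 1)
    if ratio < 0.3:
        flags.append("compare against Gemini 2.5 Pro on Vertex AI")
    return flags


def provisioned_discount_ok(provisioned_monthly_usd: float,
                            on_demand_monthly_usd: float,
                            min_discount: float = 0.15) -> bool:
    """True if the provisioned commitment beats on-demand by more than min_discount."""
    discount = 1 - provisioned_monthly_usd / on_demand_monthly_usd
    return discount > min_discount


# Hypothetical example: a contract-review pipeline with 150K-token contexts.
pipeline = Workload("contract-review", 500_000_000, 100_000_000, 150_000)
print(review_flags(pipeline))
# Hypothetical $2,700 provisioned cost vs $3,050 on demand is only ~11.5% off.
print(provisioned_discount_ok(2_700.0, 3_050.0))
```

The flags only mark workloads for review; the routing decision itself still needs a full TCO comparison that includes chunking effort, egress, and integration work.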

🎯 TCOIQ Recommendation

TCOIQ views this Nova Ultra price reduction as a structurally important but incomplete signal. It moves Nova Ultra below Azure GPT-5 Turbo on both input and output pricing, but it does not make Nova Ultra the cost-optimal choice against Gemini 2.5 Pro for input-dominant workloads, and customers who anchor to this price without modeling their specific token ratio risk overpaying by 20 to 40%. The TCOIQ TCO Calculator at tcoiq.com/tco.html now reflects updated April 2026 Bedrock pricing and supports side-by-side multi-model scenarios across AWS, Azure, and GCP. The Inventory Builder at tcoiq.com/inventory.html enables FinOps teams to establish accurate token volume baselines by model and region before making routing or commitment decisions. For teams spending over $5,000 per month on Bedrock, the immediate next step is to load your current token distribution into the TCOIQ TCO Calculator and run a 12-month cost projection comparing Nova Ultra, Claude 4 Sonnet, and Gemini 2.5 Pro against your actual input-to-output ratio.
