Azure Cosmos DB Cost Optimization: 8 Levers to Cut Your RU/s Bill

6/15/2026 6 min
#Azure #Cosmos DB #FinOps #NoSQL #Cost Optimization


Cosmos DB has earned its reputation as "great technology, painful invoice". After two years of running CostSentry.AI on Cosmos, I have settled on eight concrete levers that, pulled together, cut the total bill by 40–60% with no impact on performance or availability.

This article is the distilled version of what I run on every Cosmos DB FinOps audit.

Step 0: Understand What You Are Actually Paying For

Cosmos DB bills for two things:

  1. Provisioned/Consumed RU/s (request units per second) – throughput for database operations
  2. Storage (GB-month) – fixed price, marginal on the total bill

Storage is negligible in 90% of cases. RU/s is the lever that matters. Before optimizing, run a cost breakdown:

# --include-meter-details is needed, otherwise meterDetails comes back empty
az consumption usage list \
  --start-date 2026-05-01 \
  --end-date 2026-06-01 \
  --include-meter-details \
  --query "[?contains(consumedService, 'Microsoft.DocumentDB')].{Resource:instanceName, RU:meterDetails.meterName, Cost:pretaxCost}" \
  --output table

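The output shows what each Cosmos DB resource costs per meter (throughput versus storage). For RU consumption per container, open Metrics → Total Request Units and split by CollectionName. Together these are the starting point for optimization.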
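To rank resources by spend, the same command can be run with `--output json` and summed per resource. This is a minimal sketch, assuming the records carry the `Resource`, `RU` and `Cost` keys from the `--query` projection above. The sample records are made up for illustration.

```javascript
// Sum cost per resource from the JSON output of the usage query.
// Field names (Resource, RU, Cost) follow the --query projection above.
function costByResource(records) {
  const totals = new Map();
  for (const { Resource, Cost } of records) {
    // Costs may arrive as strings; Number() handles both strings and numbers.
    totals.set(Resource, (totals.get(Resource) ?? 0) + Number(Cost));
  }
  // Most expensive resource first.
  return [...totals.entries()]
    .map(([resource, cost]) => ({ resource, cost }))
    .sort((a, b) => b.cost - a.cost);
}

// Made-up sample records, for illustration only.
const sampleUsage = [
  { Resource: 'cosmos-prod', RU: 'throughput meter', Cost: '12.5' },
  { Resource: 'cosmos-dev', RU: 'throughput meter', Cost: '3' },
  { Resource: 'cosmos-prod', RU: 'storage meter', Cost: '0.5' },
];
console.log(costByResource(sampleUsage));
```

Sorting by cost puts the account worth auditing first.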

Lever 1: Provisioned vs Serverless Decision

| Scenario | Recommendation | Reason |
|---|---|---|
| Dev/test, <5 000 RU/s burst | Serverless | Pay per RU consumed, no minimum |
| Production, >5 000 RU/s sustained | Provisioned (autoscale) | Cheaper per-RU at higher throughput |
| Spike-heavy (5× nominal) | Provisioned autoscale | No throttling, accommodates spikes |
| Steady predictable load | Provisioned manual | Lowest per-RU price |

For the CostSentry.AI dev/test environment, switching from Provisioned (4 000 RU/s) to Serverless dropped the bill from USD 180 to USD 40 per month. For production we stayed on Provisioned autoscale.
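The decision comes down to a break-even calculation. This is a rough sketch, assuming 730 billing hours per month, the manual rate of $0.00008 per RU/s per hour quoted under Lever 2, and an assumed serverless list price of $0.25 per million RU. Check both against current pricing for your region before relying on the output.

```javascript
const HOURS_PER_MONTH = 730;
const MANUAL_RATE = 0.00008;          // USD per RU/s per hour (figure quoted in this article)
const SERVERLESS_PER_MILLION = 0.25;  // USD per 1M RU consumed (assumed list price)

// Provisioned throughput is billed for every hour, whether it is used or not.
function provisionedMonthly(ruPerSecond, rate = MANUAL_RATE) {
  return ruPerSecond * rate * HOURS_PER_MONTH;
}

// Serverless is billed only for the RU actually consumed.
function serverlessMonthly(totalRuConsumed) {
  return (totalRuConsumed / 1e6) * SERVERLESS_PER_MILLION;
}

// Compare both modes for one workload.
function cheaperMode(provisionedRuPerSecond, totalRuConsumedPerMonth) {
  const provisioned = provisionedMonthly(provisionedRuPerSecond);
  const serverless = serverlessMonthly(totalRuConsumedPerMonth);
  return serverless < provisioned ? 'serverless' : 'provisioned';
}

// A lightly used dev container: 1 000 RU/s provisioned, 100M RU consumed per month.
console.log(cheaperMode(1000, 100e6));
```

The less of the provisioned capacity a workload actually consumes, the more serverless wins.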

Lever 2: Autoscale Tuning

The default autoscale setting is max 4 000 RU/s. Most dev containers do not need it. Always:

  1. Check the actual peak over the last 30 days in Metrics → Total Request Units
  2. Set max to 1.3× peak (not 4 000, not 10 000 – a concrete number from your data)

resource container 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2024-08-15' = {
  name: 'users'
  parent: database
  properties: {
    resource: {
      id: 'users'
      partitionKey: { paths: ['/tenantId'], kind: 'Hash' }
    }
    options: {
      autoscaleSettings: {
        maxThroughput: 2000  // not the 4000 default!
      }
    }
  }
}

Autoscale costs about $0.00012 per RU/s per hour, versus $0.00008 for manual throughput. Each hour is billed at the highest RU/s the container scaled to, with a floor of 10% of the configured maximum. Choosing max 2 000 instead of 4 000 therefore halves the idle bill: the floor drops from 400 RU/s to 200 RU/s.
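The sizing rule and the idle floor can be turned into two small helpers. This is a sketch under assumptions: 730 hours per month, the autoscale rate quoted above, and rounding up to the next 1 000 RU/s because autoscale maximums are set in 1 000 RU/s increments. The function names are my own.

```javascript
// Recommend an autoscale max from an observed peak, with 30% headroom by default.
// Autoscale max is configured in steps of 1 000 RU/s, with 1 000 as the minimum.
function recommendAutoscaleMax(peakRuPerSecond, headroom = 1.3) {
  const target = peakRuPerSecond * headroom;
  return Math.max(1000, Math.ceil(target / 1000) * 1000);
}

// Monthly cost of a container that sits idle at the 10% autoscale floor.
function idleFloorMonthly(maxRuPerSecond, ratePerRuHour = 0.00012, hours = 730) {
  return maxRuPerSecond * 0.1 * ratePerRuHour * hours;
}

const max = recommendAutoscaleMax(1400);   // 1 400 RU/s observed peak
console.log(max, idleFloorMonthly(max).toFixed(2));
console.log(4000, idleFloorMonthly(4000).toFixed(2));
```

With a 1 400 RU/s peak the helper lands on 2 000 RU/s, the value used in the Bicep example.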

Lever 3: Indexing Policy Optimization

The default indexing policy in Cosmos is /* – it indexes everything. That raises write cost by 20–40% versus an explicit policy. If you only query specific properties:

{
  "indexingMode": "consistent",
  "includedPaths": [
    { "path": "/tenantId/?" },
    { "path": "/createdAt/?" },
    { "path": "/email/?" },
    { "path": "/status/?" }
  ],
  "excludedPaths": [
    { "path": "/*" },
    { "path": "/_etag/?" }
  ]
}

This indexes only 4 fields. Write RU cost drops from ~10 RU per document to ~6 RU per document = 40% saving on writes.

At CostSentry.AI this change on a single container (events, 500k writes/day) dropped cost from USD 80/month to USD 52.
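When several containers need the same treatment, the policy can be generated from the list of queried fields instead of written by hand. This is a minimal sketch; `buildIndexingPolicy` is a hypothetical helper, and it assumes every field is a top-level or slash-separated scalar path.

```javascript
// Build an opt-in indexing policy: index only the listed paths, exclude the rest.
function buildIndexingPolicy(queriedFields) {
  return {
    indexingMode: 'consistent',
    // "/field/?" indexes the scalar value stored at that path.
    includedPaths: queriedFields.map((field) => ({ path: `/${field}/?` })),
    excludedPaths: [{ path: '/*' }, { path: '/_etag/?' }],
  };
}

const policy = buildIndexingPolicy(['tenantId', 'createdAt', 'email', 'status']);
console.log(JSON.stringify(policy, null, 2));
```

The output matches the JSON policy shown above and can be passed wherever that policy is applied.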

Heads up: Cosmos applies indexing policy changes as a background reindex that can take hours to days on large containers. Monitor via:

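// Cosmos SDK – check reindex progress (assumes `client` is an existing CosmosClient)
const containerResponse = await client
  .database('mydb')
  .container('users')
  .read({ populateQuotaInfo: true });
// The progress is reported in a response header, not on the container resource
const progress = containerResponse.headers['x-ms-documentdb-collection-index-transformation-progress'];
console.log(`Indexing transformation: ${progress}%`);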
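For a long reindex it helps to poll rather than check once. This is a sketch of a polling loop, assuming you pass in a `readProgress` function that wraps whatever SDK call returns the percentage. The helper name, the 30-second default interval and the simulated reader are all illustrative.

```javascript
// Poll until the index transformation reports 100%.
// readProgress: async function returning the progress as a number or numeric string.
async function waitForReindex(readProgress, { intervalMs = 30000, onProgress = () => {} } = {}) {
  for (;;) {
    const progress = Number(await readProgress());
    if (Number.isNaN(progress)) {
      throw new Error('Progress value is missing or not numeric');
    }
    onProgress(progress);
    if (progress >= 100) return progress;
    // Wait before the next read so the check itself stays cheap.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated reader for demonstration; in practice this would call the SDK.
const simulatedProgress = [40, 75, 100];
waitForReindex(async () => simulatedProgress.shift(), {
  intervalMs: 10,
  onProgress: (p) => console.log(`Indexing transformation: ${p}%`),
});
```

Hold off on measuring write RU savings until the loop returns, because the old index is still in use during the transformation.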

Lever 4: TTL for Ephemeral Data

If you have data with an expiration (sessions, audit logs, cached query results, expired notifications), set a container-level TTL instead of manual deletion:

resource sessionContainer 'Microsoft.DocumentDB/.../containers@2024-08-15' = {
  name: 'sessions'
  parent: database
  properties: {
    resource: {
      id: 'sessions'
      partitionKey: { paths: ['/userId'], kind: 'Hash' }
      defaultTtl: 86400  // 24 hours in seconds
    }
  }
}

TTL deletion adds nothing to the bill: Cosmos removes expired documents in the background using leftover RU/s that your own requests did not consume. By contrast, a manual DELETE call costs ~5 RU per document plus the application overhead.

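For variable per-document TTLs, set a ttl property on the document. This only takes effect when the container has defaultTtl set (use -1 for "no expiry unless the document says otherwise"):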

await container.items.create({
  id: 'session-123',
  userId: 'user-456',
  ttl: 3600,  // this document expires in an hour, not in 24
  data: { /* session payload */ }
});
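How the container default and the per-document value combine is easy to get wrong, so here is the precedence as a small function. It is a sketch of the documented rules as I understand them; the function names are my own, and `null` stands for "never expires".

```javascript
// Resolve the TTL (in seconds) that applies to one document.
// containerDefaultTtl: null/undefined = TTL disabled, -1 = on but no default expiry, n = n seconds.
// itemTtl: undefined = inherit, -1 = never expire, n = n seconds.
function effectiveTtl(containerDefaultTtl, itemTtl) {
  // With TTL disabled on the container, a ttl property on the document is ignored.
  if (containerDefaultTtl == null) return null;
  if (itemTtl == null) return containerDefaultTtl === -1 ? null : containerDefaultTtl;
  return itemTtl === -1 ? null : itemTtl;
}

// Expiry is counted from the last modification (_ts, epoch seconds).
function expiresAt(lastModifiedEpochSeconds, ttlSeconds) {
  return ttlSeconds == null ? null : lastModifiedEpochSeconds + ttlSeconds;
}

console.log(effectiveTtl(86400, 3600));      // document-level value wins
console.log(effectiveTtl(86400, undefined)); // falls back to the container default
console.log(effectiveTtl(null, 3600));       // TTL disabled on the container
```

The third case is the usual surprise: a ttl on the document has no effect until the container opts in.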

Lever 5: Partition Key Design (Cost Driver #1)

A bad partition key is not just a performance problem – it is the single biggest cost driver. A hot partition means:

  • Cosmos has to split the partition (RU/s overhead)
  • Throttling even with high provisioned throughput
  • Cross-partition queries (5–10× more expensive than single-partition)

Rules:

  1. Cardinality: at minimum thousands of distinct values
  2. Uniform distribution: no value should hold more than 5% of all documents
  3. Query-aligned: queries frequently filter by that value

At CostSentry.AI we originally used /customerId as the PK for events. For large customers that meant a hot partition. Refactoring to a synthetic key of the form ${customerId}-${YYYYMM} (customer + month composite, stored as its own property on each document) spread the load evenly and cut cross-partition query cost by 60%.
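Both the composite key and the 5% rule can be checked in a few lines before any migration. This is an illustrative sketch: the helper names are hypothetical, and the month is taken in UTC so that writers in different time zones agree on the key.

```javascript
// Build a customer + month composite key, e.g. "acme-202606".
function syntheticPartitionKey(customerId, date) {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  return `${customerId}-${yyyy}${mm}`;
}

// Share of documents held by the most common key value (0 for an empty list).
function largestShare(keys) {
  if (keys.length === 0) return 0;
  const counts = new Map();
  for (const key of keys) counts.set(key, (counts.get(key) ?? 0) + 1);
  return Math.max(...counts.values()) / keys.length;
}

console.log(syntheticPartitionKey('acme', new Date(Date.UTC(2026, 5, 15))));

// Made-up sample: one customer dominates, so the 5% rule is violated.
const sampleKeys = ['acme', 'acme', 'acme', 'globex', 'initech'];
console.log(largestShare(sampleKeys) > 0.05 ? 'skewed' : 'ok');
```

The trade-off is that a query covering several months for one customer now touches several partition key values, so this works best when most queries are scoped to a single month.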

Lever 6: Bulk Operations for Write-Heavy Workloads

If you write 100+ documents at a time, single-document inserts burn RU. Use bulk:

// Instead of a loop over inserts
const operations = items.map(item => ({
  operationType: 'Create',
  resourceBody: item
}));
 
const response = await container.items.bulk(operations);

Bulk operations share networking overhead and Cosmos batches them – the result is 30–40% lower RU cost per document compared to the loop variant.
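To confirm the saving on your own data, sum the charge reported for each operation and compare it with the loop variant. This is a sketch, assuming each entry of the bulk response exposes `statusCode` and `requestCharge`. Verify that against the SDK version you run; the sample response is made up.

```javascript
// Summarize a bulk response: total RU, average RU per document, failed operations.
function summarizeBulk(responses) {
  let totalRu = 0;
  const failed = [];
  responses.forEach((response, index) => {
    totalRu += response.requestCharge ?? 0;
    // 4xx/5xx codes (for example 429 throttling) mark operations to retry.
    if (response.statusCode >= 400) failed.push({ index, statusCode: response.statusCode });
  });
  const ruPerDocument = responses.length ? totalRu / responses.length : 0;
  return { totalRu, ruPerDocument, failed };
}

// Made-up response entries, for illustration only.
const sampleResponse = [
  { statusCode: 201, requestCharge: 6 },
  { statusCode: 201, requestCharge: 6 },
  { statusCode: 429, requestCharge: 0 },
];
console.log(summarizeBulk(sampleResponse));
```

The `failed` list carries the original indexes, so only those operations need to be resubmitted.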

Lever 7: Reserved Capacity for Provisioned

Like VM Reserved Instances, Cosmos DB has Reserved Capacity – you pay for a 1Y or 3Y commitment and get a 20–65% discount:

| Term | Discount |
|---|---|
| 1-year | 20% |
| 3-year | 65% |

For a production workload with a stable 10 000 RU/s baseline this means:

Pay-as-you-go:    ~USD 580/month
1-year Reserved:  ~USD 465/month  (-20%)
3-year Reserved:  ~USD 205/month  (-65%)

Heads up: since summer 2024 Microsoft has disabled exchanges for Cosmos DB Reserved Capacity (refund only). If you plan re-platforming inside a 2-year horizon, buy 1-year instead of 3-year.
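The monthly figures above follow directly from the discount table. This small sketch uses the article's numbers as inputs, so you can plug in your own pay-as-you-go baseline. Results are rounded to cents, which is why they differ slightly from the rounded figures in the text.

```javascript
// Monthly price after a reserved capacity discount, rounded to cents.
function reservedMonthly(paygMonthly, discount) {
  return Math.round(paygMonthly * (1 - discount) * 100) / 100;
}

const payg = 580; // USD/month for a stable 10 000 RU/s baseline (figure from this article)
console.log('1-year:', reservedMonthly(payg, 0.20));
console.log('3-year:', reservedMonthly(payg, 0.65));
```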

Lever 8: Multi-Region Replication Audit

The default architect mindset is "for production we turn on multi-region for HA". Cosmos DB bills each region as its own provisioned capacity:

| Setup | RU/s cost multiplier |
|---|---|
| Single region (zone-redundant) | 1.25× |
| 2 regions (read replica) | 2× |
| 3 regions | 3× |
| 5 regions | 5× |

For most workloads, a zone-redundant single region (99.995% SLA) is enough. Turn on multi-region only when:

  1. You have an explicit business requirement for geographic redundancy
  2. You have read traffic from multiple continents and a latency requirement <100 ms p99
  3. Compliance requires data residency across multiple regions

At Christie's we found four production Cosmos accounts with a multi-region setup from 2022 nobody questioned anymore. Switching to a zone-redundant single region cut Cosmos invoices by ~EUR 1 800/month combined.
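The multiplier table translates into a quick estimate of what consolidation is worth. This is a sketch that follows the table above, with one assumption of mine: a single region without zone redundancy is taken as the 1× baseline.

```javascript
// RU/s cost multiplier per the table: 1.25x for one zone-redundant region,
// otherwise one full copy of the provisioned throughput per region.
function regionCostMultiplier(regions, zoneRedundant = false) {
  if (!Number.isInteger(regions) || regions < 1) {
    throw new RangeError('regions must be a positive integer');
  }
  if (regions === 1) return zoneRedundant ? 1.25 : 1; // 1x baseline is an assumption
  return regions;
}

// Monthly saving from collapsing N regions into one zone-redundant region.
function consolidationSaving(singleRegionMonthly, currentRegions) {
  const before = singleRegionMonthly * regionCostMultiplier(currentRegions);
  const after = singleRegionMonthly * regionCostMultiplier(1, true);
  return before - after;
}

// Hypothetical account: USD 100/month of throughput per region, currently in 3 regions.
console.log(consolidationSaving(100, 3));
```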

FinOps Audit Checklist for Cosmos DB

| Step | Expected saving | Difficulty |
|---|---|---|
| Switch dev/test to Serverless | 50–80% on non-prod | Low |
| Tune autoscale max to 1.3× peak | 20–40% on overprovisioned | Low |
| Optimize indexing policy | 20–40% on writes | Medium |
| Set container TTL for ephemeral data | 100% on manual deletes | Low |
| Audit partition key cardinality/distribution | 30–60% on cross-partition queries | High |
| Bulk operations for write-heavy workloads | 30–40% on bulk write paths | Medium |
| Reserved Capacity 1Y on stable workload | 20% on baseline | Low |
| Audit multi-region necessity | up to 50% on unnecessary replicas | Low |

Conclusion

Cosmos DB cost optimization is a craft – no silver bullet, but eight concrete levers with a cumulative effect of 40–60% reduction. At CostSentry.AI, after applying all of them, we dropped the Cosmos bill from USD 380/month to USD 145 with no loss of functionality or SLA.

The key principle: Cosmos is not expensive, it is misconfigured. Microsoft tuned the defaults for "works out of the box", not for "cheapest". Three hours of FinOps audit returns thousands of EUR per year.

Need help with a Cosmos DB FinOps audit? Check out our cloud architecture services or reach out for a NoSQL cost review.


About the author

Martin Rylko


Senior Cloud Architect & DevOps Engineer

14+ years in IT – from on-premises datacenters and Hyper-V clustering to cloud infrastructure on Microsoft Azure. I specialize in Landing Zones, IaC automation, Kubernetes and security compliance.


Frequently Asked Questions

When should I switch Cosmos DB from Provisioned to Serverless?
Serverless is the right call for workloads under 5 000 RU/s of average consumption with a bursty pattern: dev/test environments, prototypes, low-traffic APIs. For production workloads above 5 000 RU/s, Provisioned (autoscale) is cheaper. At CostSentry.AI we moved the dev tier to Serverless and dropped the Cosmos bill from USD 180/month to ~USD 40 with no impact on the dev team.

What is autoscale and when should it replace manual scaling?
Autoscale automatically adjusts provisioned RU/s between 10% and 100% of the configured maximum based on actual load. Each hour it bills the higher of two values: the highest RU/s it scaled to, or 10% of the maximum. Pros: no throttling during spikes, no upfront capacity planning. Cons: ~50% higher per-RU/s price than manual. Worth it when your daily variability exceeds 3:1 (e.g. business hours vs night).

How does indexing policy affect cost?
Cosmos DB by default indexes every property on every document. That raises write RU cost by 20–40% versus an optimized policy. If you only query 3–5 fields, exclude the rest via excludedPaths in the indexing policy. At CostSentry.AI this change dropped our write cost from ~USD 80/month to ~USD 52 for the same throughput.

Is multi-region replication worth it for non-critical workloads?
For non-critical workloads, no. Every additional region multiplies the provisioned RU/s total: 5 regions = 5× the cost. Use multi-region only for workloads with an explicit SLA on geographic redundancy or with read traffic from multiple continents. For most enterprise workloads a single zone-redundant region is sufficient (1.25× RU/s for a 99.995% SLA).
