MongoDB Cluster Sizing Guide

MongoDB is a document database that powers many modern applications. Properly sizing your MongoDB deployment is essential for performance, availability, and cost efficiency. This guide covers replica set configuration, sharding decisions, and how MongoDB Atlas compares with self-hosted deployments.

Understanding MongoDB Architecture

Replica Sets

A replica set is a group of MongoDB instances that maintain the same data:

  • Primary: Receives all write operations
  • Secondaries: Replicate data from primary, can serve reads
  • Arbiter: Participates in elections but holds no data
  • Minimum: 3 members for automatic failover
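
As a minimal sketch of standing one up, assuming three mongod processes already started with --replSet rs0 (the host names below are placeholders), a replica set can be initiated from any driver; here with pymongo:

    # Sketch: initiate a 3-member replica set with pymongo.
    # Assumes each mongod was started with --replSet rs0; host names are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://rs1.example.internal:27017", directConnection=True)
    config = {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "rs1.example.internal:27017"},
            {"_id": 1, "host": "rs2.example.internal:27017"},
            {"_id": 2, "host": "rs3.example.internal:27017"},
        ],
    }
    client.admin.command("replSetInitiate", config)   # same as rs.initiate(config) in mongosh

Once the configuration is accepted, the members hold an election and one of them becomes the primary.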

Sharded Clusters

Sharding distributes data across multiple replica sets:

  • Shards: Each shard is a replica set holding a subset of data
  • Config Servers: Store cluster metadata (3-member replica set)
  • Mongos: Query routers that direct operations to shards
  • Shard Key: Determines data distribution

Memory Sizing

Working Set Concept

The working set is the portion of data and indexes accessed frequently:

  • Ideally, working set fits in RAM
  • WiredTiger cache default: 50% of (RAM - 1 GB), or 256 MB, whichever is larger
  • If working set exceeds RAM, performance degrades
  • Monitor cache hit ratio (target > 95%)

RAM Calculation Formula

Recommended RAM = Working Set + Index Size + Overhead

  • Working Set: 10-30% of total data (varies by application)
  • Index Size: Typically 10-25% of data size
  • Overhead: 20-30% for connections, aggregations, sorting
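
A back-of-the-envelope sketch of the formula, with all ratios picked from the ranges above (illustrative assumptions, not measurements):

    # Sketch: estimate RAM from Working Set + Index Size + Overhead.
    # All ratios are illustrative picks from the ranges above.
    data_size_gb = 500            # total data size
    working_set_ratio = 0.20      # 10-30% of data
    index_ratio = 0.15            # 10-25% of data size
    overhead_ratio = 0.25         # 20-30% for connections, aggregations, sorting

    working_set_gb = data_size_gb * working_set_ratio        # 100 GB
    index_gb = data_size_gb * index_ratio                    # 75 GB
    recommended_ram_gb = (working_set_gb + index_gb) * (1 + overhead_ratio)
    print(round(recommended_ram_gb))                         # ~219 GB

    # WiredTiger's default cache on a node of a given size:
    node_ram_gb = 256
    wt_cache_gb = max(0.5 * (node_ram_gb - 1), 0.25)         # 50% of (RAM - 1 GB), min 256 MB
    print(wt_cache_gb)                                       # 127.5 GB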

Memory Recommendations by Workload

Workload Type        Working Set           RAM Multiplier
Hot data only        5-10% of data         1.5x working set
Time-series          Recent data period    2x working set
Random access        30-50% of data        1.5x working set
Full scan queries    100% of data          Consider sharding

Storage Sizing

Storage Components

  • Data files: Actual document storage
  • Indexes: B-tree structures for queries
  • Journal: Write-ahead log (typically small)
  • Oplog: Replication log (configure size)

Compression Benefits

WiredTiger provides compression:

Compressor          Compression Ratio    CPU Impact
Snappy (default)    2-4x                 Low
Zlib                4-7x                 Medium
Zstd                5-8x                 Low-Medium
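
The block compressor can be chosen per collection at creation time; a minimal sketch with pymongo, using placeholder database and collection names:

    # Sketch: create a collection that uses zstd block compression.
    # "analytics" and "events" are placeholder names.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["analytics"]
    db.create_collection(
        "events",
        storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
    )

The server-wide default compressor can instead be set with storage.wiredTiger.collectionConfig.blockCompressor in mongod.conf.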

Storage Formula

Total Storage = (Data Size / Compression Ratio) + Indexes + Oplog + Headroom

  • Headroom: 20-30% for growth and operations
  • Oplog: Size for desired replication window (hours/days)
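
A worked sketch of the storage formula, with all inputs as illustrative assumptions:

    # Sketch: estimate per-node storage from the formula above.
    # All inputs are illustrative assumptions.
    data_size_gb = 2000          # uncompressed data size
    compression_ratio = 3.0      # Snappy typically lands in the 2-4x range
    index_gb = 300               # estimated index size
    oplog_gb = 50                # sized for the desired replication window
    headroom = 0.25              # 20-30% for growth and operations

    total_gb = (data_size_gb / compression_ratio + index_gb + oplog_gb) * (1 + headroom)
    print(round(total_gb))       # ~1271 GB per data-bearing node

    # If the default oplog is too small for the desired window, it can be
    # resized online (size in MB), e.g. via pymongo:
    #   client.admin.command("replSetResizeOplog", 1, size=51200)   # ~50 GB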

When to Shard

Indicators for Sharding

  • Data size exceeds single server capacity
  • Working set doesn't fit in available RAM
  • Write throughput exceeds single replica set capacity
  • Geographic distribution requirements

Sharding Thresholds

Metric                   Replica Set Limit     Consider Sharding
Data Size                1-2 TB comfortable    > 2 TB
Write Ops/sec            10,000-20,000         > 20,000
Concurrent Connections   10,000-20,000         > 20,000

Shard Key Selection

Critical factors for choosing a shard key:

  • Cardinality: High number of unique values
  • Write distribution: Avoid hot spots
  • Query patterns: Target queries to specific shards
  • Monotonic growth: Avoid monotonically increasing keys (e.g., timestamps or ObjectIds), which funnel all inserts to a single shard
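
A minimal sketch of applying these rules by sharding a collection on a hashed key (connected through a mongos; database, collection, and field names are placeholders):

    # Sketch: shard a collection on a hashed key to spread monotonic inserts.
    # Connect through a mongos router; names below are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos.example.internal:27017")
    client.admin.command("enableSharding", "appdb")
    client.admin.command(
        "shardCollection",
        "appdb.events",
        key={"device_id": "hashed"},   # hashed keys distribute writes, at the cost of ranged targeting
    )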

MongoDB Atlas vs Self-Hosted

MongoDB Atlas Advantages

  • Automated operations (backups, scaling, patching)
  • Built-in monitoring and alerts
  • Global clusters with automatic failover
  • Serverless and dedicated tier options
  • Minimal operational overhead

Self-Hosted Advantages

  • Lower cost at scale
  • Full control over configuration
  • No vendor lock-in
  • Custom hardware optimization
  • Data sovereignty control

Cost Comparison Factors

Factor           Atlas           Self-Hosted
Infrastructure   Included        Cloud/hardware cost
Operations       Included        Staff time
Backups          Included        Storage + tooling
Monitoring       Included        Additional tooling
Support          Based on tier   Enterprise license

Performance Optimization

Index Strategy

  • Create indexes for common query patterns
  • Use compound indexes strategically
  • Monitor slow queries (> 100ms)
  • Avoid too many indexes (slows writes)
  • Use covered queries when possible
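
A short sketch of a compound index that supports a common query and lets it be covered (collection and field names are placeholders):

    # Sketch: compound index matching a frequent query pattern.
    from pymongo import ASCENDING, DESCENDING, MongoClient

    coll = MongoClient()["appdb"]["orders"]     # placeholder names
    coll.create_index([("status", ASCENDING), ("created_at", DESCENDING)])

    # Covered query: filter, sort, and projection use only indexed fields and
    # exclude _id, so the query can be answered from the index alone.
    cursor = coll.find(
        {"status": "shipped"},
        {"_id": 0, "status": 1, "created_at": 1},
    ).sort("created_at", DESCENDING)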

Query Optimization

  • Use explain() to analyze queries
  • Avoid full collection scans
  • Project only needed fields
  • Use aggregation pipeline efficiently
  • Consider read preferences for scaling reads
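
A sketch of running explain with executionStats to spot collection scans (database, collection, and field names are placeholders):

    # Sketch: run explain with executionStats to see how a query is served.
    from pymongo import MongoClient

    db = MongoClient()["appdb"]                     # placeholder database
    plan = db.command(
        "explain",
        {"find": "orders", "filter": {"status": "shipped"}},
        verbosity="executionStats",
    )
    stats = plan["executionStats"]
    print(plan["queryPlanner"]["winningPlan"]["stage"])    # COLLSCAN here means a full scan
    print(stats["totalDocsExamined"], stats["nReturned"])  # a large gap suggests a missing index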

Write Optimization

  • Batch writes with bulkWrite()
  • Use appropriate write concern
  • Consider write concern trade-offs
  • Pre-split chunks for sharded clusters
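
A sketch combining bulkWrite() batching with an explicit write concern (names and values are placeholders):

    # Sketch: batch writes and set the write concern explicitly.
    from pymongo import InsertOne, MongoClient, UpdateOne, WriteConcern

    coll = MongoClient()["appdb"]["orders"]       # placeholder names
    # w="majority" trades some latency for durability across the replica set.
    durable = coll.with_options(write_concern=WriteConcern(w="majority"))

    ops = [
        InsertOne({"sku": "A-1", "qty": 3}),
        UpdateOne({"sku": "B-2"}, {"$inc": {"qty": 1}}, upsert=True),
    ]
    result = durable.bulk_write(ops, ordered=False)   # unordered lets the server parallelize
    print(result.inserted_count, result.modified_count, result.upserted_count)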

Monitoring Essentials

Key Metrics

  • opcounters: Operation counters (sample them to derive ops/sec)
  • connections: Current and available
  • replication lag: How far secondaries trail the primary
  • cache utilization: WiredTiger cache usage
  • lock percentage: Time operations spend waiting on locks
  • queue lengths: Operations waiting
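
Most of these metrics are exposed by the serverStatus command; a sketch of sampling a few of them with pymongo (field names follow the serverStatus output):

    # Sketch: sample a few serverStatus metrics.
    from pymongo import MongoClient

    status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")

    print(status["opcounters"])                              # cumulative insert/query/update/delete counts
    print(status["connections"]["current"],
          status["connections"]["available"])
    cache = status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    print(round(100 * used / limit, 1), "% of WiredTiger cache in use")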

Alert Thresholds

Metric             Warning        Critical
Replication Lag    > 10 seconds   > 60 seconds
Cache Dirty %      > 5%           > 20%
Connections Used   > 70%          > 90%
Disk Usage         > 70%          > 85%
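
A sketch of checking replication lag against the thresholds in the table, computed from replSetGetStatus (the connection string is a placeholder):

    # Sketch: compute per-secondary replication lag and flag threshold breaches.
    from pymongo import MongoClient

    client = MongoClient("mongodb://rs1.example.internal:27017")   # placeholder host
    rs = client.admin.command("replSetGetStatus")

    primary_optime = next(m["optimeDate"] for m in rs["members"] if m["stateStr"] == "PRIMARY")
    for m in rs["members"]:
        if m["stateStr"] != "SECONDARY":
            continue
        lag = (primary_optime - m["optimeDate"]).total_seconds()
        level = "CRITICAL" if lag > 60 else "WARNING" if lag > 10 else "ok"
        print(m["name"], f"{lag:.1f}s", level)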

Conclusion

Proper MongoDB cluster sizing requires understanding your data patterns, growth projections, and availability requirements. Start with a replica set configuration that provides adequate RAM for your working set, plan storage with compression and growth in mind, and consider sharding only when you truly need horizontal scaling.

Whether you choose MongoDB Atlas or self-hosting depends on your operational capabilities, budget, and specific requirements. Use the calculator above to estimate your cluster configuration and compare costs across deployment options.




