Docker / Kubernetes Resource Calculator
Calculate container CPU and memory requests, limits, QoS class, and node packing for Kubernetes deployments.
Section A – Container Requests & Limits
Section B – Node Packing
Uses the requests from Section A. Fill those in first, then enter your node specifications below.
How node packing works
Kubernetes schedules pods based on requests, not limits. The scheduler subtracts kubelet and OS overhead from total node resources, then divides by the per-pod CPU and memory request to find the maximum number of pods. The resource that runs out first is your bottleneck.
Published: April 2026 | Author: TriVolt Editorial Team
CPU and Memory Requests vs Limits in Kubernetes
Every container in Kubernetes can declare two separate numbers for CPU and memory: a request and a limit. These two values serve fundamentally different purposes, and confusing them is one of the most common causes of mysterious performance problems and pod evictions.
A request is a scheduling guarantee. When you set memory: 256Mi as a request, you are telling the Kubernetes scheduler "this pod needs at least 256 MiB of RAM to run; only place me on a node that has 256 MiB free." The scheduler uses requests exclusively when deciding where to place pods. Think of a request like a hotel room reservation: the hotel guarantees the room will be available when you arrive, even if you end up only using half of it.
A limit is a runtime enforcement ceiling. If you set memory: 512Mi as a limit, the kernel's OOM killer terminates your container with SIGKILL the instant it tries to allocate beyond 512 MiB, regardless of how much free memory exists on the node. For CPU, the kernel uses bandwidth throttling rather than killing: the process is simply paused when it exceeds its CPU quota. Think of a limit like the maximum number of guests a hotel room can legally accommodate; exceeding it triggers an enforcement action.
Practically, this means requests affect where your pod lands, while limits affect what happens to it at runtime. Setting requests too low causes over-subscription: the scheduler places more pods on a node than it can comfortably run, leading to memory pressure and evictions. Setting limits too low, especially for JVM applications with just-in-time compilation, causes unnecessary throttling and OOMKills.
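The placement side of this can be sketched in a few lines: a toy version of the scheduler's filtering step, deciding node eligibility from requests alone. The node names and free-capacity figures below are invented for illustration.

```python
# Toy sketch of the scheduler's filtering step: placement is decided by
# requests alone, never limits. Node names and capacities are hypothetical.

def fits(free_cpu_m: int, free_mem_mi: int, req_cpu_m: int, req_mem_mi: int) -> bool:
    """A pod fits only if the node can cover BOTH resource requests."""
    return free_cpu_m >= req_cpu_m and free_mem_mi >= req_mem_mi

nodes = {"node-a": (300, 200), "node-b": (1500, 4096)}  # (free mCPU, free MiB)
pod_request = (250, 256)                                # cpu: 250m, memory: 256Mi

eligible = [name for name, free in nodes.items() if fits(*free, *pod_request)]
print(eligible)  # node-a fails the memory check, so only node-b qualifies
```

Note that node-a has enough CPU but not enough unreserved memory, so it is filtered out even though the pod's memory limit might never be reached.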
Kubernetes QoS Classes Explained
Kubernetes assigns every pod a Quality of Service (QoS) class based solely on how requests and limits are configured. This class determines eviction priority when a node runs low on memory.
Guaranteed pods have requests equal to limits for both CPU and memory on every container. These pods are the last to be evicted under memory pressure; the kubelet will evict BestEffort and Burstable pods first. Guaranteed is the right choice for production databases like PostgreSQL or MySQL, message brokers, and any stateful workload where a sudden restart causes data inconsistency or lost in-flight transactions.
Burstable pods have at least one request set, but requests and limits are not equal (or limits are absent on some containers). These pods can use memory beyond their request if the node has spare capacity, but they are candidates for eviction when the node is under pressure. Burstable is appropriate for applications with variable workloads, such as a web API that is usually idle but can spike during batch jobs.
BestEffort pods have no requests or limits set at all. The scheduler places them anywhere, and they are always the first to be evicted. BestEffort is only appropriate for truly disposable batch jobs where an eviction and re-run is acceptable.
The practical recommendation: run your stateful workloads as Guaranteed by setting requests equal to limits. Run your stateless APIs as Burstable with a memory limit set at 2–3× the request. Never run anything in production as BestEffort.
CPU Throttling and the Linux CFS
CPU limits in Kubernetes are enforced through the Linux Completely Fair Scheduler (CFS) bandwidth control mechanism. The kernel divides time into periods of 100ms by default. Each period, a container may consume CPU time up to its quota. If a container with a 500m (0.5 CPU) limit tries to use more than 50ms of CPU time in a 100ms period, the kernel pauses it for the remainder of that period, even if other CPUs are sitting completely idle.
This creates a well-known problem: a CPU-limited application can appear to be throttled even on a lightly loaded node. The kernel does not look at overall utilisation; it enforces the quota rigidly within each 100ms window. This is particularly damaging for Java and Go applications that use garbage collection, because GC pauses already compete with application work within a single period.
The symptom is a Java application with plenty of headroom on the node, responding slowly and logging "request took 250ms" for endpoints that should complete in 20ms. The pauses are invisible in application metrics but visible in container CPU throttling metrics, specifically the container_cpu_cfs_throttled_seconds_total metric in Prometheus. If your throttle ratio is above 25%, your CPU limit is too low for the workload.
A reasonable starting point: set CPU requests to your steady-state observed usage (p50), and either remove CPU limits entirely for latency-sensitive services, or set them at 2–4× the request to allow burst. Monitor throttling metrics before tightening.
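Under the default 100 ms period, the quota arithmetic reduces to two small helpers. The throttle-ratio calculation here (throttled periods divided by elapsed periods) is one common way to compute the 25% figure mentioned above; it is a sketch of the math, not a Prometheus query.

```python
# CFS bandwidth arithmetic for a CPU limit, assuming the default 100 ms period.

PERIOD_US = 100_000  # default CFS period: 100 ms, in microseconds

def cfs_quota_us(limit_millicores: int) -> int:
    """CPU time (microseconds) the container may consume per period."""
    return limit_millicores * PERIOD_US // 1000

def throttle_ratio(throttled_periods: int, total_periods: int) -> float:
    """Fraction of elapsed periods in which the quota was exhausted."""
    return throttled_periods / total_periods

print(cfs_quota_us(500))        # a 500m limit -> 50000 us = 50 ms per 100 ms
print(throttle_ratio(30, 100))  # 0.3: above the 25% warning threshold
```

A container throttled in 30 of its last 100 periods is losing CPU time in nearly a third of all scheduling windows, which is usually visible as tail latency long before average latency moves.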
OOMKill: Why Your Container Keeps Restarting
When a container exceeds its memory limit, the Linux kernel's Out-Of-Memory (OOM) killer terminates the process immediately. Kubernetes reports this as a CrashLoopBackOff with exit code 137 (128 + 9, the SIGKILL signal number). The container is restarted according to its restart policy, often only to be killed again within seconds, hence the crash loop.
OOMKill is not a graceful shutdown. There is no chance for the application to flush buffers, commit a transaction, or write a log entry. A PostgreSQL container killed this way may require crash recovery on the next start. A Java application mid-GC will leave the heap in an inconsistent state that is simply discarded.
Common causes of unexpected OOMKill include: setting a memory limit far below actual working set (underestimating JVM non-heap, thread stacks, mapped files), a memory leak that is slow enough to pass initial testing but accumulates over days, and burst traffic that temporarily doubles in-flight request memory.
The memory limit/request ratio shown by this calculator is a useful signal. A ratio above 4× means the container is scheduled onto a node as if it only needs a fraction of what it might actually consume, which increases the risk that multiple containers burst simultaneously and exhaust the node. A ratio of 1× (Guaranteed QoS) eliminates that risk entirely. For most stateless APIs, a ratio of 2–3× is a reasonable balance between scheduling efficiency and OOMKill safety.
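The ratio check described here can be sketched as follows; the verdict strings and thresholds simply encode this section's guidance, not any Kubernetes-defined bands.

```python
# Limit-to-request ratio check; thresholds follow the article's guidance
# (1x Guaranteed-style, 2-3x balanced, above 4x risky).

def memory_ratio_verdict(request_mi: int, limit_mi: int) -> str:
    ratio = limit_mi / request_mi
    if ratio == 1:
        return "1x: requests equal limits, no burst overcommit risk"
    if ratio <= 3:
        return f"{ratio:.1f}x: reasonable balance for stateless APIs"
    if ratio <= 4:
        return f"{ratio:.1f}x: acceptable, but watch simultaneous bursts"
    return f"{ratio:.1f}x: node exhaustion risk if several pods burst together"

print(memory_ratio_verdict(256, 512))   # 2.0x: reasonable balance ...
print(memory_ratio_verdict(256, 2048))  # 8.0x: node exhaustion risk ...
```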
Kubelet Overhead and Node Allocatable
A 4-core, 8 GiB node does not give you 4000 millicores and 8192 MiB for your pods. Kubernetes reserves resources for the kubelet agent, container runtime (containerd or Docker), operating system processes, and kernel memory. The remaining capacity, what is actually schedulable, is called Node Allocatable.
The allocatable formula is: Allocatable = Capacity − kube-reserved − system-reserved − eviction-threshold. Typical managed Kubernetes services (EKS, GKE, AKS) reserve 10–15% of memory and a few hundred millicores of CPU. A 10% default overhead is a reasonable starting assumption for capacity planning, but check your specific cluster configuration; a node running a DaemonSet-heavy observability stack may have 20–25% effectively reserved.
You can inspect the actual allocatable capacity with kubectl describe node <node-name>, which shows the Allocatable section alongside the total Capacity. Subtract the requests of running pods (listed under Allocated resources in the same output) to find remaining schedulable headroom; note that kubectl top nodes reports actual usage, not requests.
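For planning purposes, the flat-overhead approximation used by this calculator reduces to one multiplication per resource. The 10% default is the starting assumption discussed above, not a value read from any live cluster.

```python
# Section B's allocatable estimate: capacity minus a flat overhead fraction
# standing in for kube-reserved + system-reserved + eviction threshold.

def allocatable(cpu_cores: float, mem_gib: float, overhead: float = 0.10):
    """Returns (millicores, MiB) schedulable after reserving `overhead`."""
    cpu_m = round(cpu_cores * (1 - overhead) * 1000)
    mem_mi = round(mem_gib * (1 - overhead) * 1024)
    return cpu_m, mem_mi

print(allocatable(4, 8))    # (3600, 7373) for the 4-core, 8 GiB example node
print(allocatable(16, 64))  # a larger node at the same 10% overhead
```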
Right-Sizing in Practice
Setting requests and limits from theory alone is a starting point, not a destination. The correct workflow is: deploy with generous initial values, observe actual consumption, then tighten.
kubectl top pods --containers shows current CPU and memory usage, but only as a point-in-time snapshot. For reliable sizing, collect p95 and p99 memory usage over at least a week of representative load. A workload that uses 180 MiB at median but 420 MiB during daily batch jobs requires a limit above 420 MiB, not 200 MiB because "it usually uses that much."
Kubernetes provides the Vertical Pod Autoscaler (VPA) to automate this. In recommendation mode (without auto-update), VPA observes actual usage and suggests request/limit values. Running VPA in recommendation mode for a sprint before hardening limits is a practical way to right-size a service you are deploying for the first time.
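As a sketch of that workflow, the helper below turns a series of observed memory samples into a request (steady state, p50) and a limit (p99 plus headroom). The 20% margin over p99 is an assumption of this sketch, not a VPA default, and the sample data is synthetic.

```python
# Percentile-based sizing sketch: request at p50, limit at p99 plus headroom.
import statistics

def recommend_memory(samples_mi: list) -> dict:
    """Derive request/limit recommendations from memory samples (MiB)."""
    pct = statistics.quantiles(samples_mi, n=100)  # pct[i] is the (i+1)th percentile
    return {
        "request_mi": round(pct[49]),      # p50: steady-state usage
        "limit_mi": round(pct[98] * 1.2),  # p99 plus 20% burst headroom (assumed)
    }

# Synthetic week: ~180 MiB baseline with daily batch spikes to 420 MiB
usage = [180] * 90 + [420] * 10
print(recommend_memory(usage))  # the limit lands above 420 MiB, as argued above
```

This reproduces the article's point: the median alone would suggest a dangerously low limit, while the p99-based limit clears the batch-job peak.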
Worked Example
Scenario: A 3-replica Node.js API. Memory request 256 MiB, memory limit 512 MiB. CPU request 250m, CPU limit 500m. Node: 4 cores, 8 GiB RAM, 10% kubelet overhead.
Step 1 – QoS class: Memory limit (512 MiB) ≠ memory request (256 MiB), so not Guaranteed. At least one request is set. QoS = Burstable. Memory ratio = 512 / 256 = 2×. Within safe range.
Step 2 – Available CPU: 4 cores × (1 − 0.10) × 1000 = 3600 millicores
Step 3 – Available RAM: 8 GiB × (1 − 0.10) × 1024 = 7372.8 ≈ 7373 MiB
Step 4 – Max pods by CPU: ⌊3600 / 250⌋ = 14 pods
Step 5 – Max pods by RAM: ⌊7373 / 256⌋ = 28 pods
Step 6 – Bottleneck: CPU (14 < 28). Max pods per node = 14.
Step 7 – Replicas fit: 3 replicas ≤ 14. Yes, all 3 replicas fit on a single node with 11 pod slots remaining for other workloads.
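The seven steps collapse into a few lines of arithmetic; this simply re-derives the numbers above.

```python
# Worked example re-derived: node packing from requests and kubelet overhead.
import math

cores, mem_gib, overhead = 4, 8, 0.10          # node spec and overhead fraction
cpu_req_m, mem_req_mi, replicas = 250, 256, 3  # per-pod requests, replica count

avail_cpu_m = cores * (1 - overhead) * 1000          # Step 2: 3600 millicores
avail_mem_mi = mem_gib * (1 - overhead) * 1024       # Step 3: 7372.8 MiB
max_by_cpu = math.floor(avail_cpu_m / cpu_req_m)     # Step 4: 14 pods
max_by_mem = math.floor(avail_mem_mi / mem_req_mi)   # Step 5: 28 pods
max_pods = min(max_by_cpu, max_by_mem)               # Step 6: CPU bottleneck

print(max_by_cpu, max_by_mem, max_pods, replicas <= max_pods)  # 14 28 14 True
```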
Disclaimer
This calculator provides estimates for planning and educational purposes. Actual Kubernetes scheduling behaviour depends on your specific cluster version, node configuration, kubelet flags, admission controllers, resource quotas, and installed DaemonSets. Always validate sizing decisions against real workload metrics in a representative environment. The calculator does not account for pod disruption budgets, topology spread constraints, or node affinity rules that may further constrain scheduling.
Related calculators: Server Sizing Calculator · Rack Density Calculator
Also in Data Center
- DC Cooling Load Calculator – Calculate the cooling capacity required for a given IT load and PUE target. Outputs in kW and Tons of Refrigeration.
- DC Critical Equipment Sizing – Size all critical power and cooling equipment for a new data center build or upgrade: UPS, generator, transformer, switchgear, cooling, PDUs, and battery. Tier I–IV, N to 2N+1 redundancy, PDF export.
- DC Efficiency Audit – Enter PUE, power chain efficiency, cooling strategy, redundancy tier, and battery runtime to get an overall efficiency score with recommendations.
- Power Chain Efficiency – Calculate overall power chain efficiency from utility through UPS, PDU, and cabling to IT equipment.