Kubernetes requests vs limits — scheduling, throttling, and OOMKilled

I'll be honest. For a long time I set Kubernetes requests and limits by copying numbers from another deployment. Round them up, ship it, move on.

It worked until it didn't. One service started getting OOMKilled under load — killed and restarted, over and over — and I had no idea why. The numbers "looked fine".

That's when I finally sat down and learned what these two fields actually do.

They look like the same knob. They are not. One decides where your pod runs. The other decides what happens when your pod misbehaves. And for memory, "what happens" is a lot more brutal than for CPU.

This post is the explanation I wish I'd had before I started guessing.

Two numbers, two completely different jobs

Every container can declare two things, for both CPU and memory:

resources:
  requests:
    cpu: '250m'
    memory: '256Mi'
  limits:
    cpu: '500m'
    memory: '512Mi'

Requests are a promise to the scheduler. Limits are a ceiling enforced at runtime.

That's the whole idea. Everything else follows from it.
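If the units look cryptic, here is a minimal Python sketch of how those strings turn into numbers. It assumes only the `m` CPU suffix and the binary `Ki`/`Mi`/`Gi` memory suffixes (Kubernetes accepts more forms than this), and the function names are mine, not part of any Kubernetes client library.

```python
# Hypothetical helpers for illustration only; they cover just the
# suffixes used in this post.

def cpu_to_millicores(quantity: str) -> int:
    """'250m' -> 250, '1' -> 1000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def memory_to_bytes(quantity: str) -> int:
    """'256Mi' -> 268435456; a bare number is taken as bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


print(cpu_to_millicores("250m"))  # 250 millicores, a quarter of one core
print(memory_to_bytes("256Mi"))   # 268435456 bytes
```

So `250m` is a quarter of a core, and `Mi` means mebibytes (powers of two), not megabytes.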

Requests — what the scheduler reserves

When you create a pod, the scheduler has to place it on a node. It does this by looking at requests, not at what the pod actually uses.

Think of it as booking a seat. Requests say "this pod needs at least 250m CPU and 256Mi memory". The scheduler finds a node that still has that much unbooked, and places the pod there.

Two things follow from this.

Set requests too low and the scheduler thinks your pod is tiny. It packs it onto a node that's already full of real work. Now your pod fights for resources it never reserved.

Set requests too high and you waste capacity. The node looks full to the scheduler even though the pods are barely using anything. New pods stay Pending because, on paper, there's no room.

A pod that requests more memory than any single node can offer never schedules at all:

kubectl describe pod my-pod
# Events:
#   Warning  FailedScheduling  0/4 nodes are available:
#   4 Insufficient memory.

I once spent half an hour convinced the cluster was broken, when really I'd copied a memory: 8Gi request from a different app onto a node pool whose nodes had less allocatable memory than that.
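The booking logic can be sketched in a few lines of Python. This is a deliberately simplified model, not the real scheduler: it assumes only CPU and memory matter (the real one also checks taints, affinity, and more), and the node names and sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int     # millicores the node can hand out
    allocatable_mem_mi: int    # MiB the node can hand out
    requested_cpu_m: int = 0   # sum of requests of pods already placed
    requested_mem_mi: int = 0


def fits(node: Node, cpu_m: int, mem_mi: int) -> bool:
    """True if the node still has enough unrequested capacity.

    Actual usage plays no part; only the booked numbers count.
    """
    free_cpu = node.allocatable_cpu_m - node.requested_cpu_m
    free_mem = node.allocatable_mem_mi - node.requested_mem_mi
    return free_cpu >= cpu_m and free_mem >= mem_mi


# Made-up nodes: node-a is almost fully booked on CPU, node-b has room.
nodes = [
    Node("node-a", 4000, 7000, requested_cpu_m=3900, requested_mem_mi=2000),
    Node("node-b", 4000, 7000, requested_cpu_m=1000, requested_mem_mi=3000),
]

# A 250m / 256Mi pod fits only where that much is still unbooked.
print([n.name for n in nodes if fits(n, 250, 256)])   # ['node-b']

# An 8Gi (8192Mi) request exceeds every node's allocatable memory,
# so the pod would stay Pending forever.
print([n.name for n in nodes if fits(n, 250, 8192)])  # []
```

Note that node-a is rejected even if its pods are idle. The scheduler only sees what has been booked.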

Limits — the ceiling, enforced two ways

Limits are the hard cap. "This container may never use more than this." The catch is that Kubernetes enforces the cap differently for CPU and memory.

This difference is the single most important thing in this post.

CPU over the limit: throttled. CPU is a compressible resource. When a container has used up its CPU quota for the current scheduling period, the kernel pauses it until the next period starts. The container keeps running. It's slower, but nothing crashes.

Memory over the limit: OOMKilled. Memory is not compressible. You cannot give a process "less memory than it asked for": the bytes are either there or they aren't. So when a container crosses its memory limit, the kernel's OOM (Out Of Memory) killer kills a process inside it. The container exits with code 137 (128 + 9, where 9 is SIGKILL) and the kubelet restarts it according to the pod's restart policy.

You cannot throttle memory. You can only kill the process using it.

Here's what that looks like after the fact:

kubectl get pod my-pod
# NAME     READY   STATUS      RESTARTS
# my-pod   0/1     OOMKilled   5

kubectl describe pod my-pod
#   Last State:     Terminated
#     Reason:       OOMKilled
#     Exit Code:    137

A climbing restart count plus OOMKilled usually means the memory limit is too low for what the app really needs. Not a code bug. A budget that's too small.

CPU throttling is sneakier, because nothing crashes. The app just gets slow under load, and it looks like a performance bug in your code. You can confirm it's throttling with a metric:

container_cpu_cfs_throttled_periods_total

If that's climbing, the kernel is holding your container back. I burned an evening once chasing "slow GC pauses" that turned out to be a CPU limit set too tight.
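To put numbers on it, here is a rough Python sketch of the arithmetic behind throttling. It assumes the default CFS period of 100 ms, and it assumes you also scrape `container_cpu_cfs_periods_total`, so the throttled count can be read as a fraction of all periods. The function names are mine.

```python
# Assumption: default CFS period of 100 ms (clusters can configure this).
CFS_PERIOD_MS = 100


def cpu_quota_ms(limit_millicores: int, period_ms: int = CFS_PERIOD_MS) -> float:
    """CPU time the container may consume per period, summed over all threads."""
    return limit_millicores / 1000 * period_ms


def throttled_fraction(throttled_periods: int, total_periods: int) -> float:
    """Share of periods in which the container ran out of quota."""
    if total_periods == 0:
        return 0.0
    return throttled_periods / total_periods


# A 500m limit means 50 ms of CPU time in every 100 ms window.
print(cpu_quota_ms(500))              # 50.0

# 300 throttled periods out of 1200 elapsed: held back a quarter of the time.
print(throttled_fraction(300, 1200))  # 0.25
```

The quota is shared by every thread in the container, so a busy multi-threaded app can burn through 50 ms of CPU time in far less than 50 ms of wall-clock time and then sit idle for the rest of the period.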

The .NET angle — your runtime reads the limit

Here's the part that matters if you run .NET in containers, and it's easy to miss.

Modern .NET is container-aware. The garbage collector reads the cgroup memory limit and sizes the managed heap against it. If the runtime knows it has a 512Mi budget, it makes GC decisions that fit inside 512Mi.

If you set no memory limit, the runtime can assume it has the whole node's memory to play with. It sizes the heap for a much bigger budget and grows it happily — until the node runs out of real memory and something gets killed in a messier, less predictable way.

So setting a memory limit isn't just about protecting your neighbours. It tells your own runtime how much room it actually has.

You can make the relationship explicit in the deployment, capping the GC heap as a percentage of the container limit:

containers:
  - name: api
    image: myregistry.azurecr.io/api:1.4.0
    resources:
      limits:
        memory: '512Mi'
    env:
      - name: DOTNET_GCHeapHardLimitPercent
        value: '0x4B' # hex for 75: GC settings passed as env vars are read as hexadecimal

This leaves headroom for non-heap memory (the stack, native buffers, the runtime itself) so the GC doesn't fill the whole budget and then get OOMKilled the moment something native allocates.
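The budget split is simple arithmetic. Here is a small Python sketch of it, as a model of the percentage only: the real runtime has more rules than this, and the function name is mine.

```python
MIB = 2**20


def gc_heap_hard_limit_bytes(container_limit_bytes: int, percent: int) -> int:
    """Heap cap as a whole-number percentage of the container memory limit."""
    return container_limit_bytes * percent // 100


limit = 512 * MIB
heap = gc_heap_hard_limit_bytes(limit, 75)

print(heap // MIB)            # 384 -> MiB available to the managed heap
print((limit - heap) // MIB)  # 128 -> MiB left for stacks, native buffers, the runtime
```

If the app leans heavily on native memory, that remaining 128 MiB is the number to watch when deciding whether the percentage needs to come down.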

QoS classes — the side effect you didn't choose

Your requests and limits also decide your pod's Quality of Service class, which you never set directly. Kubernetes derives it:

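Guaranteed: every container sets both CPU and memory requests and limits, and each request equals its limit. Highest priority.
Burstable: at least one container sets a request or limit, but the pod doesn't meet the Guaranteed rule. Middle.
BestEffort: no container sets any request or limit. Lowest.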

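This matters under node memory pressure. When a node runs low on memory, the kubelet starts evicting pods to save itself. It ranks pods first by whether their usage exceeds their requests, so in practice BestEffort pods (which requested nothing) go first, Burstable pods running above their requests go next, and Guaranteed pods go last.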

So "I didn't bother setting requests" quietly signed your pod up to be the first one thrown overboard.
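The derivation rule is mechanical enough to sketch in Python. This is my own approximation of it, not Kubernetes source code: it assumes each container is a dict shaped like the `resources` block, and it compares quantities as literal strings, so `'500m'` and `'0.5'` would count as different here.

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, str]]  # {"requests": {...}, "limits": {...}}


def qos_class(containers: List[Resources]) -> str:
    """Approximate the QoS class Kubernetes derives for a pod."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))  # BestEffort

print(qos_class([{
    "requests": {"cpu": "200m", "memory": "300Mi"},
    "limits": {"memory": "512Mi"},
}]))  # Burstable

print(qos_class([{
    "requests": {"cpu": "500m", "memory": "512Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}]))  # Guaranteed
```

You can check the real answer for a running pod with `kubectl get pod my-pod -o jsonpath='{.status.qosClass}'`.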

Before and after

Here's the shape of the change for a typical .NET API.

Before — nothing set, so it's BestEffort and first to be evicted:

# no resources block at all
containers:
  - name: api
    image: myregistry.azurecr.io/api:1.4.0

After — requests near steady-state, memory limit with real headroom above the peak:

containers:
  - name: api
    image: myregistry.azurecr.io/api:1.4.0
    resources:
      requests:
        cpu: '200m'
        memory: '300Mi'
      limits:
        memory: '512Mi' # headroom above measured peak
        # no CPU limit — let it burst, watch throttling instead

Notice there's no CPU limit in the "after". On many workloads, a CPU limit causes more throttling pain than it prevents. The CPU request still guarantees the container its share when the node is busy; leaving the limit off lets the app use spare CPU when it's there.

Pitfalls I actually hit

Memory limit equal to request, thinking it was "safe". That's half of what makes a pod Guaranteed (CPU has to match too), but it leaves zero burst room. A normal traffic spike pushed it past the limit and it got OOMKilled. Lost an afternoon.

CPU limit too low. Looked exactly like slow code and GC pauses. It was CFS throttling. One evening gone.

No limits at all. One leaky pod grew until the node ran out of memory, and the kubelet evicted healthy neighbours to survive.

Requests copied from another app. Pod stuck Pending with "Insufficient memory", which looked like a cluster capacity problem and wasn't.

Forgetting that .NET sizes its heap from the limit. With no limit set, the heap was sized for the whole node, then the container was OOMKilled under load once real traffic arrived.

Reading OOMKilled as a bug. Spent time looking for a memory leak when the real fix was a bigger limit.

How to actually pick the numbers

Stop guessing. Measure.

Watch the app at rest and at peak:

kubectl top pod my-pod          # quick live view

# Prometheus, the real working set:
container_memory_working_set_bytes
# CPU throttling, if any:
rate(container_cpu_cfs_throttled_periods_total[5m])

Then:

Requests ≈ steady-state usage. What the app uses when it's just sitting there serving normal traffic.
Memory limit = measured peak + headroom (enough to survive a spike, not so much it hides a leak).
CPU limit = loose, or none. Watch the throttling metric before you tighten it.
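Those three rules can be turned into a small calculation. The Python sketch below is one way to do it, under my own assumptions: steady state is taken as the median of the samples, headroom is 25% over the highest observed peak, and the sample values are invented. Treat the output as a starting point to review, not a recommendation.

```python
import math
from statistics import median
from typing import Dict, List


def suggest_resources(
    steady_cpu_m: List[int],    # CPU samples under normal traffic, millicores
    steady_mem_mi: List[int],   # memory samples under normal traffic, MiB
    peak_mem_mi: List[int],     # memory samples under peak load, MiB
    headroom: float = 0.25,     # assumed spike allowance; tune per service
) -> Dict[str, Dict[str, str]]:
    """Requests from steady state, memory limit from peak plus headroom."""
    memory_limit = math.ceil(max(peak_mem_mi) * (1 + headroom))
    return {
        "requests": {
            "cpu": f"{round(median(steady_cpu_m))}m",
            "memory": f"{round(median(steady_mem_mi))}Mi",
        },
        # No CPU limit on purpose: watch the throttling metric instead.
        "limits": {"memory": f"{memory_limit}Mi"},
    }


suggestion = suggest_resources(
    steady_cpu_m=[180, 200, 220],
    steady_mem_mi=[280, 300, 310],
    peak_mem_mi=[390, 400, 385],
)
print(suggestion["requests"])  # {'cpu': '200m', 'memory': '300Mi'}
print(suggestion["limits"])    # {'memory': '500Mi'}
```

A day of samples is a reasonable minimum. If the peak keeps creeping up week over week, that points to a leak rather than a need for more headroom.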

Key takeaways

Requests control scheduling. Limits control enforcement. Different jobs entirely.
CPU over the limit gets throttled. Memory over the limit gets killed. No exceptions.
Requests and limits also set your QoS class and your eviction order under pressure.
.NET is container-aware — set a memory limit so the GC knows its real budget.
OOMKilled usually means the limit is too low, not that your code is broken.
Don't guess. Measure at rest and at peak, then set.

What's next

The other thing that makes pods restart is a misconfigured liveness probe — a completely different mechanism from a memory limit, but the symptom (a restarting pod) looks the same. Knowing which one you're dealing with saves a lot of time.

If you set resource numbers by guessing too, I'd start with one service: measure it for a day, set requests and limits from real data, and watch the restart count go quiet. That's the whole win.

I'm working through Kubernetes one confusing corner at a time on LinkedIn — that's the best place to tell me where I got this wrong.