Kubernetes kills your pod? Here's why

Your pods keep getting killed. Not crashing — killed. One moment they're running fine, the next they're gone and Kubernetes is spinning up replacements. You check the logs and there's nothing useful. The pod just… disappeared.

Turns out Kubernetes killed it on purpose. And if you don't tell it how much memory your app actually needs, it'll keep doing it.

Why Kubernetes evicts pods

Kubernetes runs on nodes — physical or virtual machines that host your containers. Each node has a finite amount of CPU and memory. When a node runs low on resources, Kubernetes has to make a choice: which pods stay, and which ones get evicted to free up space.

The decision comes down to QoS classes — Quality of Service tiers that Kubernetes assigns to every pod based on how you've configured resource requests and limits.

There are three classes:

  • BestEffort — no resource requests or limits defined. Kubernetes has no idea how much CPU or memory the pod needs. These get killed first.

  • Burstable — resource values are set, but they don't meet the bar for Guaranteed: requests lower than limits (e.g., requests: 256Mi, limits: 512Mi), or requests with no limits at all. The pod is guaranteed the request amount, but can burst up to the limit. Killed second.

  • Guaranteed — requests and limits are set to the same value for both CPU and memory, in every container of the pod. Kubernetes reserves exactly that amount of resources for it. Killed last.

If your pods don't have resource configuration at all, they're running as BestEffort. And when the node hits memory pressure, BestEffort pods are the first to go — no questions asked.
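You don't have to guess which class your pods landed in — Kubernetes records it in the pod status. Assuming kubectl access to the cluster, a quick way to list every pod alongside its assigned QoS class:

```shell
# List all pods in the current namespace with their assigned QoS class.
kubectl get pods -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass
```

Any pod showing BestEffort in the second column is first in line for eviction.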

The Guaranteed class

Getting your pod into the Guaranteed class takes one small block in your deployment config. Define requests and limits for both CPU and memory, and make them identical:

YAML
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"

That's it. Kubernetes now knows this pod needs exactly 512 MiB of RAM and half a CPU core, and it reserves that capacity when scheduling the pod onto a node. If a node doesn't have 512 MiB of unreserved capacity, the pod won't be placed there. And if the node runs into memory pressure later, this pod gets evicted last — only after all BestEffort and Burstable pods are gone.
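You can see this reservation accounting on the node itself. `kubectl describe node` includes an "Allocated resources" section summing the requests of every pod scheduled there (the node name below is a placeholder):

```shell
# Show how much of the node's allocatable capacity is already
# committed to pod requests.
kubectl describe node <node-name> | grep -A 8 "Allocated resources"
```

When the memory requests line approaches 100%, the scheduler stops placing Guaranteed pods on that node, even if actual usage is lower.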

The side effect — better autoscaling

On managed Kubernetes platforms like EKS, this has a second benefit: the cluster autoscaler pays attention to resource requests when deciding whether to add new nodes.

If your pods are BestEffort (no resource config), the autoscaler sees them as requiring zero resources. Ten pods running on a single node looks fine to it, even if that node is at 90% memory usage. It won't spin up a new node because, from its perspective, there's no unmet resource demand.

But if those same pods are Guaranteed with requests: 512Mi, and the current node doesn't have 512 MiB free, the autoscaler sees a pod that can't be scheduled and adds a new node to accommodate it. Your pods start spreading across multiple nodes instead of piling up on one.

This is particularly noticeable on EKS, where the cluster autoscaler strictly follows the scheduler's resource calculations and ignores actual usage; some other platforms' autoscalers are a bit more lenient. If you don't define requests, autoscaling won't trigger, and you'll end up with all your pods crammed onto a single node until it runs out of memory and starts evicting things.
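The signal the autoscaler reacts to is a pod stuck in Pending that the scheduler can't place anywhere. If you suspect autoscaling isn't kicking in, this is a reasonable way to diagnose it (the pod name is a placeholder):

```shell
# Find pods the scheduler couldn't place on any node.
kubectl get pods --field-selector=status.phase=Pending

# The events section usually says why, e.g. a FailedScheduling
# event with "Insufficient memory".
kubectl describe pod <pod-name> | grep -A 5 Events
```

No Pending pods means no unmet demand from the autoscaler's point of view, and no new nodes.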

The trade-off

The downside of Guaranteed is that you're committing to a specific memory limit. If your app grows and starts using more than what you've configured, the pod gets OOMKilled (out-of-memory killed) instead of being allowed to burst beyond the limit.

With Burstable, you could set requests: 256Mi and limits: 1Gi, giving the app room to spike without getting killed. But you lose the scheduling guarantees — Kubernetes only reserves the 256 MiB request amount, so the pod might end up on a node that doesn't have the full gigabyte available.
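A Burstable configuration like the one just described would look like this (the values are illustrative):

```yaml
resources:
  requests:
    memory: "256Mi"   # reserved at scheduling time
    cpu: "250m"
  limits:
    memory: "1Gi"     # hard ceiling before an OOMKill
    cpu: "1000m"
```

The gap between request and limit is the burst room — and also the amount the node may not actually have when the pod needs it.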

Guaranteed means you need to monitor memory usage and bump the limit when your app legitimately needs more. It's a bit more maintenance, but in exchange you get predictable scheduling, protection from eviction, and autoscaling that actually works.

How to set it

In your Kubernetes deployment manifest, add the resources block under containers:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: your-app
  template:
    metadata:
      labels:
        app: your-app
    spec:
      containers:
      - name: app
        image: your-image:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Apply it:

BASH
kubectl apply -f deployment.yaml

Check the QoS class:

BASH
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'

If it says Guaranteed, you're set.

Picking the right values

Start by looking at what your pods are actually using. Get current memory consumption:

BASH
kubectl top pods

Take the highest value you see, add 20-30% headroom, and use that as your request and limit. If a pod is sitting at 400 MiB, set it to 512 MiB. If it's consistently hitting 800 MiB, go with 1 GiB.

For CPU, half a core (500m) is a reasonable starting point for most apps. Bump it if you see CPU throttling in your metrics.

And then monitor. If you see OOMKills in the pod events, the limit is too low — increase it. If memory usage grows over time as you ship new features, update the config to match.
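The container's last termination state records why it died, so you can check for OOMKills directly rather than digging through events (the pod name is a placeholder):

```shell
# Prints "OOMKilled" if the container was last terminated for
# exceeding its memory limit; empty if it has never been terminated.
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```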

Kubernetes won't kill your pods arbitrarily once they're Guaranteed. But you have to tell it what "guaranteed" actually means.