SANDEEP DINESH: When a pod
is scheduled by Kubernetes, it’s important the containers
have enough resources to actually run. If you schedule a large app on
a node with limited resources, it’s possible the node
runs out of memory or CPU and things just stop working. In this episode of
“Kubernetes Best Practices,” let’s take a look at how you
can solve these problems using resource requests and limits. [MUSIC PLAYING] Requests and limits
are the mechanisms Kubernetes uses to
control resources such as CPU and memory. Requests are what the
container is guaranteed to get. If a container
requests a resource, Kubernetes will only
schedule it on a node that can give it that resource. Limits, on the other hand,
make sure a container never goes above a value. The container is only
allowed to go up to the limit, and then it’s restricted. Let’s see how these work. So there are two
types of resources– CPU and memory. The Kubernetes scheduler
uses these to figure out where to run your pods. A typical resource spec for a pod might look something like this.
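Here's a minimal sketch of such a spec (the pod name, container name, image, and values are all illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod             # hypothetical name
spec:
  containers:
  - name: demo-container     # hypothetical name
    image: nginx             # illustrative image
    resources:
      requests:              # what the scheduler guarantees the container
        cpu: 250m
        memory: 64Mi
      limits:                # the ceiling the container can't go above
        cpu: 500m
        memory: 128Mi
```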
Each container in the pod can set its own requests and limits, and these
are all additive. CPU resources are
defined in millicores. If your container needs
two full cores to run, you’d put the value 2000m. If your container only
needs 1/4 of a core, you would put a value of 250m. One thing to keep in
mind is that if you put a value in that’s
larger than the core count of your biggest node, then your
pod will never be scheduled. Let’s say you have a pod
that needs four cores, but your Kubernetes cluster is
just made up of two-core VMs. In this case, your pod
will never be scheduled. So unless your app
is specifically designed to take advantage
of multiple cores– things like
scientific computing, and some databases
come to mind– it’s usually a best
practice to keep the CPU request at one or below,
and then run more replicas to scale it out. This gives the system
more flexibility and more reliability.
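As a sketch of that pattern (names and values illustrative), a Deployment that scales out instead of up might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app             # hypothetical name
spec:
  replicas: 3                # scale out with more replicas...
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: nginx         # illustrative image
        resources:
          requests:
            cpu: 500m        # ...instead of one pod requesting 1500m
```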
When it comes to CPU limits, things get interesting. So CPU is considered a
compressible resource. If your app starts
hitting your CPU limits, Kubernetes will start to
throttle your container. This means your CPU will
be artificially restricted, giving your app potentially
worse performance. However, it won’t be
terminated or evicted. Memory resources are
defined in bytes. Normally, you give a mebibyte value for memory, but you can give it anything from bytes to petabytes.
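For instance, all of these (illustrative, and roughly the same amount) are valid ways to write a memory request:

```yaml
resources:
  requests:
    memory: 128974848    # plain bytes
    # memory: 129M       # decimal megabytes, about the same amount
    # memory: 123Mi      # mebibytes, the usual convention
```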
Just like CPU, if you put
a memory request that’s larger than the amount
of memory on your nodes, the pod will never be scheduled. Now, unlike CPU resources,
memory is not compressible. Because there’s no way
to throttle memory usage, if a container goes
past its memory limit, it will be terminated.
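If you want to check whether that's what happened to a container, here's a sketch of what kubectl typically reports (pod name and output illustrative):

```
$ kubectl describe pod demo-pod
...
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
```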
It’s important to remember that you cannot set requests that are larger than the resources
provided by your nodes. You can find the total resources
for GKE VMs at this link. In an ideal world,
the container settings would be good enough to
take care of everything. But the world is a dark
and terrible place. [THUNDER AND LIGHTNING] People can easily forget
to set the resources, or a rogue team can set their
requests and limits very high and take up more than their
fair share of the cluster. To prevent these scenarios,
you can set up resource quotas and limit ranges. After creating a namespace,
you can lock it down using quotas. For example, if you have a
production and development namespace, a common pattern is
to put no quota on production and then put very strict quotas
on the development namespace. This allows production to
take all the resources it needs in case of a spike in traffic, while development stays locked down. A quota for resources might
look something like this.
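A sketch of such a quota, using the numbers discussed next (the name and the two limits values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: demo-quota          # hypothetical name
  namespace: development
spec:
  hard:
    requests.cpu: 500m      # max combined CPU requests in the namespace
    requests.memory: 100Mi  # max combined memory requests in the namespace
    limits.cpu: "1"         # illustrative value
    limits.memory: 200Mi    # illustrative value
```

You’d apply it with something like kubectl apply -f quota.yaml --namespace=development.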
In this example, you can see that there are four sections. Let’s go into each one. requests.cpu is the maximum
combined CPU requests that all containers in
the namespace can have. So in this example, you
can have 50 containers with 10m requests, five
containers with 100m requests, or just one container with a 500m request. As long as the total
requested CPU in the namespace is less than 500m,
we’re good to go. requests.memory is the
maximum combined memory requests that all containers
in the namespace can have. So again, in the
above example, you can have 50 containers
with 2 MiB requests, five containers with 20 MiB requests, or just one container with a 100 MiB request, as long as the total requested memory in the namespace is less than 100 mebibytes. limits.cpu is the maximum combined CPU limits that all containers in the namespace can have. It’s just like requests.cpu,
but for the limits. And then finally, limits.memory
is the maximum combined memory limits that all containers
in the namespace can have. Again, just like the
requests.memory, but for limits instead. You can also create a limit
range in your namespace. Unlike a quota, which looks
at the whole namespace, a limit range enforces itself
on individual containers. So this can help prevent
people from creating super tiny or super
large containers inside the namespace. A limit range might look
something like this.
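A sketch with illustrative values, covering the four sections discussed next:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: demo-limit-range    # hypothetical name
  namespace: development
spec:
  limits:
  - type: Container
    default:                # default limits, for containers that set none
      cpu: 100m
      memory: 100Mi
    defaultRequest:         # default requests, for containers that set none
      cpu: 50m
      memory: 50Mi
    max:                    # no container may set limits above this
      cpu: "2"
      memory: 1Gi
    min:                    # no container may set requests below this
      cpu: 10m
      memory: 10Mi
```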
Looking at the example, you can see there are again four sections. Let’s go into each one. The default section will
set up the default limits for a container in the pod. If you set these values in the
limit range, any containers that don’t explicitly
set these values themselves will get
assigned the default values. The default request section
will set up the default requests for a container in a pod. Again, if you set these
values in the limit range, any containers that don’t
explicitly set these themselves will get assigned
these default values. The max section will set
up the maximum limits that a container
in a pod can set. The default section cannot be
set higher than this value, and the limits in a container
cannot be higher either. It’s important to note
that if this value is set and the default
section is not set, the max value becomes the
default value as well. The min section will set up the
minimum requests that a container in a pod can set. The default requests section
cannot be lower than this, and the requests on a container cannot be lower either. Again, it’s important to note
that if this value is set and the default requests
section is not set, the minimum value becomes
the default request as well. So at the end of the day,
these resource requests are used by the
Kubernetes scheduler to run your workloads. And it’s kind of
important to understand how this works so you can tune
your containers correctly. So let’s say you want to run
some pods on your cluster. Assuming the pod
specifications are valid, the Kubernetes scheduler will use round-robin load balancing to pick a node
to run your workload. So Kubernetes will check if
the node has enough resources to fulfill the requests of the pod’s containers. If it doesn’t, then it’ll
move on to the next node. If none of the
nodes in the system have resources left
to fill the requests, then pods go into
a pending state.
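A quick way to spot this (pod name and output are illustrative) is to list the pods and describe the stuck one:

```
$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
demo-pod   0/1     Pending   0          2m

$ kubectl describe pod demo-pod
...
Events:
  Warning  FailedScheduling  ...  0/3 nodes are available: 3 Insufficient cpu.
```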
By using Google Kubernetes Engine’s features, such as the node autoscaler, GKE can automatically detect this state and create more nodes. And then if there’s excess node capacity, the autoscaler can
scale it down and remove nodes to save you money. So Kubernetes schedules these
pods based on the requests. But a limit can be higher
than the requests, right? So this means that
in some scenarios a node can actually
run out of resources. And we call this an
overcommitted state.
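For example, a container like this (values illustrative) is scheduled based on its requests but is allowed to burst to four times that, so a node packed with such pods can overcommit:

```yaml
resources:
  requests:
    cpu: 250m        # the scheduler only reserves this much
    memory: 64Mi
  limits:
    cpu: "1"         # but the container may burst to 4x its CPU request
    memory: 256Mi    # and to 4x its memory request
```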
So when it comes to CPU, like we said before, Kubernetes will start
to throttle the pods. Each pod will get as
much as it requested, but it might not be
able to go up to the limit. It’ll be throttled down instead. But when it comes to
memory, Kubernetes has to make some decisions
on which pods to kill and which pods to keep in order to free up system resources. Otherwise, the whole
system will crash. So let’s imagine a scenario
where you have a machine that’s
running out of memory– what will Kubernetes do? So Kubernetes will
look for pods that are using more resources
than they requested. So if your containers have
no requests at all, then by default they’re using
more than they requested, because they requested nothing. So these are prime
candidates for termination. Other prime candidates
are containers that have gone
over their requests but are still under the limit. So if Kubernetes finds
multiple pods that have gone over their
request, then Kubernetes will rank these
pods by priority, and then terminate the
lowest priority pods first. If all the pods have
the same priority, then Kubernetes
terminates the pod that has gone the most over its request. In very rare
scenarios, Kubernetes might be forced to terminate
pods that are still within their requests. This can happen when
critical system components, like the kubelet or Docker,
start taking more resources than were reserved for them. So while your
Kubernetes cluster might work fine without setting
resource requests and limits, you’re going to start running
into more and more issues as your teams and projects
start to grow larger. Adding requests and limits
to your pods and namespaces only takes a little
extra effort, and it can save you from
running into many headaches down the line. I’ll see you on the next episode
of “Kubernetes Best Practices.” [MUSIC PLAYING]