Constrain Process CPU Usage with cgroups (Demo)

While setting up self-hosted CI runners, I needed to separate the runners (processes) into classes of varying resource intensity. Without containers I had to find another method, and cgroups came to the rescue (by the way, cgroups are what Docker uses under the hood when you pass the --cpus flag).

Control groups (cgroups) are a core Linux kernel mechanism for controlling how processes use the machine's resources. You can use them to limit your apps' claims on CPU, memory, I/O, and other system resources.

In this post I'm writing about cgroups v1. There's a newer version in the kernel, v2 (non-experimental since Linux 4.5, released in March 2016), but it's still not as ubiquitous as v1 (e.g. on Ubuntu 20.04, enabling it requires some non-obvious steps).
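Since everything below relies on v1, it's worth checking which layout your machine mounts before following along. A minimal sketch, assuming GNU coreutils stat: the filesystem type of /sys/fs/cgroup is cgroup2fs on a pure v2 (unified) setup, while v1 and hybrid setups, such as the Ubuntu 20.04 default, mount a tmpfs there. The cgroup_mode function name is hypothetical, just for illustration.

```shell
#!/bin/sh
# cgroup_mode [DIR]: report which cgroup layout is mounted at DIR
# (default /sys/fs/cgroup), based on the filesystem type.
# Assumes GNU coreutils stat (-f: query the filesystem, %T: its type name).
cgroup_mode() {
  case "$(stat -f -c %T "${1:-/sys/fs/cgroup}" 2>/dev/null)" in
    cgroup2fs) echo "v2" ;;           # pure v2 (unified hierarchy)
    tmpfs)     echo "v1 or hybrid" ;; # v1 controllers mounted under a tmpfs
    *)         echo "unknown" ;;      # missing, or an unexpected layout
  esac
}

cgroup_mode
```

If it prints v1 or hybrid, the cpu controller directories used in this post should be available.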

Setting CPU limits

cgroups are configured through plain files and a directory structure in the cgroup filesystem, mounted under /sys/fs/cgroup on most modern distributions (older setups used /cgroup). You can manage them this way, but there's a simpler method: cgroup-tools.

apt-get install cgroup-tools

Create 2 groups: fast & slow.

cgcreate -g cpu:/fast
cgcreate -g cpu:/slow
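cgcreate, and cgset used below, are conveniences on top of the cgroup filesystem. As a rough sketch, assuming the v1 cpu controller is mounted at /sys/fs/cgroup/cpu (as on Ubuntu 20.04), creating a group is a mkdir inside that mount, and setting a parameter is a write to one of the files the kernel creates in the new directory. The function names and the CGROUP_CPU override are hypothetical, added so the sketch can be tried against a scratch directory; against the real mount it needs root.

```shell
#!/bin/sh
# Root of the cgroup v1 cpu controller (assumed mount point; override to experiment).
CGROUP_CPU="${CGROUP_CPU:-/sys/fs/cgroup/cpu}"

# make_group NAME: roughly what `cgcreate -g cpu:/NAME` does.
# In the real cgroupfs, mkdir creates the group and the kernel populates
# its control files (cpu.shares, cgroup.procs, ...) automatically.
make_group() {
  mkdir -p "$CGROUP_CPU/$1"
}

# set_shares NAME VALUE: roughly what `cgset -r cpu.shares=VALUE NAME` does.
set_shares() {
  echo "$2" > "$CGROUP_CPU/$1/cpu.shares"
}
```

For example, as root: make_group fast && set_shares fast 768.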

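In cgroups v1, cpu.shares is a relative weight rather than an absolute amount (each group defaults to 1024), and the values don't need to sum to anything in particular. When groups compete for the CPU, each gets its shares divided by the total shares of all busy groups. Let's give the two groups 768 and 256 shares, so that under full load they get 75% and 25% of the CPU respectively (768/1024 and 256/1024).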

cgset -r cpu.shares=768 fast
cgset -r cpu.shares=256 slow
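To make the arithmetic concrete, here is a small helper (a hypothetical name, not part of cgroup-tools) that computes the approximate share of CPU time a group gets when every listed group is busy. It also shows that only the ratios matter, and that a third busy group changes everyone's slice.

```shell
#!/bin/sh
# cpu_pct SHARE ALL_SHARES...: integer percentage of contended CPU time
# for a group with weight SHARE, when every group in ALL_SHARES (which must
# include SHARE itself) is busy. Ignores processes in any other cgroups.
cpu_pct() {
  share=$1
  shift
  total=0
  for s in "$@"; do
    total=$((total + s))
  done
  echo $((share * 100 / total))
}

cpu_pct 768 768 256        # fast: prints 75
cpu_pct 256 768 256        # slow: prints 25
cpu_pct 3 3 1              # same 3:1 ratio: prints 75
cpu_pct 768 768 256 1024   # plus a busy group at the default 1024: prints 37
```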

Note that cpu shares are not an upper bound: they only take effect when the CPU is under high enough load. For example, the processes in the 25% group will still consume 100% of the CPU if it's available. Once other demanding tasks show up, they are scheduled down to leave the other 75% to them.
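If you do need a hard ceiling that holds even on an idle machine, the v1 cpu controller has a separate mechanism: CFS bandwidth control, via cpu.cfs_period_us and cpu.cfs_quota_us (the group may use at most quota microseconds of CPU time per period, summed across all CPUs). This is also what Docker's --cpus flag sets. A minimal sketch, assuming the kernel's default 100 ms period, with a hypothetical helper that prints the cgset commands for capping a group at a percentage of N CPUs:

```shell
#!/bin/sh
# quota_us PCT NCPUS [PERIOD_US]: cpu.cfs_quota_us value that caps a group at
# PCT percent of NCPUS CPUs. The quota is CPU time per period across all
# CPUs, so it can exceed the period on multi-core machines.
quota_us() {
  period=${3:-100000}   # 100000 us (100 ms) is the kernel's default period
  echo $(($1 * $2 * period / 100))
}

# Print (rather than run) the commands that hard-cap "slow" at 25% of 32 CPUs;
# run them as root to apply. Writing -1 to cpu.cfs_quota_us removes the cap.
echo "cgset -r cpu.cfs_period_us=100000 slow"
echo "cgset -r cpu.cfs_quota_us=$(quota_us 25 32) slow"
```

Unlike shares, this limit applies even when the rest of the machine is idle, so a capped group can leave CPUs unused.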

Alright, now let’s test it out!

We need to put a high load on the CPU, in a controlled fashion. There's a great utility that does precisely this: stress.

apt-get install stress

Since my testing machine has 32 CPUs, I need to stress all of them to see the results clearly. Run each of the following commands in its own terminal, so that both groups compete at the same time:

cgexec -g cpu:fast stress --cpu 32
cgexec -g cpu:slow stress --cpu 32
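cgexec, in turn, starts a command inside the given group. Roughly, and assuming the v1 cpu controller is mounted at /sys/fs/cgroup/cpu (root required on the real mount), that amounts to writing the new process's PID into the group's cgroup.procs file and then exec-ing the command, so that it and any children it forks stay in the group. The run_in_group function and the CGROUP_CPU override are hypothetical names for this sketch.

```shell
#!/bin/sh
# Root of the cgroup v1 cpu controller (assumed mount point; override to experiment).
CGROUP_CPU="${CGROUP_CPU:-/sys/fs/cgroup/cpu}"

# run_in_group NAME COMMAND [ARGS...]: roughly what `cgexec -g cpu:NAME COMMAND` does.
# A child shell writes its own PID ($$ inside `sh -c`) into the group's
# cgroup.procs, then exec-replaces itself with COMMAND, which keeps that PID.
# Processes forked by COMMAND inherit the group.
run_in_group() {
  group=$1
  shift
  sh -c 'echo $$ > "$1" && shift && exec "$@"' sh "$CGROUP_CPU/$group/cgroup.procs" "$@"
}

# Example (as root): run_in_group slow stress --cpu 32
```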

Now, a quick glance at htop shows that the processes in the two groups get approximately 75% and 25% of the CPU, respectively. Check it out in the demo below.

Demo

A demo of htop CPU usage with processes spawned in two cgroups: fast, slow. Look at the CPU%.

What happened there? Let me explain step by step:

1. An idle server with 32 CPUs.
2. Run the process in the slow (25%) group (cgexec -g cpu:slow stress --cpu 32).
3. It takes 100% of the CPU, since nothing else is competing for it.
4. Run the process in the fast (75%) group (cgexec -g cpu:fast stress --cpu 32).
5. The usage now converges to a 75/25 split (not a perfect one, though).

Voilà! We now have two separate process groups that split the CPU 75/25 whenever both are busy.