Setting up self-hosted CI runners, I needed to separate runners (processes) into varying resource-intensity classes. Without containers I had to find another method; cgroups came to the rescue (by the way: this is what Docker uses under the hood when you add the --cpus flag).
Control groups (cgroups) are a core Linux mechanism for controlling how much of the machine's resources processes may use. You can use them to limit your apps' claims on CPU, memory, I/O, and other system resources.
In this post I'm writing about cgroups v1. There's a newer spec in the kernel, v2 (non-experimental since March 2016), but it's still not as ubiquitous as v1 (e.g. on Ubuntu 20.04, it requires some non-obvious steps to enable).
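Since everything below assumes v1, it may be worth checking which hierarchy a machine actually mounts. A minimal sketch, assuming the hierarchy lives at /sys/fs/cgroup (the usual location on systemd-based distributions); cgroup_version is a hypothetical helper name. stat -fc %T prints the filesystem type, which is cgroup2fs on a pure v2 setup and tmpfs when v1 controllers are mounted underneath:

```shell
# Map the filesystem type of the cgroup mount point to a cgroup version.
# cgroup_version is a hypothetical helper, not part of cgroup-tools.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;        # unified hierarchy only
        tmpfs)     echo "v1" ;;        # v1 (or hybrid) layout
        *)         echo "unknown" ;;   # anything else: inspect manually
    esac
}

# Usage (assumes the hierarchy is mounted at /sys/fs/cgroup):
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
```

If this reports v2, the cgcreate and cgset commands in this post will most likely not behave as described.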
Setting CPU limits
cgroups are configured through plain files and a directory structure inside the cgroup filesystem (typically mounted at /sys/fs/cgroup). You can manage them this way, but there's a simpler method: cgroup-tools.
apt-get install cgroup-tools
Create 2 groups: fast & slow.
cgcreate -g cpu:/fast
cgcreate -g cpu:/slow
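In cgroups, cpu shares are relative weights: only the ratio between competing groups matters. The default for a group is 1024, so it's convenient to pick values that sum to 1024. Let's configure the two groups to get 75% and 25% of CPU time respectively.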
cgset -r cpu.shares=768 fast
cgset -r cpu.shares=256 slow
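For comparison, the raw-file route mentioned earlier amounts to creating a directory and writing to a file inside it. A rough sketch, assuming the v1 cpu controller is mounted at /sys/fs/cgroup/cpu (this path varies between distributions) and that you run it as root; set_cpu_shares is a hypothetical helper, not part of cgroup-tools:

```shell
# Create a cpu cgroup by hand and set its relative weight.
# $1 = mount point of the cpu controller (assumed, e.g. /sys/fs/cgroup/cpu)
# $2 = group name
# $3 = cpu.shares value
set_cpu_shares() {
    # On a real cgroup mount the kernel populates the new directory
    # with control files such as cpu.shares and tasks.
    mkdir -p "$1/$2" || return 1
    # Roughly what cgset -r cpu.shares=<value> <group> does.
    echo "$3" > "$1/$2/cpu.shares"
}

# Usage (as root):
#   set_cpu_shares /sys/fs/cgroup/cpu fast 768
#   set_cpu_shares /sys/fs/cgroup/cpu slow 256
```

The mount point is passed as a parameter rather than hard-coded, so the helper can be pointed at whatever location your distribution uses.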
cpu shares are not a hard cap: they only take effect when the CPU is contended. For example, processes in the 25% group will still consume 100% of the CPU if nothing else wants it. Once other demanding tasks show up, the group is scheduled down to leave the remaining 75% to them.
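To make the arithmetic concrete: when every group is busy, each one's portion of CPU time is its cpu.shares divided by the sum over all competing groups (ignoring anything running outside these groups). A small sketch of that calculation; expected_percent is a hypothetical helper, and it uses integer arithmetic, so results are rounded down:

```shell
# Print the percentage of CPU time a group can expect under full contention.
# $1 = this group's cpu.shares
# remaining arguments = cpu.shares of the groups it competes with
expected_percent() {
    mine=$1
    total=0
    for s in "$@"; do          # "$@" includes this group's own shares
        total=$((total + s))
    done
    echo $((mine * 100 / total))
}

expected_percent 768 256    # fast vs. slow, prints 75
expected_percent 256 768    # slow vs. fast, prints 25
```

Doubling both values (1536 and 512) would give the same split, since only the ratio matters.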
Alright, now let’s test it out!
We need to put the CPU under heavy load in a controlled fashion. There's a great utility doing precisely this: stress.
apt-get install stress
Since my test machine has 32 CPUs, I need to load all of them to see the results clearly. Run the two commands in separate terminals, because stress keeps running until you stop it.
cgexec -g cpu:fast stress --cpu 32
cgexec -g cpu:slow stress --cpu 32
Now, a quick glance at htop shows that our processes are at approximately 75% and 25% load, respectively. Check it out in the demo below.
Demo
What happened there? Let me explain step by step:
- Idle server with 32 CPUs.
- Run the slow (25%) cpu group process (cgexec -g cpu:slow stress --cpu 32).
- It takes 100% of the CPU since there is nothing else competing for resources.
- Run the fast (75%) cpu group process (cgexec -g cpu:fast stress --cpu 32).
- Now we see the usage converging to a 75/25 split (it's not exact, though).
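If you'd rather have a number than eyeball htop, the split can be estimated from the kernel's accounting. This is only a sketch, and it assumes the cpuacct controller is mounted together with cpu (common on systemd-based distributions), so that each group directory exposes cpuacct.usage, a counter of consumed CPU time in nanoseconds. split_percent and measure_split are hypothetical helper names:

```shell
# Print group A's portion (in percent) of the CPU time consumed by A and B,
# given how much each group's cpuacct.usage counter (nanoseconds) grew.
split_percent() {
    total=$(($1 + $2))
    if [ "$total" -eq 0 ]; then
        echo 0                      # neither group ran during the interval
    else
        echo $(($1 * 100 / total))
    fi
}

# Sample both counters over an interval and report the split.
# $1, $2 = cgroup directories (assumed to expose cpuacct.usage)
# $3     = sampling interval in seconds
measure_split() {
    a0=$(cat "$1/cpuacct.usage"); b0=$(cat "$2/cpuacct.usage")
    sleep "$3"
    a1=$(cat "$1/cpuacct.usage"); b1=$(cat "$2/cpuacct.usage")
    split_percent $((a1 - a0)) $((b1 - b0))
}

# Usage, while both stress commands are running (paths are an assumption):
#   measure_split /sys/fs/cgroup/cpu/fast /sys/fs/cgroup/cpu/slow 5
```

With both groups fully loaded, the printed value for the fast group should land somewhere near 75.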
Voilà! We now have 2 separate process groups that split the CPU 75/25 whenever they compete for it.
Further reading: