Protect critical WSL processes under heavy memory load with cgroup#40519
Protect critical WSL processes under heavy memory load with cgroup#40519chemwolf6922 wants to merge 10 commits into
Conversation
Co-authored-by: Copilot <copilot@github.com>
…es-with-memory-cgroup
…es-with-memory-cgroup
…es-with-memory-cgroup
|
If we want this change. Please help determine the reserved memory size. 128M might be too much. |
There was a problem hiding this comment.
Pull request overview
This PR aims to improve WSL2 reliability under heavy memory pressure by placing user/workload processes into a cgroup v2 with a memory cap, while keeping critical WSL system processes outside that cap to reduce the chance of catastrophic failures after OOM events.
Changes:
- Add cgroup path constants for a
wsl-usercgroup and itscgroup.procs/memory.maxcontrol files. - Create and configure the
wsl-usercgroup at mini_init startup, settingmemory.maxtototalram - 128MB. - Move key workload processes (session leaders, boot command, systemd-spawned workload) into
wsl-userby writing0tocgroup.procs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/linux/init/util.h | Adds constants for the wsl-user cgroup paths. |
| src/linux/init/util.cpp | Moves create-process children into wsl-user cgroup before exec. |
| src/linux/init/main.cpp | Adds cgroup setup routine and invokes it after mounting cgroup2. |
| src/linux/init/init.cpp | Moves session leaders and systemd into wsl-user cgroup. |
| src/linux/init/config.cpp | Moves boot command into wsl-user cgroup. |
Comments suppressed due to low confidence (2)
src/linux/init/main.cpp:3944
- Enabling the memory controller in cgroup v2 via /sys/fs/cgroup/cgroup.subtree_control generally requires the parent cgroup to have no internal processes ("no internal process" rule). At this point mini_init is in the root cgroup, and wsl-user is created before enabling the controller, so the write is likely to fail (e.g., EBUSY) and the feature becomes a no-op. Consider creating a dedicated top-level cgroup hierarchy (e.g., move system processes into a sibling cgroup and leave the root empty), and enable the controller before creating/using children so the memory controller is actually available.
if (UtilMkdir(WSL_USER_CGROUP_PATH, 0755) < 0)
{
LOG_ERROR("Failed to create wsl-user cgroup directory {}", errno);
return;
}
if (WriteToFile(CGROUP_MOUNTPOINT "/cgroup.subtree_control", "+memory") < 0)
{
LOG_ERROR("Failed to enable memory controller {}", errno);
return;
}
src/linux/init/init.cpp:1265
- This cgroup move is relied on to ensure session leaders are subject to the memory cap, but the return value from WriteToFile is ignored. If it fails, the session leader will stay in the root cgroup and can still starve system processes. Consider checking the return and logging a warning/error (or propagating failure) so the protection isn't silently skipped.
"SessionLeader", [ListenSocket = std::move(ListenSocket), &Channel, &Config, Mask = Config.Umask, SocketAddress]() {
// Move session leader into the memory-limited user cgroup.
WriteToFile(WSL_USER_CGROUP_PROCS, "0");
…es-with-memory-cgroup
| #define syscall_arch (offsetof(struct seccomp_data, arch)) | ||
|
|
||
| constexpr auto c_trueString = "1"; | ||
| constexpr size_t c_systemReservedMemory = 128 * 1024 * 1024; // 128MB reserved for WSL system processes |
There was a problem hiding this comment.
I’m not super familiar with this setting, I assume this is just a minimum? How did you pick this value? Is there some other functionality to mark processes as critical another way to protect against cpu starvation as well?
There was a problem hiding this comment.
Yes this acts as a reserved value for the critical wsl processes. Those processes memory usage can go over this value.
The value is arbitrarily picked. I need inputs from the team to pick a more based value. 128MiB is likely too large.
From my personal experience (limiting a lxc's cpu usage), using cgroup cpu limit is not a good idea. It will greatly hammer the actual cpu performance. I'll do some test in wsl to see if that's also true here.
There was a problem hiding this comment.
For the magic number. I calculated the sum of PSS of all processes in the root cgroup namespace. And it's around 9068KB with one distro (with wslg) and the debug shell open.
PSS_KB PID NAME CGROUP
499 1 mini_init 0::/
174 281 DebugShell 0::/
1269 284 login 0::/
638 289 chronyd 0::/
1066 296 GnsEngine 0::/
723 298 init 0::/
808 300 init() 0::/
973 306 init-systemd(Ub 0::/
430 307 init-watcher 0::/
759 351 init 0::/
1729 3836 bash 0::/
root@feng-workstation [ ~ ]#
And it increases to 9140KB with two distro (with wslg) and the debug shell open. Not much increase since most of the pages are shared.
PSS_KB PID NAME CGROUP
349 1 mini_init 0::/
140 281 DebugShell 0::/
1193 284 login 0::/
585 289 chronyd 0::/
795 296 GnsEngine 0::/
547 298 init 0::/
530 300 init() 0::/
609 306 init-systemd(Ub 0::/
301 307 init-watcher 0::/
535 351 init 0::/
1686 3836 bash 0::/
530 6336 init() 0::/
619 6341 init-systemd(De 0::/
305 6342 init-watcher 0::/
416 6370 init 0::/
So 128MiB is likely too much. 16MiB is enough and maybe 24 or 32MiB for some head room.
One thing to note is that since we are not providing isolated cgroup views to the distros, any process can move them out of the wsl-user namespace. One example is the Windows defender for endpoint. It moves itself to the /mde namespace. And it takes ~200MB of RAM.
There was a problem hiding this comment.
For the CPU performance impact. I did the following test on a nproc=32 machine:
cpu.max = max 100000 (default)
Geekbench: single: 2917, multi: 17046
Compiling the kernel -j33: 7m28.953s
cpu.max = 3190000 100000 (reserve 0.1 cpu for critical processes)
Geekbench: single: 2929, multi: 17050
Compiling the kernel -j33: 7m25.078s
My system is not clean so the difference is within margin of error. I'm not seeing the performance degradation I experienced before.
I'll update the cgroup setting to add the cpu reservation.
…es-with-memory-cgroup
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
src/linux/init/init.cpp:1272
- This cgroup move is treated as non-critical but logs failures with LOG_ERROR. To avoid noisy/false errors, consider downgrading to LOG_WARNING/LOG_INFO and stating the impact (session leader not placed in wsl-user cgroup).
// Move session leader into the memory-limited user cgroup.
if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0)
{
// Non-critical.
LOG_ERROR("Failed to move session leader into user cgroup, {}", errno);
}
src/linux/init/init.cpp:2413
- This cgroup move is non-critical but uses LOG_ERROR on failure. Consider logging at warning/info level instead and clarifying that memory protection for systemd (and its subtree) is disabled if the move fails.
// Move systemd into the memory-limited user cgroup.
if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0)
{
// Non-critical.
LOG_ERROR("Failed to move systemd to user cgroup {}", errno);
}
| if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0) | ||
| { | ||
| // Non-critical | ||
| LOG_ERROR("Failed to add process to user cgroup: {}", errno); |
| // Non-critical. | ||
| LOG_ERROR("Failed to move session leader into user cgroup, {}", errno); |
| if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0) | ||
| { | ||
| // Non-critical. | ||
| LOG_ERROR("Failed to move boot command into user cgroup: {}", errno); |
| } | ||
|
|
||
| UtilMount(nullptr, CGROUP_MOUNTPOINT, CGROUP2_DEVICE, 0, nullptr); | ||
| if (UtilMount(nullptr, CGROUP_MOUNTPOINT, CGROUP2_DEVICE, 0, nullptr) != 0) |
Summary of the Pull Request
When under heavy memory load, the critical WSL processes can fail due to various OOM failures. And may end up in an error state after the memory storm ended. Causing "Catastrophic Errors" afterwards.
This PR puts all non-critical processes in the VM into a cgroup with a memory cap. And leaves the critical WSL processes in the root cgorup. Reserving a small amount of memory for these critical processes. So they are not starved to death under heavy memory load caused by user processes.
PR Checklist
Detailed Description of the Pull Request / Additional comments
Validation Steps Performed
Validated by the OP of this issue: #40458 (comment)
The original issue is not deterministic and is hard to repro. Thus, no new test is added.