[DTrace-devel] [PATCH v2] libproc: make Psystem_daemon() detect modern systemd properly

Nick Alcock nick.alcock at oracle.com
Tue Jul 15 19:09:55 UTC 2025


Psystem_daemon() is used when carrying out short-lived grabs to detect
whether a process is too risky to grab invasively (you wouldn't usually
want to stop syslogd or, God forbid, try to ptrace PID 1 unless explicitly
requested via -p; the process merely turning up in routine probe firing is
not enough).

This has two code paths: a reliable one for systemd systems (which checks
whether the process is in the system slice, which contains precisely and
only system daemons), and an unreliable one for other systems (which takes
the old Unix approach of considering anything in the user UID range, with a
TTY, or with open standard FDs to TTYs not to be a system daemon, and
everything else to possibly be one).
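A minimal sketch of the first half of that fallback heuristic, assuming
(the patch does not show it) that a controlling terminal is detected via
the tty_nr field of /proc/<pid>/stat.  stat_tty_nr() and
maybe_system_daemon() are hypothetical names, and the second half (ruling
processes out by where their standard FDs point) is omitted:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: extract tty_nr (field 7) from the text of a
 * /proc/<pid>/stat line.  Returns 0 on success, -1 if malformed.
 */
static int stat_tty_nr(const char *statline, int *tty_nr)
{
	/* comm (field 2) may contain spaces and ')', so use the LAST ')'. */
	const char *rparen = strrchr(statline, ')');

	if (rparen == NULL)
		return -1;

	/* Skip state, ppid, pgrp and session, then read tty_nr. */
	if (sscanf(rparen + 1, " %*c %*d %*d %*d %d", tty_nr) != 1)
		return -1;
	return 0;
}

/*
 * A process is only a candidate system daemon if its uid is at or below
 * useruid and it has no controlling terminal (tty_nr == 0).
 */
static int maybe_system_daemon(uid_t uid, uid_t useruid, int tty_nr)
{
	return uid <= useruid && tty_nr == 0;
}
```

Parsing resumes after the last ')' because the comm field is free-form
and can itself contain spaces and parentheses.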

We were checking whether a system was running systemd by looking for the
systemd cgroup hierarchy name in any of the victim process's cgroups.  This
was reliable back in the days of cgroups v1, but alas, in v2, where systemd
manages all the cgroups if it manages any and there are no longer multiple
hierarchies, systemd no longer names its cgroups this way.  The test fails,
and we fall back to the unreliable pre-systemd approach.
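To illustrate the failure, here is a sketch (hypothetical helper names,
illustrative cgroup paths) of the old v1 test next to a v2 check like the
one this patch adds.  A v1 systemd line looks like
"1:name=systemd:/system.slice/sshd.service", while the single v2 line
looks like "0::/system.slice/sshd.service", so the old strstr() never
matches on v2:

```c
#include <string.h>

/* Old test: only recognises the cgroups v1 systemd hierarchy line. */
static int old_is_systemd_line(const char *line)
{
	return strstr(line, ":name=systemd:") != NULL;
}

/*
 * cgroups v2 (unified hierarchy): one line, hierarchy ID 0, empty
 * controller list, and a path that names a slice.
 */
static int v2_slice_line(const char *line)
{
	return strncmp(line, "0::", 3) == 0 &&
	       strstr(line, ".slice/") != NULL;
}
```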

Use a more reliable approach to detect systemd, the same one used by
sd_booted() in libsystemd: check for the existence of the
/run/systemd/system directory.  Fix slice detection to work in the absence
of a systemd hierarchy name (but keep it working when a hierarchy name
*is* present, for older systems); everything else works unchanged.
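The detection itself is just a stat() of a directory.  A sketch, with
is_directory() and systemd_booted_sketch() as hypothetical names; it only
mirrors the check this patch adds and is not libsystemd's code:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if path exists and is a directory, 0 otherwise. */
static int is_directory(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return 0;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* 1 on a machine booted with systemd, 0 otherwise. */
static int systemd_booted_sketch(void)
{
	return is_directory("/run/systemd/system");
}
```

Because the answer cannot change while the system is up, the patch caches
it in a static tri-state (-1 meaning "not yet checked").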

We also arrange to fall back to the old code for any processes that are
entirely outside of systemd management: this covers kernel threads, the
occasional process that is part of systemd itself, and processes that use
Delegate= to hand their subtree's cgroup management over to something
else.
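The resulting decision is three-way.  A sketch, assuming sysslice is a
path prefix such as "/system.slice/" (the value callers actually pass is
not shown here) and that the caller has already picked out the v1 or v2
slice line, or NULL if none matched; slice_verdict() is a hypothetical
name:

```c
#include <string.h>

/*
 * Returns 1 (system daemon), 0 (not a system daemon), or -1 (no slice
 * information: fall back to the UID/TTY heuristic).
 * Assumption: sysslice is a path prefix like "/system.slice/".
 */
static int slice_verdict(const char *line, const char *sysslice)
{
	const char *colon;

	if (line == NULL)
		return -1;	/* kernel thread, systemd itself, Delegate=... */

	/* Lines are "ID:controllers:path"; skip to the second colon. */
	colon = strchr(line, ':');
	if (colon != NULL)
		colon = strchr(colon + 1, ':');
	if (colon == NULL)
		return 0;	/* malformed: treated as "not a daemon" */

	return strncmp(colon + 1, sysslice, strlen(sysslice)) == 0;
}
```

The same colon-skipping works for both layouts: v1 lines carry a
controller name between the colons, while in v2 that field is empty.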

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
---
 libproc/Pcontrol.c | 101 +++++++++++++++++++++++++++++++--------------
 1 file changed, 70 insertions(+), 31 deletions(-)

OK, this doesn't regress with stdin coming from /dev/null on any systemd
platform I've tried it on, old (cgroups v1) or new.  (On non-systemd
systems we will, of course, mistake most of the tests for system daemons
and fail.  Don't run the testsuite noninteractively on such systems.)

diff --git a/libproc/Pcontrol.c b/libproc/Pcontrol.c
index 7d9b5055f8201..b5c4e27ef9d29 100644
--- a/libproc/Pcontrol.c
+++ b/libproc/Pcontrol.c
@@ -2927,10 +2927,26 @@ Psystem_daemon(pid_t pid, uid_t useruid, const char *sysslice)
 	int fd;
 
 	/*
-	 * If this is a system running systemd, or we don't know yet, dig out
-	 * the systemd cgroup line from /proc/$pid/cgroup.
+	 * If we don't know whether this system is running systemd, find out.
 	 */
-	if (systemd_system != 0) {
+	if (systemd_system < 0) {
+		struct stat st;
+
+		if (stat("/run/systemd/system", &st) < 0 ||
+		    !S_ISDIR(st.st_mode))
+			systemd_system = 0;
+		else
+			systemd_system = 1;
+		_dprintf("systemd system: %s.\n", systemd_system ? "yes" : "no");
+	}
+
+	/*
+	 * If this is a system running systemd, dig out the systemd cgroup line
+	 * from /proc/$pid/cgroup.
+	 */
+	if (systemd_system) {
+		int found = 0;
+
 		snprintf(procname, sizeof(procname), "%s/%d/cgroup",
 		    procfs_path, pid);
 
@@ -2941,47 +2957,70 @@ Psystem_daemon(pid_t pid, uid_t useruid, const char *sysslice)
 		}
 
 		while (getline(&buf, &n, fp) >= 0) {
+			/*
+			 * cgroups v2: only one line, 0::-prepended, slice
+			 * name always on that line.
+			 */
+
+			if (strncmp(buf, "0::", strlen("0::")) == 0 &&
+			    strstr(buf, ".slice/") != NULL) {
+				found = 1;
+				break;
+			}
+
+			/*
+			 * cgroups v1: find the line with the name=systemd
+			 * controller notation.
+			 */
 			if (strstr(buf, ":name=systemd:") != NULL) {
-				systemd_system = 1;
+				found = 1;
 				break;
 			}
 		}
 		fclose(fp);
-		if (systemd_system < 0)
-			systemd_system = 0;
-	}
 
-	/*
-	 * We have the systemd cgroup line in buf.  Look at our slice name.
-	 */
-	if (systemd_system) {
-		char *colon = strchr(buf, ':');
-		if (colon)
-			colon = strchr(colon + 1, ':');
+		/*
+		 * We have our slice's cgroup line in buf.  Extract the slice
+		 * name, skipping over the hierarchy number and controller
+		 * fields.
+		 */
+		if (found) {
+			char *colon = strchr(buf, ':');
+			if (colon)
+				colon = strchr(colon + 1, ':');
 
-		_dprintf("systemd system: sysslice: %s; colon: %s\n",
-		    sysslice, colon ? colon : "(not found)");
-		if (colon &&
-		    (strncmp(colon, sysslice, strlen(sysslice)) == 0)) {
+			_dprintf("systemd system: sysslice: %s; colon: %s\n",
+				 sysslice, colon ? colon : "(not found)");
+			if (colon &&
+			    (strncmp(colon, sysslice, strlen(sysslice)) == 0)) {
+				free(buf);
+				_dprintf("%i is a system daemon process.\n", pid);
+				return 1;
+			}
 			free(buf);
-			_dprintf("%i is a system daemon process.\n", pid);
-			return 1;
+			return 0;
 		}
-		free(buf);
-		return 0;
+		/*
+		 * No idea: this is probably a kernel thread or something
+		 * else entirely outside of systemd management or delegated
+		 * via Delegate=: at any rate, a system daemon.  We can fall
+		 * back to the old mechanism in this situation.
+		 */
+		_dprintf("%i: probably non-systemd: delegated?\n", pid);
 	}
 	free(buf);
 
 	/*
-	 * This is not a systemd system -- we have to guess by looking at the
-	 * process's UID, controlling terminal, and the TTYness and/or location
-	 * of the files pointed to by its stdin/out/err.  (i.e. we first
-	 * consider whether something may be a system daemon by consulting its
-	 * uid range and controlling TTY, then try to rule it out by looking for
-	 * open fds to TTYs and regular files outside particular subtrees.)  (As
-	 * a consequence of these rules, a process with no standard streams at
-	 * all is considered a system daemon -- this is a cheap way of catching
-	 * kernel threads.)
+	 * This is not a systemd system, or we can't extract the relevant
+	 * slice info from it -- we have to guess by looking at the
+	 * process's UID, controlling terminal, and the TTYness and/or
+	 * location of the files pointed to by its stdin/out/err.  (i.e. we
+	 * first consider whether something may be a system daemon by
+	 * consulting its uid range and controlling TTY, then try to rule it
+	 * out by looking for open fds to TTYs and regular files outside
+	 * particular subtrees.)  (As a consequence of these rules, a
+	 * process with no standard streams at all is considered a system
+	 * daemon -- this is a cheap way of catching kernel threads.)
 	 */
 	if ((Puid(pid) > useruid) || Phastty(pid))
 		return 0;
-- 
2.48.1.283.g18c60a128c



