[DTrace-devel] [PATCH] dtprobed: prevent inter-request contamination of state
Kris Van Hees
kris.van.hees at oracle.com
Thu May 25 20:52:29 UTC 2023
On Wed, May 24, 2023 at 03:18:57PM +0100, Nick Alcock via DTrace-devel wrote:
> On fairly rare occasions, one sometimes sees dtprobed saying things like
> this:
>
> 46157: dtprobed: helper size incorrect: expected at least 80, not 0
>
> or
>
> 25815: dtprobed: DOF too small: expected at least 64, not 0
>
> The message varies, but the size is always 0. Whenever this happens
> one object's-worth of incoming DOF is thrown out.
>
> The cause is, it turns out, simple. CUSE ioctls are handled via a
> multistage request/response protocol, where the CUSE server (dtprobed)
> repeatedly calls fuse_reply_ioctl_retry() and returns and is immediately
> called back with the result of extracting a bit more info from the
> client's address space: this process is fast but still involves a
> protocol roundtrip to the FUSE device, so it's not instantaneous. To
> carry state across this call FUSE has a userdata pointer that is shared
> across calls to all the FUSE callbacks.
>
> Now *obviously* CUSE isn't going to interleave the results of this
> request/response process with further requests coming in from new
> clients, right? I mean even the FUSE examples are written assuming that
> this won't happen.
>
> It does, and this problem is, while not terribly common, common enough
> that I'm not the only one who's seen it. You can be halfway through a
> request/response cycle and then some new PID can come blundering in and
> you have to handle that as well, in parallel.
>
> So we can't just pass a single userdata state vector around and have it
> track everything because all our requests are serialized, since they're
> *not* serialized. So switch to managing them by hand: maintain a
> dt_list_t of manually-malloced userdatas, reverse-insertion-sorted by
> pid so that unless the pid wraps the pid we want is almost always at the
> head of the list. We need to clean up this list periodically because
> processes can be killed in the middle of their ioctl() so we can get
> stale stuff in the list: to save a bit of time and make sure the cleanup
> code doesn't rust, *only* free in a periodic cleanup cycle: the cycle
> goes through the list and frees every element with no associated process
> or whose state indicates that it's not in the middle of a
> request/response cycle right now.
>
> For now, clean up every 128 requests -- rare enough that it's
> amortized-free, common enough that even when pids wrap we hardly have to
> traverse any stale ones to get to the one we care about.
>
> Add a test for the cleanup code (and a bit of a load-test for dtprobed)
> that just kicks off a bunch of processes with probes. This won't fail
> itself but if things go wrong it might provoke a dtprobed crash and a
> failure of later usdt tests, which is not a valueless thing (since the
> cleanup code is otherwise not tested at all, and it found a bug when I
> wrote it, so let's keep it).
>
> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
> dtprobed/dtprobed.c | 94 +++++++++++++++++++++++++++--
> test/unittest/usdt/tst.manyprocs.sh | 70 +++++++++++++++++++++
> 2 files changed, 158 insertions(+), 6 deletions(-)
> create mode 100755 test/unittest/usdt/tst.manyprocs.sh
>
> diff --git a/dtprobed/dtprobed.c b/dtprobed/dtprobed.c
> index 29cddba1d9658..6f77553b64246 100644
> --- a/dtprobed/dtprobed.c
> +++ b/dtprobed/dtprobed.c
> @@ -51,6 +51,7 @@
> #include <systemd/sd-daemon.h>
> #endif
>
> +#include <dt_list.h>
> #include "dof_parser.h"
> #include "uprobes.h"
>
> @@ -59,6 +60,8 @@
> #define DOF_MAXSZ 512 * 1024 * 1024
> #define DOF_CHUNKSZ 64 * 1024
>
> +#define CLEANUP_INTERVAL 128
> +
> static struct fuse_session *cuse_session;
>
> int _dtrace_debug = 0;
> @@ -69,6 +72,7 @@ static pid_t parser_pid;
> static int parser_in_pipe[2];
> static int parser_out_pipe[2];
> static int timeout = 5000; /* In seconds. */
> +static int rq_count = 0;
>
> static void helper_ioctl(fuse_req_t req, int cmd, void *arg,
> struct fuse_file_info *fi, unsigned int flags,
> @@ -147,6 +151,8 @@ typedef enum dtprobed_fuse_state {
> * State crossing calls to CUSE request functions.
> */
> typedef struct dtprobed_userdata {
> + dt_list_t list;
> + pid_t pid;
> dtprobed_fuse_state_t state;
> dof_helper_t dh;
> dof_hdr_t dof_hdr;
> @@ -154,8 +160,10 @@ typedef struct dtprobed_userdata {
> char *buf;
> } dtprobed_userdata_t;
>
> +static dt_list_t all_userdata;
> +
> struct fuse_session *
> -setup_helper_device(int argc, char **argv, char *devname, dtprobed_userdata_t *userdata)
> +setup_helper_device(int argc, char **argv, char *devname)
> {
> struct cuse_info ci;
> struct fuse_session *cs;
> @@ -175,7 +183,7 @@ setup_helper_device(int argc, char **argv, char *devname, dtprobed_userdata_t *u
> ci.dev_info_argv = dev_info_argv;
>
> cs = cuse_lowlevel_setup(argc, argv, &ci, &dtprobed_clop,
> - &multithreaded, userdata);
> + &multithreaded, NULL);
>
> if (cs == NULL) {
> perror("allocating helper device");
> @@ -204,6 +212,73 @@ teardown_device(void)
> cuse_lowlevel_teardown(cuse_session);
> }
>
> +/*
> + * Allocate a userdata for the given PID, or return an existing one
> + * if possible. Userdatas are insertion-sorted into descending order
> + * to make old stale ones from processes killed at the wrong time a
> + * bit less likely to impose overhead.
> + */
> +static dtprobed_userdata_t *
> +get_userdata(pid_t pid)
> +{
> + dtprobed_userdata_t *userdatap;
> + dtprobed_userdata_t *new;
> + int prepend = 0;
> +
> + for (userdatap = dt_list_next(&all_userdata); userdatap != NULL;
> + userdatap = dt_list_next(userdatap)) {
> + if (userdatap->pid == pid)
> + return userdatap;
> +
> + if (userdatap->pid < pid)
> + break;
> + }
> +
> + if ((new = calloc(1, sizeof(struct dtprobed_userdata))) == NULL)
> + return NULL;
> +
> + /*
> + * Go back one, to the last list entry with a pid higher than this one,
> + * since dt_list_insert inserts *after* the entry passed in.
> + */
> + if (userdatap) {
> + userdatap = dt_list_prev(userdatap);
> + if (userdatap == NULL)
> + prepend = 1;
> + }
> +
> + new->pid = pid;
> +
> + if (!prepend)
> + dt_list_insert(&all_userdata, userdatap, new);
> + else
> + dt_list_prepend(&all_userdata, new);
> +
> + return new;
> +}
> +
> +/*
> + * Clean up userdatas no longer involved in ioctls or no longer corresponding to
> + * live pids.
> + */
> +static void
> +cleanup_userdata(void)
> +{
> + dtprobed_userdata_t *userdatap;
> + dtprobed_userdata_t *last_userdatap = NULL;
> +
> + for (userdatap = dt_list_next(&all_userdata); userdatap != NULL;
> + userdatap = dt_list_next(userdatap)) {
> + if (userdatap->state == DTP_IOCTL_START || kill(userdatap->pid, 0) < 0) {
> + dt_list_delete(&all_userdata, userdatap);
> + free(userdatap->buf);
> + free(last_userdatap);
> + last_userdatap = userdatap;
> + }
> + }
> + free(last_userdatap);
> +}
> +
> /*
> * Parse a piece of DOF. Return 0 iff the pipe has closed and no more parsing
> * is possible.
> @@ -360,9 +435,9 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> struct fuse_file_info *fi, unsigned int flags,
> const void *in_buf, size_t in_bufsz, size_t out_bufsz)
> {
> - dtprobed_userdata_t *userdata = fuse_req_userdata(req);
> struct iovec in;
> pid_t pid = fuse_req_ctx(req)->pid;
> + dtprobed_userdata_t *userdata = get_userdata(pid);
> const char *errmsg;
> const void *buf;
>
> @@ -554,9 +629,17 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
>
> free(userdata->buf);
> userdata->buf = NULL;
> -
> userdata->state = DTP_IOCTL_START;
>
> + /*
> + * Periodically clean up old userdata (applying to pids with no live
> + * transaction, or pids which no longer exist).
> + */
> + if (rq_count++ > CLEANUP_INTERVAL) {
> + cleanup_userdata();
> + rq_count = 0;
> + }
> +
> return;
>
> fuse_errmsg:
> @@ -750,7 +833,6 @@ main(int argc, char *argv[])
> int sync_fd = -1;
> int ret;
> struct sigaction sa = {0};
> - dtprobed_userdata_t userdata = {0};
>
> /*
> * These are "command-line" arguments to FUSE itself: our args are
> @@ -797,7 +879,7 @@ main(int argc, char *argv[])
> close_range(3, ~0U, 0);
>
> if ((cuse_session = setup_helper_device(fuse_argc, fuse_argv,
> - devname, &userdata)) == NULL)
> + devname)) == NULL)
> exit(1);
>
> if (!foreground) {
> diff --git a/test/unittest/usdt/tst.manyprocs.sh b/test/unittest/usdt/tst.manyprocs.sh
> new file mode 100755
> index 0000000000000..6ebd873672b4b
> --- /dev/null
> +++ b/test/unittest/usdt/tst.manyprocs.sh
> @@ -0,0 +1,70 @@
> +#!/bin/bash
> +#
> +# Oracle Linux DTrace.
> +# Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +#
> +# Verify that dtprobed can handle lots of processes. We don't check that
> +# it's cleaning them up, and we don't explicitly check that it's not dead,
> +# but if the dead-process-cleanup process fails and kills dtprobed we will
> +# find out when later tests fail due to lack of dtprobed.
> +#
> +if [ $# != 1 ]; then
> + echo expected one argument: '<'dtrace-path'>'
> + exit 2
> +fi
> +
> +dtrace=$1
> +CC=/usr/bin/gcc
> +CFLAGS=
> +
> +DIRNAME="$tmpdir/usdt-manyprocs.$$.$RANDOM"
> +mkdir -p $DIRNAME
> +cd $DIRNAME
> +
> +cat > prov.d <<EOF
> +provider test_prov {
> + probe go();
> +};
> +EOF
> +
> +$dtrace -h -s prov.d
> +if [ $? -ne 0 ]; then
> + echo "failed to generate header file" >& 2
> + exit 1
> +fi
> +
> +cat > test.c <<EOF
> +#include "prov.h"
> +
> +int
> +main(int argc, char **argv)
> +{
> + TEST_PROV_GO();
> +
> + return 0;
> +}
> +EOF
> +
> +${CC} ${CFLAGS} -c test.c
> +if [ $? -ne 0 ]; then
> + echo "failed to compile test.c" >& 2
> + exit 1
> +fi
> +$dtrace -G -s prov.d test.o
> +if [ $? -ne 0 ]; then
> + echo "failed to create DOF" >& 2
> + exit 1
> +fi
> +${CC} ${CFLAGS} -o test test.o prov.o
> +if [ $? -ne 0 ]; then
> + echo "failed to link final executable" >& 2
> + exit 1
> +fi
> +
> +for ((i=0; i < 1024; i++)); do
> + ./test
> +done
> +
> +exit 0
>
> base-commit: f543fa6706c0b31364356c01bf3de63e3cce8ad1
> prerequisite-patch-id: fa667248b7b02e92ee6b3807123ebcf2ac0acc38
> prerequisite-patch-id: 0cec5f1a31d528a493d6143644c1e6b78c7f403b
> prerequisite-patch-id: 1e1ced5fcec01a0c2e38d12f1ae5a124c8f93860
> prerequisite-patch-id: eb290f098e38981181d85240150e1713979dc505
> prerequisite-patch-id: c25b38ab68e815a63f064aff7496408684f94a36
> prerequisite-patch-id: b86eb7e00b5883a586bd4dc7143ab25094912288
> prerequisite-patch-id: 85d3c176dc4a7105e4f04dbcdb4ccbdf18741795
> prerequisite-patch-id: 94206b53a67820bb1c9e828c11e57a0cb36949b6
> prerequisite-patch-id: c5a64d246ff26047723e5e1a9c16b42a1d5fbf87
> prerequisite-patch-id: 89cf4c6b9242525883ac02bb8ef90856ef908d0f
> prerequisite-patch-id: 0992642ac657a1da617588a6d7b28f3669960374
> prerequisite-patch-id: e8a65c309b3eaf3341ceac421ef602bcc3f057fa
> prerequisite-patch-id: 4d38462a4bf2b242f8fcec3bf28f1ad97fa40ca7
> prerequisite-patch-id: 6430030eaa9b5f09eedf6d41b698549096905c56
> prerequisite-patch-id: aefef52e1438f34dd49e0cde010e791d96ef06d9
> prerequisite-patch-id: f9ca834410ab770ec2d0b4f4f1ea70d5bea37c84
> prerequisite-patch-id: a2e41bc39d3eb5bea3c1543928d1a695320cce52
> prerequisite-patch-id: 5998403c58e4b48f0cb0d87971e553d2999421fc
> prerequisite-patch-id: 8766128335d133bdd548ed5978aa9eefeb6f7b6d
> prerequisite-patch-id: 35e83414bbe54be31fe7cc15c7d6368543cd4e8f
> prerequisite-patch-id: fe12afceb640e98f3e1c71915e005f114677b22c
> prerequisite-patch-id: 3793aeb5329f974f12647158fe55329f5a801d06
> --
> 2.39.1.268.g9de2f9a303
>
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list