[DTrace-devel] [PATCH 3/6] dtprobed: support DOF > 64KiB
Eugene Loh
eugene.loh at oracle.com
Tue May 23 17:24:43 UTC 2023
I suspect this patch is good (thank goodness for tests), but since I
don't understand it let me just start by asking dumb questions.
On 5/22/23 16:20, Nick Alcock via DTrace-devel wrote:
> dtprobed's ioctl handler is a state machine that reads in client DOF in
> multiple phases:
>
> DTP_IOCTL_START: read the struct dof_helper_t
>
> DTP_IOCTL_HDR: read the struct dof_hdr_t, giving us the size of the DOF
>
> DTP_IOCTL_DOFHDR: read the DOF itself
>
> DTP_IOCTL_DOF: (everything is read; actually parse the DOF)
>
> These various phases are triggered by issuing a fuse_reply_ioctl_retry()
> to ask FUSE to go away and read things out of the ioctl() caller for us.
s/to go away and read things/to read/
> We can read from any address in the client (we are root), but that
> doesn't mean we can read any *length* -- the kernel refuses to read
> beyond a certain length and alas that length is customizable via mount
> options (or, for CUSE, via kernel command-line options) and we can't
> easily tell what it is.
>
> I think it reasonable to say that anyone who *reduces* the length below
> the default is asking for it: I can't find any CUSE or FUSE-using code
> anywhere on the Internet that does anything more than hardwire this
> value to 128KiB. For paranoia, we use 64KiB ourselves -- that's still a
> lot of probes in one go.
>
> We add a new state, DTP_IOCTL_DOFCHUNK, and read in one piece of DOF
> per chunk until all the chunks are read in, concatenating them before
> handing it off to the parser.
>
> A new test checks this stuff: it has 206KiB of DOF which is enough to
> test "first chunk", "neither first nor last chunk" and "last chunk"
> cases: the DOF itself isn't formatted such that this equates to "first
> probe, probe in the middle, last probe" but if any of those chunks is
> missing or misaligned the string tables will be fubared and all the
> probe names will be wrong, triggering a test failure (at best).
>
> Orabug: 35411920
> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
> dtprobed/dtprobed.c | 105 ++++++++++++++++++++++++---
> test/unittest/usdt/tst.manyprobes.r | 4 +
> test/unittest/usdt/tst.manyprobes.sh | 77 ++++++++++++++++++++
> 3 files changed, 177 insertions(+), 9 deletions(-)
> create mode 100644 test/unittest/usdt/tst.manyprobes.r
> create mode 100755 test/unittest/usdt/tst.manyprobes.sh
>
> diff --git a/dtprobed/dtprobed.c b/dtprobed/dtprobed.c
> index 9eadf8cffc15d..01a30ab95129b 100644
> --- a/dtprobed/dtprobed.c
> +++ b/dtprobed/dtprobed.c
> @@ -5,6 +5,7 @@
> * http://oss.oracle.com/licenses/upl.
> */
>
> +#include <sys/param.h>
> #include <sys/uio.h>
> #include <sys/wait.h>
> #include <errno.h>
> @@ -56,6 +57,7 @@
> #include "seccomp-assistance.h"
>
> #define DOF_MAXSZ 512 * 1024 * 1024
> +#define DOF_CHUNKSZ 64 * 1024
>
> static struct fuse_session *cuse_session;
>
> @@ -137,7 +139,8 @@ typedef enum dtprobed_fuse_state {
> DTP_IOCTL_START = 0,
> DTP_IOCTL_HDR = 1,
> DTP_IOCTL_DOFHDR = 2,
> - DTP_IOCTL_DOF = 3
> + DTP_IOCTL_DOFCHUNK = 3,
> + DTP_IOCTL_DOF = 4
> } dtprobed_fuse_state_t;
>
> /*
> @@ -147,6 +150,8 @@ typedef struct dtprobed_userdata {
> dtprobed_fuse_state_t state;
> dof_helper_t dh;
> dof_hdr_t dof_hdr;
> + size_t next_chunk;
> + char *buf;
> } dtprobed_userdata_t;
>
> struct fuse_session *
> @@ -357,6 +362,7 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> struct iovec in;
> pid_t pid = fuse_req_ctx(req)->pid;
> const char *errmsg;
> + const void *buf;
>
> /*
> * We can just ignore FUSE_IOCTL_COMPAT: the 32-bit and 64-bit versions
> @@ -385,6 +391,13 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> errmsg = "error reading ioctl size";
> if (fuse_reply_ioctl_retry(req, &in, 1, NULL, 0) < 0)
> goto fuse_errmsg;
> +
> + /*
> + * userdata->buf should already be freed, but if somehow it is
> + * non-NULL make absolutely sure it is cleared out.
> + */
> + userdata->buf = NULL;
> + userdata->next_chunk = 0;
> userdata->state = DTP_IOCTL_HDR;
> return;
> }
> @@ -425,17 +438,18 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> }
>
> /*
> - * Third call: validate the DOF length and get the DOF itself.
> + * Third call: validate the DOF length and stash it away.
> */
> if (userdata->state == DTP_IOCTL_DOFHDR) {
> /*
> * Too much data is as bad as too little.
> */
> - if (in_bufsz > sizeof(dof_hdr_t)) {
> - errmsg = "DOF header size incorrect";
> - fuse_log(FUSE_LOG_ERR, "%i: dtprobed: %s: %zi, not %zi\n",
> - pid, errmsg, in_bufsz, sizeof(dof_hdr_t));
> - goto fuse_err;
> + if (userdata->state == DTP_IOCTL_DOFHDR &&
Why introduce the userdata->state==DTP_IOCTL_DOFHDR check when this
check was just performed? Why do it twice?
> + (in_bufsz > sizeof(dof_hdr_t))) {
> + errmsg = "DOF header size incorrect";
> + fuse_log(FUSE_LOG_ERR, "%i: dtprobed: %s: %zi, not %zi\n",
> + pid, errmsg, in_bufsz, sizeof(dof_hdr_t));
> + goto fuse_err;
> }
What's with the four-space indentation?
> memcpy(&userdata->dof_hdr, in_buf, sizeof(dof_hdr_t));
>
> @@ -443,16 +457,78 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> fuse_log(FUSE_LOG_WARNING, "%i: dtprobed: DOF size of %zi longer than is sane\n",
> pid, userdata->dof_hdr.dofh_loadsz);
>
> + /* Fall through. */
> + }
> +
> + /*
> + * Third and possibly further calls: get the DOF itself, in chunks if
> + * it's too big.
> + */
> + if (userdata->state == DTP_IOCTL_DOFHDR ||
> + userdata->state == DTP_IOCTL_DOFCHUNK) {
> + int by_chunks = 0;
> +
> in.iov_base = (void *) userdata->dh.dofhp_dof;
> in.iov_len = userdata->dof_hdr.dofh_loadsz;
>
> + /*
> + * If the data being read in is too large, read by chunks. We
> + * cannot determine the chunk size we can use: the first failure
> + * returns -ENOMEM to the caller of ioctl().
> + */
> + if (userdata->dof_hdr.dofh_loadsz > DOF_CHUNKSZ) {
> + by_chunks = 1;
> +
> + if (userdata->state == DTP_IOCTL_DOFHDR) {
> + userdata->buf = malloc(userdata->dof_hdr.dofh_loadsz);
> + in.iov_len = DOF_CHUNKSZ;
> +
> + if (userdata->buf == NULL) {
> + errmsg = "out of memory allocating DOF";
> + goto fuse_errmsg;
> + }
> + } else {
> + ssize_t remaining;
> +
> + remaining = userdata->dof_hdr.dofh_loadsz -
> + userdata->next_chunk;
> +
> + if (remaining < 0)
> + remaining = 0;
I guess this is fine, but then we go on and do the memcpy(),
next_chunk+=, and remaining-=, which feels weird when remaining was
already 0. Again, I guess this is fine, but it feels funny.
> + /*
> + * We've already read a chunk: preserve it and
> + * go on to the next, until we are out.
> + */
> +
> + memcpy(userdata->buf + userdata->next_chunk, in_buf,
> + MIN(in_bufsz, remaining));
> +
> + userdata->next_chunk += in_bufsz;
> + remaining -= in_bufsz;
This also feels funny. Why use in_bufsz when the copy was
MIN(in_bufsz,remaining)? I mean, it works out okay, but just feels
funny. This readability stuff is clearly subjective, but I think it'd
be easier for me to follow if there were a "if(remaining<=in_bufsz)"
check right after the memcpy instead of the "if(remaining<=0)" check
below. But, again, no version is unambiguously better than the other.
> + if (remaining <= 0) {
> + userdata->state = DTP_IOCTL_DOF;
> + goto chunks_done;
> + }
> +
> + in.iov_base = (char *) in.iov_base + userdata->next_chunk;
> + in.iov_len = MIN(DOF_CHUNKSZ, remaining);
> + }
> + }
> +
> errmsg = "cannot read DOF";
> if (fuse_reply_ioctl_retry(req, &in, 1, NULL, 0) < 0)
> goto fuse_errmsg;
> - userdata->state = DTP_IOCTL_DOF;
> +
> + if (by_chunks)
> + userdata->state = DTP_IOCTL_DOFCHUNK;
> + else
> + userdata->state = DTP_IOCTL_DOF;
> return;
> }
>
> + chunks_done:
> if (userdata->state != DTP_IOCTL_DOF) {
> errmsg = "FUSE internal state incorrect";
> goto fuse_errmsg;
> @@ -463,13 +539,20 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> * to unblock the ioctl() caller and return to start state.
> */
>
> + buf = in_buf;
> + if (userdata->buf)
> + buf = userdata->buf;
> +
> if (process_dof(req, parser_in_pipe[1], parser_out_pipe[0], pid,
> - &userdata->dh, in_buf) < 0)
> + &userdata->dh, buf) < 0)
> goto process_err;
>
> if (fuse_reply_ioctl(req, 0, NULL, 0) < 0)
> goto process_err;
>
> + free(userdata->buf);
> + userdata->buf = NULL;
> +
> userdata->state = DTP_IOCTL_START;
>
> return;
> @@ -482,6 +565,8 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> fuse_log(FUSE_LOG_ERR, "%i: dtprobed: %s\n", pid,
> "cannot send error to ioctl caller: %s",
> errmsg);
> + free(userdata->buf);
> + userdata->buf = NULL;
> userdata->state = DTP_IOCTL_START;
> return;
>
> @@ -489,6 +574,8 @@ helper_ioctl(fuse_req_t req, int cmd, void *arg,
> if (fuse_reply_err(req, EINVAL) < 0)
> fuse_log(FUSE_LOG_ERR, "%i: cannot unblock caller\n",
> pid);
> + free(userdata->buf);
> + userdata->buf = NULL;
> userdata->state = DTP_IOCTL_START;
> return;
> }
> diff --git a/test/unittest/usdt/tst.manyprobes.r b/test/unittest/usdt/tst.manyprobes.r
> new file mode 100644
> index 0000000000000..5c128019946ac
> --- /dev/null
> +++ b/test/unittest/usdt/tst.manyprobes.r
> @@ -0,0 +1,4 @@
> +test:main:test1
> +test:main:test750
> +test:main:test1999
> +
> diff --git a/test/unittest/usdt/tst.manyprobes.sh b/test/unittest/usdt/tst.manyprobes.sh
> new file mode 100755
> index 0000000000000..0f3fcb2b50fd9
> --- /dev/null
> +++ b/test/unittest/usdt/tst.manyprobes.sh
> @@ -0,0 +1,77 @@
> +#!/bin/bash
> +#
> +# Oracle Linux DTrace.
> +# Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +#
> +if [ $# != 1 ]; then
> + echo expected one argument: '<'dtrace-path'>'
> + exit 2
> +fi
> +
> +dtrace=$1
> +CC=/usr/bin/gcc
> +CFLAGS="-I${PWD}/uts/common"
> +DIRNAME="$tmpdir/usdt-manyprobes.$$.$RANDOM"
> +mkdir -p $DIRNAME
> +cd $DIRNAME
> +
> +gen_test()
> +{
> + local -i n=$1
> + echo "#include <sys/sdt.h>"
> + echo "void main(void) {"
> + for ((i=0; i<$n; i++)); do
> + echo 'DTRACE_PROBE1(manyprobes, 'test$i, '"foo");'
> + done
> + echo "}"
> +}
Stylistically, how about tabs rather than spaces? Also (and I'm no bash
expert) can't you use $nprobes here and skip all the $1 and $n stuff?
And don't people use `seq ...` instead of ((i=0; i<$n; i++))?
> +
> +gen_provider()
> +{
> + local -i n=$1
> + echo "provider manyprobes {"
> + for ((i=0; i<$n; i++)); do
> + echo "probe test$i(string);"
> + done
> + echo "};"
> +}
Same comments about tabs, $nprobes, and seq.
> +nprobes=2000
> +gen_test $nprobes > test.c
> +gen_provider $nprobes > manyprobes.d
> +
> +${CC} ${CFLAGS} -c test.c
> +if [ $? -ne 0 ]; then
> + echo "failed to compile test.c" >& 2
> + exit 1
> +fi
> +$dtrace -G -s manyprobes.d test.o
> +if [ $? -ne 0 ]; then
> + echo "failed to create DOF" >& 2
> + exit 1
> +fi
> +${CC} ${CFLAGS} -o test test.o manyprobes.o
> +if [ $? -ne 0 ]; then
> + echo "failed to link final executable" >& 2
> + exit 1
> +fi
> +
> +script()
> +{
> + $dtrace -c ./test -qs /dev/stdin <<EOF
> + manyprobes\$target:::test1, manyprobes\$target:::test750, manyprobes\$target:::test1999
> + {
> + printf("%s:%s:%s\n", probemod, probefunc, probename);
> + }
> +EOF
> +}
> +
> +script
> +status=$?
> +
> +# Temporary, until autocleanup is implemented
> +echo - > /sys/kernel/tracing/uprobe_events 2>/dev/null
Doesn't that remove all uprobes?
> +exit $status
More information about the DTrace-devel
mailing list