[DTrace-devel] [PATCH 3/6] dtprobed: support DOF > 64KiB

Thu May 25 20:55:10 UTC 2023

On 23 May 2023, Eugene Loh via DTrace-devel spake thusly:

> I suspect this patch is good (thank goodness for tests), but since I don't understand it let me just start by asking dumb questions.

There are no dumb questions, only insufficiently clear code.

> On 5/22/23 16:20, Nick Alcock via DTrace-devel wrote:
>> dtprobed's ioctl handler is a state machine that reads in client DOF in
>> multiple phases:
>>
>>   DTP_IOCTL_START: read the struct dof_helper_t
>>
>>   DTP_IOCTL_HDR: read the struct dof_hdr_t, giving us the size of the DOF
>>
>>   DTP_IOCTL_DOFHDR: read the DOF itself
>>
>>   DTP_IOCTL_DOF: (everything is read; actually parse the DOF)
>>
>> These various phases are triggered by issuing a fuse_reply_ioctl_retry()
>> to ask FUSE to go away and read things out of the ioctl() caller for us.
>
> s/to go away and read things/to read/

I was trying to imply the "we actually do this by doing a return; and
letting it call us back". Rephrased a bit as "to ask FUSE to read things
out of the ioctl() caller for us and call us back with the new stuff".

>> -		if (in_bufsz > sizeof(dof_hdr_t)) {
>> -			errmsg = "DOF header size incorrect";
>> -			fuse_log(FUSE_LOG_ERR, "%i: dtprobed: %s: %zi, not %zi\n",
>> -				 pid, errmsg, in_bufsz, sizeof(dof_hdr_t));
>> -			goto fuse_err;
>> +		if (userdata->state == DTP_IOCTL_DOFHDR &&
>
> Why introduce the userdata->state==DTP_IOCTL_DOFHDR check when this check was just performed?  Why do it twice?

Before this patch, we are repeatedly called back by FUSE like so:

call 0: DTP_IOCTL_START: just a pointer, directly from drti.c; ask FUSE
                         to deref it
call 1: DTP_IOCTL_HDR: dereffed to get a dof_helper_t; this contains a
                       pointer to the DOF itself; ask FUSE to deref it;
                       we cannot yet tell how big it is, but it must be
                       at least as big as a dof_hdr_t, so get that much
call 2: DTP_IOCTL_DOFHDR: dereffed to get a dof_hdr_t; now we know how
                          big the DOF is; ask FUSE to get that much
                          from the same address
call 3: DTP_IOCTL_DOF: we have all the DOF! parse it

After this patch, we add a state, and call 2 above becomes:

call 2: DTP_IOCTL_DOFHDR: dereffed to get a dof_hdr_t; now we know how
                          big the DOF is; ask FUSE to get that much
                          from the same address; if it's bigger than
                          64K, ask FUSE to get the first 64K
call 2+: DTP_IOCTL_DOFCHUNK: called back with 64K more of a >64K DOF;
                             concatenate it and possibly get the next
                             64K. Once all are done, go on to
                             DTP_IOCTL_DOF as above.

I would be entirely happy to add more comments explaining this better.
Clearly my current ones aren't good enough :(

>> +		    (in_bufsz > sizeof(dof_hdr_t))) {
>> +			    errmsg = "DOF header size incorrect";
>> +			    fuse_log(FUSE_LOG_ERR, "%i: dtprobed: %s: %zi, not %zi\n",
>> +				     pid, errmsg, in_bufsz, sizeof(dof_hdr_t));
>> +			    goto fuse_err;
>>   		}
>
> What's with the four-space indentation?

Emacs playing silly buggers. Fixed.

>> +				remaining = userdata->dof_hdr.dofh_loadsz -
>> +					    userdata->next_chunk;
>> +
>> +				if (remaining < 0)
>> +					remaining = 0;
>
> I guess this is fine, but then we go on and do the memcpy(), next_chunk+=, and remaining-=, which feels weird when remaining was
> already 0.  Again, I guess this is fine, but it feels funny.

I'm just trying to cut down conditionals in really rare "this is
probably corrupted DOF and probably a bug" hopefully-can-never-happen
cases. It *works* with 0, after all (if it doesn't, we're in big
trouble).

>> +				/*
>> +				 * We've already read a chunk: preserve it and
>> +				 * go on to the next, until we are out.
>> +				 */
>> +
>> +				memcpy(userdata->buf + userdata->next_chunk, in_buf,
>> +				       MIN(in_bufsz, remaining));
>> +
>> +				userdata->next_chunk += in_bufsz;
>> +				remaining -= in_bufsz;
>
> This also feels funny.  Why use in_bufsz when the copy was
> MIN(in_bufsz,remaining)?  I mean, it works out okay, but just feels
> funny.  This readability stuff is clearly subjective, but I think it'd
> be easier for me to follow if there were a "if(remaining<=in_bufsz)"
> check right after the memcpy instead of the "if(remaining<=0)" check
> below.  But, again, no version is unambiguously better than the other.

Conceptually, I'm saying "copy no more than the amount of data we have
left, even if FUSE accidentally gave us more: but don't copy more than
we were given". (Yes, it can sometimes hand back extra!)

But you're quite right that if it *does* hand back too much we now leave
a hole. Whoops! Adjusted, with a new size_t to_copy for clarity:

		to_copy = MIN(in_bufsz, remaining)
		memcpy(userdata->buf + userdata->next_chunk, in_buf,
		       to_copy);

		userdata->next_chunk += to_copy;
		remaining -= to_copy;

(the rest unchanged).

>> +gen_test()
>> +{
>> +    local -i n=$1
>> +    echo "#include <sys/sdt.h>"
>> +    echo "void main(void) {"
>> +    for ((i=0; i<$n; i++)); do
>> +      echo 'DTRACE_PROBE1(manyprobes, 'test$i, '"foo");'
>> +    done
>> +    echo "}"
>> +}
>
> Stylistically, how about tabs rather than spaces?  Also (and I'm no
> bash expert) can't you use $nprobes here and skip all the $1 and $n
> stuff?  And don't people use `seq ...` instead of ((i=0; i<$n; i++))?

Yeah, this was literally copied directly from Steve's bug report :) seq
is actually a tiny bit less efficient: I was thinking 'maybe I should
use bash loops more'. But I'm happy to switch, since I hardly ever use
bash C-style loops either -- I just feel guilty about it.  (Adjusted.)

>> +script
>> +status=$?
>> +
>> +# Temporary, until autocleanup is implemented
>> +echo - > /sys/kernel/tracing/uprobe_events 2>/dev/null
>
> Doesn't that remove all uprobes?

... possibly breaking other USDT users on the same system as the test?
Yes, I suppose it does, but I'm not sure how to remove only the huge
pile of now-redundant uprobes this test leaves behind, and if you don't
the remaining tests become incredibly slow. (Modern systems, tens of
billions of clock cycles per second and we can't have as many probes as
we had bytes of RAM in C64 BASIC without the system hitting a wall.
Sheesh.)

Got some syntax that might help? (And possibly a way to fix this stupid
problem for good? dtprobed should be cleaning this crap up somehow, but
it's tricky because just because the original process is dead doesn't
mean some of the probes in it, particularly in shared libraries, don't
survive in accessible mappings in other processes.)

-- 
NULL && (void)