[DTrace-devel] [PATCH 1/2] Add support for the basename() action

Eugene Loh eugene.loh at oracle.com
Fri Nov 19 19:53:22 UTC 2021


On 11/18/21 11:51 PM, Kris Van Hees wrote:

> On Thu, Nov 18, 2021 at 08:45:04PM -0500, Eugene Loh via DTrace-devel wrote:
>> On 11/18/21 7:04 PM, Kris Van Hees wrote:
>>
>>> On Tue, Nov 16, 2021 at 05:47:59PM -0500, eugene.loh--- via DTrace-devel wrote:
>>>> From: Eugene Loh <eugene.loh at oracle.com>
>>>>
>>>> diff --git a/bpf/basename.S b/bpf/basename.S
>>>> @@ -0,0 +1,139 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
>>>> + */
>>>> +
>>>> +#define DT_STRLEN_BYTES		2
>>>> +
>>>> +#define BPF_FUNC_probe_read	4
>>>> +#define BPF_FUNC_probe_read_str	45
>>>> +
>>>> +/*
>>>> + * void dt_basename(char *src, char *dst);
>>>> + */
>>>> +	.text
>>>> +	.align	4
>>>> +	.global	dt_basename
>>>> +	.type	dt_basename, @function
>>>> +dt_basename :
>>>> +#define SRC %r6
>>>> +#define DST %r7
>>>> +#define BGN %r8
>>>> +#define END %r9
>>>> +
>>>> +	/* store copies of input arguments */
>>>> +	mov	SRC, %r1
>>>> +	mov	DST, %r2
>>>> +
>>>> +	/* ignore the string-length prefix */
>>>> +	add	SRC, DT_STRLEN_BYTES
>>>> +
>>>> +	/* r0 = bpf_probe_read_str(dst, STRSZ + 1, src) */
>>>> +	mov	%r1, DST
>>>> +	lddw	%r2, STRSZ
>>>> +	add	%r2, 1
>>>> +	mov	%r3, SRC
>>>> +	call	BPF_FUNC_probe_read_str
>>>> +
>>>> +	/* if (r0 s<= 1) goto Ldot */
>>>> +	jsle	%r0, 1, .Ldot
>>>> +
>>>> +	/* end = r0 - 1 */
>>>> +	mov	END, %r0
>>>> +	sub	END, 1
>>>> +
>>>> +	/*
>>>> +	 * Loop over end, backing it up to find a non-'/' char.
>>>> +	 */
>>>> +.Lend:
>>>> +	/* end-- */
>>>> +	sub	END, 1
>>>> +	/* if (end s< 0) goto Lslash */
>>>> +	jslt	END, 0, .Lslash
>>>> +	/* if (src[end] == '/') goto Lend */
>>>> +	mov	%r1, SRC
>>>> +	add	%r1, END
>>>> +	ldxb	%r1, [%r1+0]
>>>> +	and	%r1, 0xff
>>>> +	jeq	%r1, '/', .Lend
>>>> +
>>>> +	/*
>>>> +	 * Loop over bgn, backing it up to find a '/' char.
>>>> +	 */
>>>> +	/* bgn = end */
>>>> +	mov	BGN, END
>>>> +.Lbgn:
>>>> +	/* bgn-- */
>>>> +	sub	BGN, 1
>>>> +	/* if (bgn s< 0) goto Lcopy */
>>>> +	jslt	BGN, 0, .Lcopy
>>>> +	/* if (src[bgn] != '/') goto Lbgn */
>>>> +	mov	%r1, SRC
>>>> +	add	%r1, BGN
>>>> +	ldxb	%r1, [%r1+0]
>>>> +	and	%r1, 0xff
>>>> +	jne	%r1, '/', .Lbgn
>>>> +
>>>> +.Lcopy:
>>>> +	/*
>>>> +	 * The output string is a copy of the designated substring.
>>>> +	 */
>>>> +	/* end -= bgn (and help the BPF verifier) */
>>>> +	sub	END, BGN
>>> Err, this kind of breaks your whole scheme of using defines for register
>>> assignments because you just changed the meaning of END here.  It was the
>>> index of the end and now it becomes a byte count.
>> I see what you're saying, but I don't think this any more severe than saying
>> "len means STRSZ+1; oh but now it means strlen."  The pseudocode is short
>> enough (a few lines) that I think it's not hard to follow.  Would it seem
>> better to you to call it "len"?  So, "len" originally means from the
>> beginning of the old string but then it means from the beginning of the new
>> string?
> Well, no, because now you replace the issue with END meaning two things simply
> with having a LEN that means two things.  And if that is OK then I think that
> the entire purpose of these defines is lost.

I'm really having a hard time understanding these objections.  It is 
routine that a variable's exact meaning morphs over the course of its 
use.  I was going to cite specific cases, but this practice is so 
routine that that would be ridiculous.  Again, all programmers do this 
routinely.

We want to know the length of the output string.  We call it LEN. We 
read the input string and initialize LEN.  Is its value really the value 
we want?  No, but I think most readers will understand that.  We 
decrement to strip off the NUL byte.  Does that require a name change?  
I would think readers generally will not think so.  We strip away 
trailing / chars, decrementing LEN.  Should the name change?  I would 
think not.  We figure out how much to strip off the front and subtract 
that from LEN.  Should the name change now?  I would not think so.  
Throughout all these changes -- starting with the length of the 
NUL-terminated input string and ending with the stripped down length we 
sought -- I think a name like "LEN" makes perfect sense.

> Of course, I find the use of
> defines here confusing as is anyway, because you invariably need to know what
> the macro is defined as to know what instructions to use, so the value seems to
> be a bit lost on me.  E.g. SRC is a register here but in other functions you
> have defined it as a [%fp + OFFSET] style stack location.  But the specifics
> of those definitions affects whether you can do a MOV on it or need to use LD
> to get the value or ST to set the value.  But anyway, that is just that.

SRC DST BGN and LEN are all registers.

If there are problems in other patches, we can discuss them separately.  
Presumably, however, whenever a macro refers to a stack offset rather 
than a register, its name distinguishes it to avoid the very confusion 
you mention.

> Point here is though that if you are going to use symbolic names, they should
> mean somrthing and be consistent.  I think you'd be better served then to also
> set a define CNT %r9 and when you are done with END and need to start working
> with a byte count, start using CNT.  And since you already get to deal in other
> code with needing to know what something is defined as, you simply have the
> same here where you need to know that END and CNT use the same register.

There is other code where a "variable" is no longer needed and the macro 
is #undef'ed.  Then, when that register is used for something else, a 
new #define is introduced.  That practice is analogous to what we do in 
dt_cg.c with regset_alloc() and regset_free().

Here, though, we want a length that starts as the entirety of the input 
string (plus NUL char) and we whittle it down to get what we want.  
Changing its name along the way strikes me as arbitrary and confusing.

Meanwhile, I'll post a v3 of these patches with the str_of_str() to 
path_helper() name change.



More information about the DTrace-devel mailing list