[DTrace-devel] [PATCH 1/2] Introduce dctx->mem as a pointer to the generic scratch memory

Kris Van Hees kris.van.hees at oracle.com
Mon Aug 30 12:12:04 PDT 2021


On Mon, Aug 30, 2021 at 02:05:47PM -0400, Eugene Loh wrote:
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> with comments below.
> 
> On 8/27/21 8:53 PM, Kris Van Hees wrote:
> > Since there are multiple language features that need scratch memory,
> > it is more convenient for code generation to be able to refer to the
> > scratch memory by means of a base pointer.  The generic scratch
> > memory is allocated as part of the 'mem' BPF map.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > ---
> >   libdtrace/dt_bpf.c                       | 43 ++++++++++++++++--------
> >   libdtrace/dt_cg.c                        | 34 ++++++++++++-------
> >   libdtrace/dt_dctx.h                      |  2 ++
> >   test/unittest/codegen/tst.stack_layout.r | 22 ++++++------
> >   4 files changed, 63 insertions(+), 38 deletions(-)
> >
> > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > index 7ef868b1..787d204d 100644
> > --- a/libdtrace/dt_bpf.c
> > +++ b/libdtrace/dt_bpf.c
> > @@ -178,14 +178,17 @@ populate_probes_map(dtrace_hdl_t *dtp, int fd)
> >    *		with a singleton element (key 0).  This means that every CPU
> >    *		will see its own copy of this singleton element, and can use it
> >    *		without interference from other CPUs.  The scratch memory is
> > - *		used to store the DTrace context and the temporary output
> > - *		buffer.  The size of the map value (a byte array) is the size
> > - *		of the DTrace context, rounded up to a multiple of 8 bytes,
> > - *		plus 8 bytes padding, plus the maximum trace buffer record size
> > - *		that any of the compiled programs can emit, rounded up to a
> > - *		multiple of 8 bytes.
> > - *		The 8 bytes padding is used to ensure proper trace data
> > - *		alignment.
> > + *		used to store the DTrace context, the temporary output buffer,
> > + *		and temporary storage for stack traces, string manipulation,
> > + *		etc.
> > + *		The size of the map value (a byte array) is the sum of:
> > + *			- size of the DTrace context, rounded up to the nearest
> > + *			  multiple of 8
> You say "DTrace context" but the code has mstate.  This reference to 
> "DTrace context" appears multiple times.

Yes, that is because I adjusted the existing comment which was already wrong
of course.  Fixing.

> Speaking of "multiple times," there is a lot of repetition in these 
> comments.  There is an overview ("The scratch memory is...").  Then the 
> sizes are listed.  Then the list is repeated again when one gets to the 
> code.  How about removing the list here ("The size of the map 
> value...").  The copy of this list that appears later suffices.

The textual listing of the components that comprise the total value size should
have been deleted in favour of the bulleted list.  Fixing.

> > + *			- 8 bytes padding for trace buffer alignment purposes
> Actually, this padding confuses me.  Aren't the other partitions already 
> padded up to multiples of 8?

This padding is to ensure alignment when we copy the tracea data into the
perf ring buffer,  The details are in a rather old patch that introduced this.
The summary is that we build in a 8-byte gap so that the data being written can
be properly 64-bit aligned, and once it gets copied into the perf buffer along
with a 4-byte gap (half of this 8-byte padding) the data will still be 64-bit
aligned for the consumer because this data gets copied into the ring buffer
right after a header structure that is 8-byte aligned but is 4 bytes short of
being a complete multiple of 8..

> > + *			- maximum trace buffer record size, rounded up to the
> > + *			  multiple of 8
> > + *			- the greater of the maximum stack trace size and three
> > + *			  times the maximum string size
> Stylistically, since the other bullets don't start with "the" then this 
> shouldn't either.  Clearly not a big deal.

Well, then the others should start with 'the' instead because 'greater of ...'
doesn't sound right at all.

> >    * - strtab:	String table map.  This is a global map with a singleton
> >    *		element (key 0) that contains the entire string table as a
> >    *		concatenation of all unique strings (each terminated with a
> > @@ -223,7 +226,7 @@ populate_probes_map(dtrace_hdl_t *dtp, int fd)
> >   int
> >   dt_bpf_gmap_create(dtrace_hdl_t *dtp)
> >   {
> > -	int		stabsz, gvarsz, lvarsz, tvarc, aggsz;
> > +	int		stabsz, gvarsz, lvarsz, tvarc, aggsz, memsz;
> >   	int		ci_mapfd, st_mapfd, pr_mapfd;
> >   	uint32_t	key = 0;
> >   
> > @@ -271,12 +274,24 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
> >   	if (ci_mapfd == -1)
> >   		return -1;		/* dt_errno is set for us */
> >   
> > +	/*
> > +	 * The size of the map value (a byte array) is the sum of:
> > +	 *	- size of the DTrace context, rounded up to the nearest
> > +	 *	  multiple of 8
> > +	 *	- 8 bytes padding for trace buffer alignment purposes
> > +	 *	- maximum trace buffer record size, rounded up to the
> > +	 *	  multiple of 8
> > +	 *	- the greater of:
> > +	 *		+ the maximum stack trace size
> > +	 *		+ three times the maximum string size
> > +	 */
> Same dumb negligible comment about starting bullets consistently with or 
> else without "the."

Same response :)

> > +	memsz = roundup(sizeof(dt_mstate_t), 8) +
> > +		8 +
> > +		roundup(dtp->dt_maxreclen, 8) +
> > +		MAX(sizeof(uint64_t) * dtp->dt_options[DTRACEOPT_MAXFRAMES],
> > +		    3 * dtp->dt_options[DTRACEOPT_STRSIZE]);
> >   	if (create_gmap(dtp, "mem", BPF_MAP_TYPE_PERCPU_ARRAY,
> > -			sizeof(uint32_t),
> > -			roundup(sizeof(dt_mstate_t), 8) + 8
> > -			  + roundup(dtp->dt_maxreclen, 8)
> > -			  + 8 * dtp->dt_options[DTRACEOPT_MAXFRAMES],
> > -			1) == -1)
> > +			sizeof(uint32_t), memsz, 1) == -1)
> >   		return -1;		/* dt_errno is set for us */
> >   
> >   	/*
> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > index ac0b543d..b94f4485 100644
> > --- a/libdtrace/dt_cg.c
> > +++ b/libdtrace/dt_cg.c
> > @@ -65,7 +65,7 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
> >   	 *	dt_dctx_t	dctx;
> >   	 *	uint32_t	key;
> >   	 *	uintptr_t	rc;
> > -	 *	char		*buf;
> > +	 *	char		*buf, *mem;
> This declaration of "mem" is confusing.  It appears just after a use of 
> "mem" in the code, but that mem refers to the BPF map.  So, not really 
> the same thing.  Then the pseudocode refers to "mem" but it also refers 
> to the map.
> 
> More generally, names like "buf" and "mem" are really generic and 
> vague.  Actually, how is one even to know what the difference between 
> buf and mem are in the first place.  "Buffer" and "memory" are 
> candidates for some of the most common, generic names in programming!
> 
> It seems to me that "buf" is referring to the output buffer and so maybe 
> "out" would be a little more specific, clear, and useful.
> 
> And related to the patch at hand, "mem" is very vague and is a term 
> already being used to describe the BPF map.  Since it's for scratch 
> memory, perhaps "scr" would be more meaningful.

The way you should look at this is actually that the 'mem' BPF map represents
plain memory that we make use of in whatever way we want.  It is all scratch
memory in a sense, but some parts of it are being used for specific purposes
(but it *is* all volatile data - only existing for the life of the clause).

So, using 'mem' as the name of the BPF map makes sense, and using that also as
the name of the scratch memory makes sense because it is the remainder of the
map value space, i.e. the scratch memory that is not already being designated
for a specific purpose.  In that sense, mst and buf are specialized pointer
into the mem space , and mem is the generic one.

> >   	 *
> >   	 *				//     (%r8 = pointer to BPF context)
> >   	 *				// mov %r8, %r1
> > @@ -146,22 +146,30 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
> >   	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8));
> >   	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_BUF), BPF_REG_0));
> >   
> > +	/*
> > +	 *	mem = buf + roundup(dtp->dt_maxreclen, 8);
> > +	 *				// add %r0, roundup(dtp->dt_maxreclen,
> > +	 *	dctx.mem = mem;		// stdw [%fp + DCTX_FP(DCTX_BUF)], %r0
> > +	 */
> > +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, roundup(dtp->dt_maxreclen, 8)));
> > +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_MEM), BPF_REG_0));
> > +
> >   	/*
> >   	 * Store pointer to BPF map "name" in the DTrace context field "fld" at
> >   	 * "offset".
> >   	 *
> > -	 * key = 0;		// stw [%fp + DCTX_FP(offset)], 0
> > -	 * rc = bpf_map_lookup_elem(&name, &key);
> > -	 *			// lddw %r1, &name
> > -	 *			// mov %r2, %fp
> > -	 *			// add %r2, DCTX_FP(offset)
> > -	 *			// call bpf_map_lookup_elem
> > -	 *			//     (%r1 ... %r5 clobbered)
> > -	 *			//     (%r0 = name BPF map value)
> > -	 * if (rc == 0)		// jeq %r0, 0, lbl_exit
> > -	 *	goto exit;
> > -	 *			//     (%r0 = pointer to map value)
> > -	 * dctx.fld = rc;	// stdw [%fp + DCTX_FP(offset)], %r0
> > +	 *	key = 0;		// stw [%fp + DCTX_FP(offset)], 0
> > +	 *	rc = bpf_map_lookup_elem(&name, &key);
> > +	 *				// lddw %r1, &name
> > +	 *				// mov %r2, %fp
> > +	 *				// add %r2, DCTX_FP(offset)
> > +	 *				// call bpf_map_lookup_elem
> > +	 *				//     (%r1 ... %r5 clobbered)
> > +	 *				//     (%r0 = name BPF map value)
> > +	 *	if (rc == 0)		// jeq %r0, 0, lbl_exit
> > +	 *		goto exit;
> > +	 *				//     (%r0 = pointer to map value)
> > +	 *	dctx.fld = rc;		// stdw [%fp + DCTX_FP(offset)], %r0
> >   	 */
> >   #define DT_CG_STORE_MAP_PTR(name, offset) \
> >   	do { \
> > diff --git a/libdtrace/dt_dctx.h b/libdtrace/dt_dctx.h
> > index c065dc70..d19ea0cc 100644
> > --- a/libdtrace/dt_dctx.h
> > +++ b/libdtrace/dt_dctx.h
> > @@ -37,6 +37,7 @@ typedef struct dt_dctx {
> >   	dt_activity_t	*act;		/* pointer to activity state */
> >   	dt_mstate_t	*mst;		/* DTrace machine state */
> >   	char		*buf;		/* Output buffer scratch memory */
> > +	char		*mem;		/* General scratch memory */
> Here we go:  two kinds of memory and one of them is called mem. Again, 
> how about "out" and "scr"?  And then dropping "scratch" from the "Output 
> buffer" comment.  And then of course DCTX_SCR rather than DCTX_MEM.

No, two parts of the same scratch memory... oneis specific, and one is generic.
THough really, it is three parts of scratch memory.

> >   	char		*strtab;	/* String constant table */
> >   	char		*agg;		/* Aggregation data */
> >   	char		*gvars;		/* Global variables */
> > @@ -47,6 +48,7 @@ typedef struct dt_dctx {
> >   #define DCTX_ACT	offsetof(dt_dctx_t, act)
> >   #define DCTX_MST	offsetof(dt_dctx_t, mst)
> >   #define DCTX_BUF	offsetof(dt_dctx_t, buf)
> > +#define DCTX_MEM	offsetof(dt_dctx_t, mem)
> >   #define DCTX_STRTAB	offsetof(dt_dctx_t, strtab)
> >   #define DCTX_AGG	offsetof(dt_dctx_t, agg)
> >   #define DCTX_GVARS	offsetof(dt_dctx_t, gvars)
> > diff --git a/test/unittest/codegen/tst.stack_layout.r b/test/unittest/codegen/tst.stack_layout.r
> > index cc23a6a1..7c02de0d 100644
> > --- a/test/unittest/codegen/tst.stack_layout.r
> > +++ b/test/unittest/codegen/tst.stack_layout.r
> > @@ -1,12 +1,12 @@
> >   Base:          0
> > -dctx:        -64
> > -%r0:         -72
> > -%r1:         -80
> > -%r2:         -88
> > -%r3:         -96
> > -%r4:        -104
> > -%r5:        -112
> > -%r6:        -120
> > -%r7:        -128
> > -%r8:        -136
> > -scratch:    -137 .. -512
> > +dctx:        -72
> > +%r0:         -80
> > +%r1:         -88
> > +%r2:         -96
> > +%r3:        -104
> > +%r4:        -112
> > +%r5:        -120
> > +%r6:        -128
> > +%r7:        -136
> > +%r8:        -144
> > +scratch:    -145 .. -512
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list