[DTrace-devel] [PATCH v2 4/5] Implement the support code for generating aggregation data

Mon Nov 30 22:54:13 PST 2020

On Mon, Nov 30, 2020 at 09:48:42PM -0800, Eugene Loh wrote:
> On 11/30/2020 08:11 PM, Kris Van Hees wrote:
> 
> > On Mon, Nov 30, 2020 at 06:42:29PM -0800, Eugene Loh wrote:
> >> On 11/24/2020 01:50 PM, Kris Van Hees wrote:
> >>
> >>> This patch adds support for aggregations to the D compiler and provides
> >>> an implementation for the count() and lquantize() aggregation functions
> >>> as examples on how to use the aggegation support.
> >>> The DT_CG_AGG_IMPL macro will:
> >>>       1. Prepare the first data copy for writing and redirect the consumer
> >>>          to read from the second copy.  A register 'dreg' will be reserved
> >>>          to hold a pointer to the data to be updated.
> >>>       2. Call f(dlp, drp, dreg, ...) to perform the data update.
> >>>       3. Prepare the second data copy for writing and redirect the consumer
> >>>          to read from the first copy.  A register 'dreg' will be reserved
> >>>          to hold a pointer to the data to be updated.
> >>>       4. Call f(dlp, drp, dreg, ...) to perform the data update.
> >> It would be clearer if one just said that steps 1 and 2 are repeated for
> >> the other buffer than to copy and paste them as steps 3 and 4. Also, the
> >> difference between "first" and "second" is immaterial. They're just two,
> >> equivalent buffers, with the producer writing one and the consumer
> >> reading the other.
> > Actually, there is a minor difference: the latch sequence number is only
> > ever updated in the first copy
> 
> You seem to make the same point in this response below, but this makes 
> no sense to me.  The latch sequence number has to be updated when each 
> copy is updated, and that sequence number is not particularly associated 
> with either copy;  it's associated with the CPU.  Each time, the per-CPU 
> latch sequence number is then used to determine which copy to use.  The 
> two copies are quite symmetric with respect to the seq#.

Look for my reply below.

> > (which is why it is referred to as primary
> > copy in the code comments.)  I don't think there is much gain from replacing
> > the 4 lines that comprise points 3 and 4 with a single line that sayd that
> > 1 and 2 are to be repeated for the other buffer because points 1 and 3 both
> > refer to two buffers.  I think being overly verbose here serves the purpose
> > of being abundantly clear about what the macro does.
> >
> >>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> >>> +/*
> >>> + * Prepare the aggregation buffer for updating for a specific aggregation, and
> >>> + * return a register that holds a pointer to the aggregation data to be
> >>> + * updated.
> >>> + *
> >>> + * We update the latch seqcount (first value in the aggregation buffer) to
> >>> + * signal that reads should be directed to the alternate copy of the data.  We
> >>> + * then determine the location of data for the given aggregation that can be
> >>> + * updated.  This value is stored in the register returned from this function.
> >>> + */
> >>> +static int
> >>> +dt_cg_agg_buf_prepare(dt_ident_t *aid, int size, dt_irlist_t *dlp,
> >>> +		      dt_regset_t *drp)
> >>> +{
> >>> +	int		rx, ry;
> >>> +	struct bpf_insn	instr;
> >>> +
> >>> +	TRACE_REGSET("            Prep: Begin");
> >>> +
> >>> +	dt_regset_xalloc(drp, BPF_REG_0);
> >>> +	rx = dt_regset_alloc(drp);
> >>> +	ry = dt_regset_alloc(drp);
> >>> +	assert(rx != -1 && ry != -1);
> >>> +
> >>> +	/*
> >>> +	 *				//     (%rX = register for agg ptr)
> >>> +	 *				//     (%rY = register for agg data)
> >>> +	 *	agd = dctx->agg;	// lddw %r0, [%fp + DT_STK_DCTX]
> >>> +	 *				// lddw %rX, [%r0 + DCTX_AGG]
> >>> +	 */
> >>> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_STK_DCTX);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_LOAD(BPF_DW, rx, BPF_REG_0, DCTX_AGG);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +
> >>> +	/*
> >>> +	 *	off = (*agd % 2) * size	// lddw %rY, [%rX + 0]
> >>> +	 *	      + aid->di_offset	// and %rY, 1
> >>> +	 *	      + sizeof(uint64_t);
> >>> +	 *				// mul %rY, size
> >>> +	 *				// add %rY, aid->di_offset +
> >>> +	 *				//		sizeof(uint64_t)
> >>> +	 *	(*agd)++;		// mov %r0, 1
> >>> +	 *				// xadd [%rX + 0], %r0
> >>> +	 *	agd += off;		// add %rX, %rY
> >>> +	 */
> >>> +	instr = BPF_LOAD(BPF_DW, ry, rx, 0);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_ALU64_IMM(BPF_AND, ry, 1);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_ALU64_IMM(BPF_MUL, ry, size);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_ALU64_IMM(BPF_ADD, ry, aid->di_offset + sizeof(uint64_t));
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_MOV_IMM(BPF_REG_0, 1);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_XADD_REG(BPF_DW, rx, 0, BPF_REG_0);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +	instr = BPF_ALU64_REG(BPF_ADD, rx, ry);
> >>> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> >>> +
> >>> +	dt_regset_free(drp, ry);
> >>> +	dt_regset_free(drp, BPF_REG_0);
> >>> +
> >>> +	TRACE_REGSET("            Prep: End  ");
> >>> +
> >>> +	return rx;
> >>> +}
> >>> +
> >>> +#define DT_CG_AGG_IMPL(aid, sz, dlp, drp, f, ...) \
> >>> +	do {								\
> >>> +		int	dreg;						\
> >>> +									\
> >>> +		TRACE_REGSET("        Upd: Begin ");			\
> >>> +									\
> >>> +		/*							\
> >>> +		 * Redirect reading to secondary copy.  The register	\
> >>> +		 * returned holds the base pointer to the primary copy.	\
> >>> +		 */							\
> >> The nomenclature "primary" and "secondary" is introduced here and only
> >> here, it seems.  Why?  It suggests one of the buffers is more important
> >> or permanent or something and the other is a back-up, but there is
> >> really no difference between the two.
> > "first data copy" and "second data copy" vs "primary copy" and "secondary
> > copy"...  I am not sure it really matters or that it would seem confusing.
> > As I mention above, the first data copy (primary copy, first buffer, ...)
> > is slightly different because it stores the latch sequence number.
> 
> No.  Neither copy stores the seq#.  The seq# (in the current 
> implementation) is shared by all copies of all aggregations (on a 
> per-CPU basis).

Yes, sorry, I was mentally thinking about the earlier implementation I did
for this which used 2 map values.  Indeed, the aggregation data copies do not
contain the sequence number (although that is likely to change in the future
based on your suggestion anyway).  But either way, there is nothing wrong with
considering one to be primary and the other secondary.  Is it necessary?
No, it is not.  But it is not wrong either.  And conceptually, the second
copy exists so that we can direct the consumer to that copy while we update
the first one, and then we switch things around again so we can update the
second copy in preparation for the next update (i.e. so that the consumer will
continue to see the correct data no matter what).

> >>> +		dreg = dt_cg_agg_buf_prepare((aid), (sz), (dlp), (drp));\
> >>> +									\
> >>> +		/*							\
> >>> +		 * Call the update implementation.  Aggregation data is \
> >>> +		 * accessed through register 'reg'.			\
> >>> +		 */							\
> >>> +		(f)((dlp), (drp), dreg, ## __VA_ARGS__);		\
> >>> +		dt_regset_free((drp), dreg);				\
> >>> +									\
> >>> +		TRACE_REGSET("        Upd: Switch");			\
> >>> +									\
> >>> +		/*							\
> >>> +		 * Redirect reading to primary copy.  The register	\
> >>> +		 * returned holds the base pointer to the secondary	\
> >>> +		 * copy.						\
> >>> +		 */							\
> >>> +		dreg = dt_cg_agg_buf_prepare((aid), (sz), (dlp), (drp));\
> >>> +									\
> >>> +		/*							\
> >>> +		 * Call the update implementation.  Aggregation data is \
> >>> +		 * accessed through register 'reg'.			\
> >>> +		 */							\
> >>> +		(f)((dlp), (drp), dreg, ## __VA_ARGS__);		\
> >>> +		dt_regset_free((drp), dreg);				\
> >>> +									\
> >>> +		TRACE_REGSET("        Upd: End   ");			\
> >>> +	} while (0)
> >>> +
> 
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel