[DTrace-devel] [PATCH 06/13] Add code for llquantize() aggregation function

Tue Dec 8 10:08:23 PST 2020

On Wed, Dec 02, 2020 at 01:54:51PM -0500, eugene.loh at oracle.com wrote:
> From: Eugene Loh <eugene.loh at oracle.com>
> 
> The value is converted to a bin number with manually written BPF code,
> in function dt_cg_agg_llquantize_bin(), rather than with C code that
> is pre-compiled to BPF.  The reason is that the bin computation
> depends on a number of parameters (factor, lmag, hmag, steps) and
> combinations thereof.  Those combinations are a lot of run-time

OK, so the computation of the bin number is done in generated BPF rather
than pre-compiled code (C-to-BPF).
> computations, and the BPF code would put additional, combinatorially
> growing load on the BPF verifier -- e.g., for quantities like

This seems to imply that doing it in BPF (which what you are doing) would
put a burden on the verifier.  That sounds like a contradiction.  Do you
mean here that the C-to-BPF pre-compiled code would impose an undue burden
on the verifier because it needs to be generic code that needs to compute
these values that otherwise can be done at script compile time?

Can you clarify this a bit?

> powl(factor, lmag).  Meanwhile, these parameters are actually known
> at code-generation time.  So, we can do a lot of the computations at
> code-generation time and just load final values into BPF registers,
> saving a lot of run-time computation.
> 
> Perhaps there are some hybrid strategies, such as computing some values
> like powl(factor,lmag) at code-generation time and passing them to a
> pre-compiled BPF function.  On the other hand, we only have 5 arguments
> to BPF functions.  Perhaps some arguments (like factor, lmag, hmag, and
> steps) can be packed together and passed to a BPF function as a single
> argument.  Again, however, this just means more BPF instructions -- say,
> to unpack arguments.  So, we do not use such strategies here.

I think you can omit this.  While definitely good development commentary,
it does not document the patch itself.
> 
> The llquantize function is rather detailed.  Perhaps it does not
> need to be documented in detail for users -- the consumer should
> present data clearly enough that users will see what they're getting
> -- but developers should understand the objective well enough that
> they can judge the code's correctness.  So, there is a big block of
> comments to explain what we're trying to achieve.

No need to defend why there is a lot of comments in the function.  You can
synthesize it down to just:

"The llquantize function is rather complex.  There is a big block of comments
 to explain what we're trying to achieve."

> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> ---
>  libdtrace/dt_cg.c                             | 211 +++++++++++++++++-
>  .../err.D_LLQUANT_STEPVAL.llqdividefactor.d   |  20 ++
>  ...D_LLQUANT_STEPVAL.llqdividepoweroffactor.d |   8 +-
>  .../err.D_LLQUANT_STEPVAL.llqmultiplefactor.d |  20 ++
>  4 files changed, 252 insertions(+), 7 deletions(-)
>  create mode 100644 test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividefactor.d
>  create mode 100644 test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqmultiplefactor.d
> 
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> index 673b9794..6fa49bbc 100644
> --- a/libdtrace/dt_cg.c
> +++ b/libdtrace/dt_cg.c
> @@ -3143,6 +3143,187 @@ dt_cg_agg_quantize_impl(dt_irlist_t *dlp, dt_regset_t *drp, int dreg, int vreg,
>  	TRACE_REGSET("            Impl: End  ");
>  }
>  
> +/*
> + * Log-linear quantization is the most complicated of the *quantize()
> + * aggregation functions.  This function quantizes the value held in
> + * valreg to a 0-based bin number using the parameters factor, lmag,
> + * hmag, and steps.
> + *
> + * First, let's illustrate log-linear based on an example.  We will
> + * bin logarithmically for factor=10 -- that is, 0-9, 10-99, 100-999,
> + * and so on.  Within each logarithmic range, we bin linearly.  E.g.,
> + * for the logarithmic range 100-999, the bins are:
> + *
> + *    +-------+----------------+----------------------------------+
> + *    |steps=5|    steps=10    |             steps=20             |
> + *    +-------+----------------+----------------------------------+
> + *    |100-199|         100-199|                  100-149  150-199|
> + *    |200-399|200-299  300-399|200-249  250-299  300-349  350-399|
> + *    |400-599|400-499  500-599|400-449  450-499  500-549  550-599|
> + *    |600-799|600-699  700-799|600-649  650-699  700-749  750-799|
> + *    |800-999|800-899  900-999|800-849  850-899  900-949  950-999|
> + *    +-------+----------------+----------------------------------+
> + *
> + * Nominally, there are "steps" bins per logarithmic range, but values 0-99
> + * do not actually appear here; they are not part of this logarithmic
> + * range.  For steps=5, this makes the lowest bin narrower than the others;
> + * for steps=10, it removes one bin; and for steps=20 two bins.  That is,
> + * the actual number of bins for a logarithmic range is steps-steps/factor.
> + *
> + * For factor=10, the logarithmic range 10-99 is for mag=1, 100-999 for mag=2,
> + * 1000-9999 for mag=3, and so on.  The parameters lmag and hmag indicate the
> + * inclusive bounds for mag.
> + *
> + * So, all logarithmic ranges, put together, have
> + *
> + *     (hmag - lmag + 1) * (steps - steps/factor)
> + *
> + * bins.  We double that number to account for negative values.  We add
> + * another 3 bins for:
> + *
> + *   - negative overflow
> + *
> + *   - underflow (values close to 0, whether negative or positive)
> + *
> + *   - positive overflow
> + *
> + * The bins are arranged in numerical order, starting with negative underflow,
> + * ending with positive overflow, and with underflow exactly in the middle.

This terminology seems a bit odd to me.  Is it what is used in documentation
or in the original code, or is it new?  I would consider the lowest order
bin to be underflow (less than the lower limit), the highest order bin to be
overflow (greater than the higher limit), and what you refer to as the
underflow bin as the zero bin (where zero is 0 +- some delta of course.

> + */
> +static void
> +dt_cg_agg_llquantize_bin(dt_irlist_t *dlp, dt_regset_t *drp, int valreg,
> +			 int32_t factor, int32_t lmag, int32_t hmag,
> +			 int32_t steps)
> +{
> +	/*
> +	 * We say there are "steps" bins per logarithmic range,
> +	 * but steps/factor of them actually overlap with lower ranges.
> +	 */
> +	int steps_factor = steps / factor;
> +
> +	/* the underflow bin is in the middle */
> +	int underflow_bin = 1 + (hmag - lmag + 1) * (steps - steps_factor);
> +
> +	/* initial bucket_max */
> +	uint64_t bucket_max0 = powl(factor, lmag);
> +
> +	/* registers */
> +	int indreg, magreg, tmpreg, maxreg;
> +
> +	/* labels */
> +	uint_t		L1 = dt_irlist_label(dlp);
> +	uint_t		L2 = dt_irlist_label(dlp);
> +	uint_t		L3 = dt_irlist_label(dlp);
> +	uint_t		L4 = dt_irlist_label(dlp);
> +	uint_t		Lloop = dt_irlist_label(dlp);
> +	uint_t		Lbin = dt_irlist_label(dlp);
> +	uint_t		Lshift = dt_irlist_label(dlp);
> +	uint_t		Lend = dt_irlist_label(dlp);
> +
> +	TRACE_REGSET("            Impl: Begin");

This should be some other regset trace tag because Impl: is used for the
implementation code for the aggregation data update.  Having it here would be
confusing.  Perhaps something like "Bin: Begin" (and end with "Bin: End  ")?

> +
> +	if ((indreg = dt_regset_alloc(drp)) == -1 ||
> +	    (magreg = dt_regset_alloc(drp)) == -1 ||
> +	    (tmpreg = dt_regset_alloc(drp)) == -1 ||
> +	    (maxreg = dt_regset_alloc(drp)) == -1)
> +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> +
> +	/*
> +	 * %tmp = %val
> +	 * if (%val >= 0) goto L1
> +	 * %tmp *= -1
> +	 */
> +	emit(dlp, BPF_MOV_REG(tmpreg, valreg));
> +	emit(dlp, BPF_BRANCH_IMM(BPF_JSGE, valreg, 0, L1));
> +	emit(dlp, BPF_ALU64_IMM(BPF_MUL, tmpreg, -1));
> +
> +	/*
> +	 * check for "underflow" (smaller than the smallest bin)
> +	 * L1:
> +	 *     %mag = underflow_bin
> +	 *     %max = bucket_max0
> +	 *     if (%tmp < %max) goto Lend
> +	 */
> +	emitl(dlp, L1,
> +		   BPF_MOV_IMM(magreg, underflow_bin));
> +	dt_cg_xsetx(dlp, NULL, DT_LBL_NONE, maxreg, bucket_max0);
> +	emit(dlp, BPF_BRANCH_REG(BPF_JLT, tmpreg, maxreg, Lend));
> +
> +	/*
> +	 * loop over the logarithmic ranges
> +	 *     %mag = lmag                    lowest range
> +	 *     %ind = 0                       bin within this range
> +	 * Lloop:
> +	 *     %max *= factor;                increase max
> +	 *     if (%tmp < %max) goto Lbin;    found the right range
> +	 *     %mag++;                        otherwise, increase range
> +	 *     if (%mag > hmag) goto Lshift;  went beyond last range
> +	 *     goto Lloop;
> +	 */
> +	emit(dlp, BPF_MOV_IMM(magreg, lmag));
> +	emit(dlp, BPF_MOV_IMM(indreg, 0));
> +	emitl(dlp, Lloop,
> +		   BPF_ALU64_IMM(BPF_MUL, maxreg, factor));
> +	emit(dlp, BPF_BRANCH_REG(BPF_JLT, tmpreg, maxreg, Lbin));
> +	emit(dlp, BPF_ALU64_IMM(BPF_ADD, magreg, 1));
> +	emit(dlp, BPF_BRANCH_IMM(BPF_JGT, magreg, hmag, Lshift));
> +	emit(dlp, BPF_JUMP(Lloop));
> +
> +	/*
> +	 * Found a bin
> +	 * Evaluate (%tmp*steps/%max) in a way that it will not overflow
> +	 * Lbin:
> +	 *     if (%mag != 0) %ind = %tmp / (%max / steps)
> +	 *     else           %ind = %tmp * steps / %max
> +	 */
> +	emitl(dlp, Lbin,
> +		   BPF_MOV_REG(indreg, tmpreg));
> +	emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, magreg, 0, L2));
> +	emit(dlp, BPF_ALU64_IMM(BPF_DIV, maxreg, steps));
> +	emit(dlp, BPF_JUMP(L3));
> +	emitl(dlp, L2,
> +		   BPF_ALU64_IMM(BPF_MUL, indreg, steps));
> +	emitl(dlp, L3,
> +		   BPF_ALU64_REG(BPF_DIV, indreg, maxreg));
> +
> +	/*
> +	 * shift for low indices that can never happen
> +	 *     %ind -= steps_factor
> +	 */
> +	emit(dlp, BPF_ALU64_IMM(BPF_SUB, indreg, steps_factor));
> +
> +	/*
> +	 * shift to center around underflow_bin
> +	 * Lshift:
> +	 *     %mag = (%mag - lmag) * (steps - steps_factor) + %ind + 1
> +	 *     if (%v < 0) %mag *= -1
> +	 *     %mag += underflow_bin
> +	 */
> +	emitl(dlp, Lshift,
> +		   BPF_ALU64_IMM(BPF_SUB, magreg, lmag));
> +	emit(dlp, BPF_ALU64_IMM(BPF_MUL, magreg, steps - steps_factor));
> +	emit(dlp, BPF_ALU64_REG(BPF_ADD, magreg, indreg));
> +	emit(dlp, BPF_ALU64_IMM(BPF_ADD, magreg, 1));
> +	emit(dlp, BPF_BRANCH_IMM(BPF_JSGE, valreg, 0, L4));
> +	emit(dlp, BPF_ALU64_IMM(BPF_MUL, magreg, -1));
> +	emitl(dlp, L4,
> +		   BPF_ALU64_IMM(BPF_ADD, magreg, underflow_bin));
> +
> +	/*
> +	 * Lend:
> +	 *     %val = %mag
> +	 */
> +	emitl(dlp, Lend,
> +		   BPF_MOV_REG(valreg, magreg));

I think there is merit in aligning all the BPF_* arguments to emit*() macros
in this function.  That really helps reading when you want to just see the 
generated code.

> +
> +	dt_regset_free(drp, indreg);
> +	dt_regset_free(drp, magreg);
> +	dt_regset_free(drp, tmpreg);
> +	dt_regset_free(drp, maxreg);
> +
> +	TRACE_REGSET("            Impl: End  ");

See above.

> +}
> +
>  static void
>  dt_cg_agg_llquantize(dt_pcb_t *pcb, dt_ident_t *aid, dt_node_t *dnp,
>  		     dt_irlist_t *dlp, dt_regset_t *drp)
> @@ -3163,7 +3344,7 @@ dt_cg_agg_llquantize(dt_pcb_t *pcb, dt_ident_t *aid, dt_node_t *dnp,
>  	dt_node_t	*incr;
>  	dt_idsig_t	*isp;
>  	uint64_t	factor, lmag, hmag, steps, arg, oarg;
> -	int		sz;
> +	int		sz, ireg;
>  
>  	if (arg1->dn_kind != DT_NODE_INT)
>  		dnerror(arg1, D_LLQUANT_FACTORTYPE, "llquantize( ) argument #1 "
> @@ -3243,7 +3424,7 @@ dt_cg_agg_llquantize(dt_pcb_t *pcb, dt_ident_t *aid, dt_node_t *dnp,
>  					"llquantize() argument #4 (steps) must "
>  					"evenly divide pow(argument #1 "
>  					"(factor), lmag (argument #2) + 1) "
> -					"when steps>factor and lmag==0\n");
> +					"when steps>factor and lmag>0\n");

Nice catch!  That is a bug that dates back to the original implementation.

>  		}
>  	}
>  
> @@ -3308,11 +3489,35 @@ dt_cg_agg_llquantize(dt_pcb_t *pcb, dt_ident_t *aid, dt_node_t *dnp,
>  
>  	incr = dt_cg_agg_opt_incr(dnp, arg4, "llquantize", 6);
>  
> -	sz = (hmag - lmag + 1) * (steps - steps / factor) * 2 + 4;
> +	sz = (hmag - lmag + 1) * (steps - steps / factor) * 2 + 3;

I would actually add a comment here referring back to the comment block at
the beginning of the patch, since people should look there for an explanation
why this is the amount of data items we need.

>  	sz *= sizeof(uint64_t);
>  
>  	if (aid->di_offset == -1)
>  		aid->di_offset = DT_CG_AGG_OFFSET(pcb, sz);

This becomes DT_CG_AGG_SET_STORAGE(aid, sz); but I can take care of that when
I integrate the patches into the branch (same for the other patches that need
this).

> +
> +	TRACE_REGSET("    AggLLq: Begin");

If you are making changes anyway, perhaps make this AggLlq?  The double
capital L looks odd to me.  Minor thing though.

> +	dt_cg_node(dnp->dn_aggfun->dn_args, dlp, drp);
> +	dt_cg_agg_llquantize_bin(dlp, drp, dnp->dn_aggfun->dn_args->dn_reg,
> +	    factor, lmag, hmag, steps);
> +
> +	if (incr == NULL) {
> +		if ((ireg = dt_regset_alloc(drp)) == -1)
> +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> +		emit(dlp, BPF_MOV_IMM(ireg, 1));
> +	} else {
> +		dt_cg_node(incr, dlp, drp);
> +		ireg = incr->dn_reg;
> +	}
> +
> +	DT_CG_AGG_IMPL(aid, sz, dlp, drp, dt_cg_agg_quantize_impl,
> +		       dnp->dn_aggfun->dn_args->dn_reg, ireg,
> +		       (hmag - lmag + 1) * (steps - steps / factor) * 2 + 2);
> +
> +	dt_regset_free(drp, dnp->dn_aggfun->dn_args->dn_reg);
> +	dt_regset_free(drp, ireg);
> +
> +	TRACE_REGSET("    AggLLq: End  ");
>  }
>  
>  static void
> diff --git a/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividefactor.d b/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividefactor.d
> new file mode 100644
> index 00000000..009ae544
> --- /dev/null
> +++ b/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividefactor.d
> @@ -0,0 +1,20 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +/*
> + * ASSERTION:
> + *  llquantize() steps must divide factor when steps<factor
> + *
> + * SECTION: Aggregations/Aggregations
> + *
> + */
> +
> +
> +BEGIN
> +{
> +	@ = llquantize(timestamp, 10, 0, 6, 3);

Replace timestamp with a constant value, so that we avoid depending on the
builtin variable 'timestamp' unnecessarily.  It is not relevant to the test
condition.

> +}
> diff --git a/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividepoweroffactor.d b/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividepoweroffactor.d
> index 68bf4001..d5499083 100644
> --- a/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividepoweroffactor.d
> +++ b/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqdividepoweroffactor.d
> @@ -1,14 +1,14 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, 2017 Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2020 Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> -/* @@xfail: dtv2 */
>  
>  /*
>   * ASSERTION:
> - *  llquantize() steps must evenly factor*factor when steps>factor and lmag==0.
> + *  llquantize() steps must evenly divide factor*factor
> + *  when steps>factor and lmag==0.
>   *
>   * SECTION: Aggregations/Aggregations
>   *
> @@ -17,6 +17,6 @@
>  
>  BEGIN
>  {
> -	@a[1] = llquantize(timestamp, 10, 0, 6, 30);
> +	@ = llquantize(timestamp, 10, 0, 6, 30);

Replace timestamp with a constant value, so that we avoid depending on the
builtin variable 'timestamp' unnecessarily.  It is not relevant to the test
condition.

>  }
>  
> diff --git a/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqmultiplefactor.d b/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqmultiplefactor.d
> new file mode 100644
> index 00000000..21375e6c
> --- /dev/null
> +++ b/test/unittest/aggs/err.D_LLQUANT_STEPVAL.llqmultiplefactor.d
> @@ -0,0 +1,20 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +/*
> + * ASSERTION:
> + *  llquantize() steps must be a multiple of factor when steps>factor
> + *
> + * SECTION: Aggregations/Aggregations
> + *
> + */
> +
> +
> +BEGIN
> +{
> +	@ = llquantize(timestamp, 10, 0, 6, 15);

Replace timestamp with a constant value, so that we avoid depending on the
builtin variable 'timestamp' unnecessarily.  It is not relevant to the test
condition.

> +}
> -- 
> 2.18.4
> 
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel