[DTrace-devel] [DTrace] DTrace userspace branch dm/2.0-branch-dev-aggs updated. 2ffdae6376cbe280267577edbfc5dc93a776398f

david.mclean at oracle.com david.mclean at oracle.com
Wed Nov 25 11:25:30 PST 2020



On 11/25/20 9:41 AM, Eugene Loh wrote:
> I see this is WIP and you're not ready for review, but I happened to
> take a peek anyhow to see where stddev is headed.  Comments below.
I'm glad you did.  I was hoping for some confirmation I've been heading 
in a good direction.
> 
> On 11/24/2020 08:57 PM, nick.alcock at oracle.com wrote:
> 
>> commit 2ffdae6376cbe280267577edbfc5dc93a776398f
>> Author: David  Mc Lean <david.mclean at oracle.com>
>> Date:   Thu Nov 19 12:04:39 2020 -0800
>>
>>       aggregations max(), min(), sum(), and stddev() WIP
>>
>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>
>> +static void
>> +dt_cg_agg_stddev_impl(dt_irlist_t *dlp, dt_regset_t *drp, int dreg, int vreg)
>> +{
>> +	int idxreg, accreg, hi_reg, medreg, lowreg;
>> +
>> +	/* labels */
>> +	uint_t		Lpos = dt_irlist_label(dlp);
>> +
>> +
>> +	TRACE_REGSET("            Impl: Begin");
>> +
>> +	/* get some registers to work with */
>> +	if ((idxreg = dt_regset_alloc(drp)) == -1 ||
>> +	    (accreg = dt_regset_alloc(drp)) == -1 ||
>> +	    (hi_reg = dt_regset_alloc(drp)) == -1 ||
>> +	    (medreg = dt_regset_alloc(drp)) == -1 ||
>> +	    (lowreg = dt_regset_alloc(drp)) == -1 ||)
>> +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>> +
>> +	/* dst[0]++; increment count of calls eq. value count */
>> +	BPFCODE(BPF_MOV_IMM(idxreg, 1))
>> +	BPFCODE(BPF_XADD_REG(BPF_DW, dreg, 0, idxreg))
>> +	dt_regset_free(drp, idxreg);
> 
> 1.  You use idxreg below, so you cannot free it here.  (Or you'd have to
> reallocate it or something funny.)  Not sure why it's called idxreg.

Yeah, I'll lose this free.  The naming was just nominal.  I am trying to 
keep the names all 6 chars long, but I don't  care much beyond that for 
the name of this generic use (in later code) one.

> 
> 2.  The aggregation functions started with this foo() and foo_impl()
> approach, where foo_impl() would get called twice for each update (since
> there were two copies to update).  With the *quantize() functions, which
> have "complicated" conversions from values to bin numbers, there is a
> third function.  We call that other function once and the foo_impl()
> function twice.  In your case, I would imagine this new function to
> determine the 128-bit square, placing this result in a pair of
> registers, and to have the std_impl() function simply do the +1, +val,
> and +valsquared aggregation.

So, still the two functions I have but with the heavy lifting in the 
other function -- if I follow what you are conveying.
Yes, definitely better to run this once if possible.

> 
>> +	/* dst[1] += val; add value to the total sum */
>> +	BPFCODE(BPF_XADD_REG(BPF_DW, dreg, 8, vreg))
>> +
>> +	/* Handle double-dst[3-4] = sq(val) starting here */
>> +
>> +	// If value is negative make it positive, sq() is positive anyhow
>> +	BPFCODE(BPF_MOV_REG(accreg, vreg))
>> +	BPFCODE(BPF_MOV_REG(idxreg, vreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, accreg, 0x8000000000000000))
>> +	BPFCODE(BPF_BRANCH_IMM(BPF_JEQ, accreg, 0, Lpos))
> 
> You cannot IMM a 64-bit value.  I think you can just
> BPF_BRANCH_IMM(BPF_JSGE, vreg, 0, Lpos).

Okay.  Good point.  I'll fix it.

> 
>> +
>> +	// Change to a positive value
>> +	BPFCODE(BPF_NEG_REG(idxreg))
> 
> That changes the magnitude.  Probably need BPF_MUL_IMM(regfoo, -1) or
> something.  Note that x and -x are not 2s complements.  If x and y are
> 2s complements, they add up to -1.  x and -x add up to 0.

Thanks for mentioning this.  I was just going with the name of 
BPF_NEG_REG and I knew this needed a closer look to be sure.

>> +	BPFCODE_lbl = Lpos;
> 
> I like blank lines above labels, not below them.  Not sure if Kris has a
> strong preference.
I intended a clean indication of the state (a positive value being held 
in the register), and having a blank line before makes the comment stand 
out better.  Not so important -- I'll drop it.

> 
>> +
>> +	// Now we are sure idxreg holds our absolute value
>> +	// Split our value into two pieces
>> +
>> +	// Put least significant half in lowreg
>> +	BPFCODE(BPF_MOV_REG(accreg, idxreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, accreg, 0xFFFFFFFF))
>> +	BPFCODE(BPF_MOV_REG(lowreg, accreg))
> 
> No need for the accreg intermediate register.  Just go to lowreg directly.
K.
> 
>> +
>> +	// Put most significant half in hi_reg
>> +	BPFCODE(BPF_MOV_REG(accreg, idxreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, accreg, 0xFFFFFFFF00000000))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_RSH, accreg, 32))
>> +	BPFCODE(BPF_MOV_REG(hi_reg, accreg))
> 
> Again, no need for intermediate register; go to hi_reg directly. Also,
> no need for AND since the RSH will pick up the correct bits.
K.
> 
>> +	// Multiply using algebraic FOIL, to contain interim values to 64 bits
>> +
> 
> For me personally, FOIL is not helpful... I had to look it up. There is
> no ironclad notion of which factor is first, which term is first, etc.
> Personally, high, medium, and low are more descriptive.  That's all
> subjective.  I'm just voting.  Anyhow, the complexity is not
> "multiplying two binomials" but all the cutting and stitching that needs
> to be done for 128-bit multiplies with 64-bit registers.

The idea is the same as FOIL, which I expect is a commonly known method 
(at least easily referencable).  I refer to both FOIL and high medium low.
The main thing I'm trying to highlight by referencing FOIL is that a 
split double value needs to be multiplied as four different values which 
are summed.  I'm sure this idea is obvious to you, but I'm hoping to 
give people some understanding of the approach I'm taking by referring 
to something that is a shared concept.
I'm open to different wording, but I think it is a good idea to have 
some comments which give the user the concept of where I'm heading. 
Maybe I'm more pro-comments than other people though.
I'll add to the comments to clarify my reference to FOIL, but I would 
like to keep the reference to FOIL unless I cannot work past your 
objection with additional wording.

> 
>> +	// Calculate 'O' (== I) first
>> +	// This result represents a value shifted 32 bits to the right
>> +	BPFCODE(BPF_MOV_REG(medreg, lowreg))
>> +	BPFCODE(BPF_ALU64_REG(BPF_MUL, medreg, hi_reg))
>> +
>> +	// Calculate 'L' next
>> +	BPFCODE(BPF_MOV_REG(accreg, lowreg))
>> +	BPFCODE(BPF_ALU64_REG(BPF_MUL, lowreg, accreg))
> 
> For L, maybe BPF_ALU64_REG(BPF_MUL, lowreg, lowreg) just works?  No need
> for accreg?
I didn't know if it would work to use the same reg for both args.  Yeah, 
that looks better (because it is shorter).
> 
>> +	// Calculate 'F' next; this result represents shifted 64 to the right
>> +	BPFCODE(BPF_MOV_REG(accreg, hi_reg))
>> +	BPFCODE(BPF_ALU64_REG(BPF_MUL, hi_reg, accreg))
>> +
> 
> Same thing... I'm thinking no need for accreg; just hi *= hi.
> 
Sure.  Good idea.
>> +	// Now add the pieces and carefully place them in the right place
>> +
>> +	// Shave the high nibble off the low half value to allow for no loss
>> +	// by avoiding overflow in the coming addition.
>> +
>> +	BPFCODE(BPF_MOV_REG(idxreg, lowreg))
>> +	// RSH effectively does this for me anyhow
>> +	// BPFCODE(BPF_ALU64_IMM(BPF_AND, idxreg, 0xF000000000000000))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_RSH, idxreg, 60))
>> +	// Clear the saved nibble in the source value
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, lowreg, 0x0FFFFFFFFFFFFFFF))
> 
> Again, IMM cannot be a 64-bit value.  Loading a 64-bit value costs two
> slots.  Then you need to AND.  So, one alternative is to LSH 4 bits and
> then RSH 4 bits.

Got it.  I like the suggested alternative here, but I'm not sure how to 
accomplish some of the other masking operations (I have used AND for) in 
this code.  On second thought, I can do anything with enough copying and 
shifting.  Maybe not always the prettiest, but should work.

> 
>   From here on out, it's a little hard for me to follow the code.  I'm
> not sure how to solve that.

That is probably an indication I don't have enough comments yet.  Not 
surprising as I was just pushing out the code for this WIP copy.  I can 
surely improve the comments in my next push.

> 
> Incidentally, the DTv1 code has that implementation in terms of
> dtrace_[multiply|shift|add]_128 in dtrace_probe_ctx.c.  Did you decide
> not to go that route?  It's intended to be more general (multiply two
> different factors rather than squaring just one; shift by an arbitrary
> amount rather than a known 32), so it could be simplified a lot before
> implementing in BPF.

I could use some more detail about how I would go that route.  I've been 
just looking at how to get the job done with this BPF code, maybe there 
is a different way that would be better.

> 
>> +	BPFCODE(BPF_MOV_REG(accreg, medreg)) //simmilar nibble thing here
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, accreg, 0x00000000F0000000))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_RSH, accreg, 28))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, medreg, 0xFFFFFFFF0FFFFFFF))
>> +	// The values for 'O' are doubled to represent 'I' too.
>> +	BPFCODE(BPF_ALU64_IMM(BPF_MUL, accreg, 2))
>> +
>> +	// Get the low value part from medreg and add it into lowreg
>> +	BPFCODE(BPF_ALU64_REG(BPF_ADD, accreg, idxreg))
>> +	BPFCODE(BPF_MOV_REG(idxreg, medreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_LSH, idxreg, 32))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_MUL, idxreg, 2)) // values for 'O' are doubled
>> +	BPFCODE(BPF_ALU64_REG(BPF_ADD, lowreg, idxreg))
>> +
>> +	// Get that last possible high nibble from lowreg
>> +	BPFCODE(BPF_MOV_REG(idxreg, lowreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_RSH, idxreg, 60))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, lowreg, 0x0FFFFFFFFFFFFFFF))
>> +
>> +	// most significant nibble of the low half, + overflow above that nibble
>> +	BPFCODE(BPF_ALU64_REG(BPF_ADD, accreg, idxreg))
>> +	// Put the final nibble in place for the low half
>> +	BPFCODE(BPF_MOV_REG(idxreg, accreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_AND, lowreg, 0xF))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_LSH, idxreg, 60))
>> +	BPFCODE(BPF_ALU64_REG(BPF_ADD, lowreg, idxreg))
>> +
>> +	// Begin putting together the high half value
>> +	BPFCODE(BPF_ALU64_IMM(BPF_RSH, accreg, 4))
>> +	BPFCODE(BPF_MOV_REG(idxreg, medreg))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_RSH, idxreg, 32))
>> +	BPFCODE(BPF_ALU64_IMM(BPF_MUL, idxreg, 2)) // values for 'O' are doubled
>> +	BPFCODE(BPF_ALU64_REG(BPF_ADD, hi_reg, accreg))
>> +	BPFCODE(BPF_ALU64_REG(BPF_ADD, hi_reg, idxreg))
>> +
>> +	// I might need to account for overflow from adding to the value
>> +	// already in dreg location offset 24
>> +	BPFCODE(BPF_XADD_REG(BPF_DW, dreg, 16, hi_reg))
>> +	BPFCODE(BPF_XADD_REG(BPF_DW, dreg, 24, lowreg))
> 
> Again, this XADD stuff makes sense for the std_impl() function, which is
> called twice, but the 128-bit square should perhaps be done only once,
> before std_impl() is called.

So, in the dt_cg_agg_stddev function then.

> 
>> +	dt_regset_free(drp, idxreg);
>> +	dt_regset_free(drp, accreg);
>> +	dt_regset_free(drp, hi_reg);
>> +	dt_regset_free(drp, medreg);
>> +	dt_regset_free(drp, lowreg);
>> +
>> +	TRACE_REGSET("            Impl: End  ");
>> +}
>> +
>>    static void
>>    dt_cg_agg_stddev(dt_pcb_t *pcb, dt_ident_t *aid, dt_node_t *dnp,
>>    		 dt_irlist_t *dlp, dt_regset_t *drp)
>>    {
>>    	int	sz = 4 * sizeof(uint64_t);
>>    
>> -	TRACE_REGSET("    AggStd : Begin");
>> +	TRACE_REGSET("    AggStddev : Begin");
>>    
>>    	if (aid->di_offset == -1)
>>    		aid->di_offset = DT_CG_AGG_OFFSET(pcb, sz);
>>    
>> -	TRACE_REGSET("    AggStd : End  ");
>> +	dt_cg_node(dnp->dn_aggfun->dn_args, dlp, drp);
>> +	DT_CG_AGG_IMPL(aid, sz, dlp, drp, dt_cg_agg_stddev_impl,
>> +		       dnp->dn_aggfun->dn_args->dn_reg);
>> +	dt_regset_free(drp, dnp->dn_aggfun->dn_args->dn_reg);
>> +
>> +	TRACE_REGSET("    AggStddev : End  ");
>> +}
> I think Kris wants fixed width on the TRACE_REGSET label widths. So:
> "AggStd: End  " or something.  Same for Sum.  And the Begin will come
> after the aid->di_offset stuff.  Once the other aggregation functions
> have stabilized (I'm working on that), hopefully all this will become
> more clear.

Okay, I'll change it to AggSdv then.  I just think AggStd is not the 
best name.

> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
> 



More information about the DTrace-devel mailing list