[DTrace-devel] [PATCH v5] Fix records for aggregation descriptions

Wed Aug 31 14:48:33 UTC 2022

On Wed, Aug 31, 2022 at 01:51:57AM -0400, Eugene Loh via DTrace-devel wrote:
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> There are still some messy things (in my opinion), but we might as well move
> forward, particularly in that a bunch of this stuff is still going to be
> changing in subsequent patches anyhow.  A few items below:
> 
> On 8/30/22 22:17, Kris Van Hees via DTrace-devel wrote:
> > diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
> > @@ -791,13 +782,22 @@ dt_aggregate_valcmp(const void *lhs, const void *rhs)
> >   		if (lrec->dtrd_offset > rrec->dtrd_offset)
> >   			return DT_GREATERTHAN;
> > -		if (lrec->dtrd_action < rrec->dtrd_action)
> > +		if (lrec->dtrd_size < rrec->dtrd_size)
> >   			return DT_LESSTHAN;
> > -		if (lrec->dtrd_action > rrec->dtrd_action)
> > +		if (lrec->dtrd_size > rrec->dtrd_size)
> >   			return DT_GREATERTHAN;
> >   	}
> > +	lrec = &lagg->dtagd_drecs[DT_AGGDATA_RECORD];
> > +	rrec = &ragg->dtagd_drecs[DT_AGGDATA_RECORD];
> > +
> > +	if (lrec->dtrd_action < rrec->dtrd_action)
> > +		return DT_LESSTHAN;
> > +
> > +	if (lrec->dtrd_action > rrec->dtrd_action)
> > +		return DT_GREATERTHAN;
> > +
> >   	laddr = (int64_t *)(uintptr_t)(ldata + lrec->dtrd_offset);
> >   	raddr = (int64_t *)(uintptr_t)(rdata + rrec->dtrd_offset);
> 
> Ouch, my head is starting to hurt.  In your response to my v4 feedback, you
> wrote:
>         I'll have to check but I am pretty sure that there is no point in
> checking the
>         action of the key records anymore because those are always the
> same.  But it
>         would make sense to check the sizes, I think.
> In the above changes, I see where you change the action checks to size
> checks for key records.  For the data record, there are still action
> checks.  I don't understand any of this but figured I'd point it out to you
> to double check.  I don't know what's right or wrong here.

The key records are all DTRACEACT_DIFEXPR, incl. the varID.  The data counter
(not relevant here) and the actual data record have as action the id of the
aggregating function.  So, when comparing aggregations, you might end up
seeing an aggregation that uses the same keys but is actual a different
aggregation because it uses a different aggregating function.

That said, this can be optimized further based on the fact that we are storing
the varID in the first record of the key.  When those match, you know that the
layout (offsets) of the keys must be identical so there is no need to compare
offsets and sizes.  And then the action id stored in the data records is also
guaranteed to be the same.

I think for now I will stick with what I have and not optimize it that far
because the (positive) side effects of using varID as the first elements is
probably getting even more complex to think through when people look at this
code, and then we start depending on much more implied guarantees.

> > diff --git a/libdtrace/dt_map.c b/libdtrace/dt_map.c
> > @@ -275,38 +273,59 @@ dt_aggid_add(dtrace_hdl_t *dtp, const dt_ident_t *aid)
> >   		return dt_set_errno(dtp, EDT_NOMEM);
> >   	/*
> > -	 * Note the relationship between the aggregation storage
> > -	 * size (di_size) and the aggregation data size (dtagd_size):
> > -	 *     di_size = dtagd_size * (num copies) + (size of latch seq #)
> > +	 * The number of key records for an indexed aggregation is one more
> > +	 * than the number of aggregation keys because we store tha aggregation
> 
> s/tha/the/

Thanks.

> > +	 * variable id at the beginning of the BPF hashmap key to distinguish
> > +	 * different aggregations indexed with the same keys.
> >   	 */
> > +	nkrecs = 1 + ((dt_idsig_t *)aid->di_data)->dis_argc;
> > +
> >   	agg->dtagd_id = id;
> >   	agg->dtagd_name = aid->di_name;
> >   	agg->dtagd_sig = ((dt_idsig_t *)aid->di_data)->dis_auxinfo;
> >   	agg->dtagd_normal = 1;
> >   	agg->dtagd_varid = aid->di_id;
> > -	agg->dtagd_size = (aid->di_size - sizeof(uint64_t)) / DT_AGG_NUM_COPIES;
> > -	agg->dtagd_nrecs = agg->dtagd_size / sizeof(uint64_t);
> > +	agg->dtagd_nkrecs = nkrecs;
> > -	recs = dt_calloc(dtp, agg->dtagd_nrecs, sizeof(dtrace_recdesc_t));
> > -	if (recs == NULL) {
> > +	/* Allocate all records at once (varid and keys, data counter, data). */
> > +	krecs = dt_calloc(dtp, nkrecs + 2, sizeof(dtrace_recdesc_t));
> > +	if (krecs == NULL) {
> >   		dt_free(dtp, agg);
> >   		return dt_set_errno(dtp, EDT_NOMEM);
> >   	}
> > @@ -338,7 +357,8 @@ dt_aggid_destroy(dtrace_hdl_t *dtp)
> >   	for (i = 0; i < dtp->dt_maxagg; i++) {
> >   		if (dtp->dt_adesc[i] != NULL) {
> > -			dt_free(dtp, dtp->dt_adesc[i]->dtagd_recs);
> > +			/* Freeing dtagd_recs also frees dtagd_drecs. */
> 
> s/dtagd_recs/dtagd_krecs/

AH yes, thanks.

> > +			dt_free(dtp, dtp->dt_adesc[i]->dtagd_krecs);
> >   			dt_free(dtp, dtp->dt_adesc[i]);
> >   		}
> >   	}
> > diff --git a/libdtrace/dt_printf.c b/libdtrace/dt_printf.c
> > @@ -1264,10 +1264,12 @@ dt_printf_format(dtrace_hdl_t *dtp, FILE *fp, const dt_pfargv_t *pfv,
> >   	dtrace_aggdesc_t *agg;
> >   	caddr_t lim = (caddr_t)buf + len, limit;
> >   	char format[64] = "%";
> > -	int i, aggrec = 0, curagg = -1;
> > +	int i, curagg = -1;
> >   	uint64_t normal, sig;
> >   	/*
> > +	 * FIXME: Comment needs rewrite.
> > +	 *
> >   	 * If we are formatting an aggregation, set 'aggrec' to the index of
> >   	 * the final record description (the aggregation result) so we can use
> >   	 * this record index with any conversion where DT_PFCONV_AGG is set.
> 
> FIXME: Comment needs rewrite.

Indeed.

> > @@ -1809,8 +1806,8 @@ static int
> >   dt_fprinta(const dtrace_aggdata_t *adp, void *arg)
> >   {
> >   	const dtrace_aggdesc_t	*agg = adp->dtada_desc;
> > -	const dtrace_recdesc_t	*rec = agg->dtagd_recs;
> > -	uint_t			nrecs = agg->dtagd_nrecs;
> > +	const dtrace_recdesc_t	*rec = agg->dtagd_krecs;
> > +	uint_t			nrecs = agg->dtagd_nkrecs;
> >   	dt_pfwalk_t		*pfw = arg;
> >   	dtrace_hdl_t		*dtp = pfw->pfw_argv->pfv_dtp;
> 
> I'm still curious about this point from the v4 review:
> 
>     rec should perhaps not start with the varID record but with the first
> (agg)
>     key record.  So rec has to be bumped up, and nrecs has to be decremented
>     accordingly.  Admittedly, this bug predates your patch, but you fix the
>     corresponding problem in dt_fprintas()...

Grrr, yes, the presence of both dt_fprintas() and dt_fprinta() as function
names (while certainly sensible) is annoying.  i thought I fixed it after your
v4 comments but clearly must have been looking at the wrong function.  Fixing
it right now.

Incidentally, I'll see if there is any way I can come up with a test for this
since obviously the testsuite is not catching me forgetting to fix thiss.

> > @@ -1837,8 +1834,8 @@ dt_fprintas(const dtrace_aggdata_t **aggsdata, int naggvars, void *arg)
> >   {
> >   	const dtrace_aggdata_t	*aggdata = aggsdata[0];
> >   	const dtrace_aggdesc_t	*agg = aggdata->dtada_desc;
> > -	const dtrace_recdesc_t	*rec = agg->dtagd_recs;
> > -	uint_t			nrecs = agg->dtagd_nrecs;
> > +	const dtrace_recdesc_t	*rec = &agg->dtagd_krecs[1];
> > +	uint_t			nrecs = agg->dtagd_nkrecs - 1;
> >   	dt_pfwalk_t		*pfw = arg;
> >   	dtrace_hdl_t		*dtp = pfw->pfw_argv->pfv_dtp;
> >   	int			i;
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel