[DTrace-devel] [PATCH 4/4] spec: correct semantics of specsize overflow

Thu May 11 23:05:56 UTC 2023

On Thu, May 11, 2023 at 06:41:09PM -0400, Eugene Loh via DTrace-devel wrote:
> On 5/11/23 16:30, Kris Van Hees wrote:
> 
> > On Thu, May 11, 2023 at 02:44:47PM -0400, Eugene Loh via DTrace-devel wrote:
> > > On 5/9/23 03:19, Kris Van Hees via DTrace-devel wrote:
> > > 
> > > > diff --git a/bpf/speculation.c b/bpf/speculation.c
> > > > @@ -105,29 +105,32 @@ noinline uint32_t dt_speculation(void)
> > > > + * We return the speculation entry in the map or 0.
> > > How about NULL instead of 0?  Also, since the function is so short and well
> > > documented, the lead comment block might be unnecessary.
> > NULL is not known as a symbol in these C source files that are compiled into
> > BPF, so 0 is the way to go.  Unless we define NULL ourselves somewhere, but
> > since we haven't yet, I go with what we have been doing.
> 
> Interesting.  NULL works for me.  In fact it's used in other bpf/*.c files. 
> In fact, hang on...

Huh...

> > > > -noinline int32_t
> > > > +noinline dt_bpf_specs_t *
> > > >    dt_speculation_speculate(uint32_t id)
> > > >    {
> > > >    	dt_bpf_specs_t *spec;
> > > >    	if ((spec = bpf_map_lookup_elem(&specs, &id)) == NULL)
> > > > -		return -1;
> > > > +		return 0;
> > > How about NULL instead of 0?
> > Same.
> 
> ... there is NULL right there.  It's right next to the proposed change.

Well, duh.  Bad memory on my part.  Ignore me - I'll make it NULL!

> > > >    	/*
> > > >    	 * Spec already being drained: do not continue to emit new
> > > >    	 * data into it.
> > > >    	 */
> > > >    	if (spec->draining)
> > > > -		return -1;
> > > > +		return 0;
> > > How about NULL instead of 0?
> > Same.
> > 
> > > > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > > > @@ -1407,10 +1424,48 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
> > > > +ok:
> > > > +	/*
> > > > +	 * If this clause contains a speculate() statement, we need to
> > > > +	 * generate code to verify that the amount of data recorded for this
> > > > +	 * speculation (incl. what was just added) does not exceed the
> > > > +	 * specsize limit.  If it does, further execution of this clause
> > > > +	 * should be aborted.
> > > > +	 */
> > > > +	if (cflags & DT_CLSFLAG_SPECULATE) {
> > > > +		uint_t	lbl_ok = dt_irlist_label(dlp);
> > > > +		size_t	specsize = dtp->dt_options[DTRACEOPT_SPECSIZE];
> > > > +
> > > > +		/*
> > > > +		 * if (*((uint32_t *)&buf[DBUF_SPECID]) != 0) {
> > > > +		 *     if (dctx->dmst->specsize + off + size >
> > > dctx->dmst->specsize + off + size
> > > ->
> > > dctx->dmst->specsize + (off + size)
> > What does that gain us?
> 
> Semantically, I think a+b+c is (a+b)+c.  By writing a+(b+c) it more closely
> matches what we actually do.  Not a big difference, but the two extra chars
> are not a big cost either.

I guess personal preference...  I think that adding the parentheses makes it
seem like they are actually significant, which is not the case.  I really mean
this to be: perviously speculated data size (other clauses) + previously
speculated data size (this clause) + data item I am speculating right now.
Hence the lack of grouping.

> > > > +		emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_0, BPF_REG_9, DBUF_SPECID));
> > > > +		emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, lbl_ok));
> > > > +
> > > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_STK_DCTX));
> > > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DCTX_MST));
> > > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DMST_SPECSIZE));
> > > > +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, off + size));
> > > > +		emit(dlp,  BPF_BRANCH_IMM(BPF_JLE, BPF_REG_0, specsize, lbl_ok));
> > > How about changing
> > >      if (reg <= specsize) goto lbl_ok
> > > to
> > >      if (reg > specsize) goto pcb_exitlbl
> > > 
> > > That would be simpler and map better to epilogue and your check_fault()
> > > patch.
> > > > +		emit(dlp,  BPF_MOV_IMM(BPF_REG_0, 0));
> > > > +		emit(dlp,  BPF_RETURN());
> > > > +		dt_regset_free(drp, BPF_REG_0);
> > > > +
> > > > +		emitl(dlp, lbl_ok,
> > > > +			   BPF_NOP());
> > > Are the regset_free() and lbl_ok out of order?  E.g., let's say you jump to
> > > lbl_ok.  Then, if there were any reg fill code, you would miss it.
> > Won't matter anymore after the change you suggested above.  But yes.
> 
> It would still matter since there are two jumps to this target. Even if one
> disappears, there is still another jump left.

True.

> > > >    	 *		goto exit;
> > > > -	 *	*((uint32_t *)&buf[4]) = spec;	// mov [%r9 + 4], %dn_reg
> > > > +	 *	*((uint32_t *)&buf[DBUF_SPECID]) = specid;
> > > > +	 *				// mov [%r9 + DBUF_SPECID], %dn_reg
> > > > +	 *	dctx->dmst->specsize = spec->size;
> > > >    	 *	exit:			// nop
> > > >    	 */
> > > > 
> > > > diff --git a/test/unittest/speculation/tst.SpecSizeVariations4.d b/test/unittest/speculation/tst.SpecSizeVariations4.d
> > > Good, but:
> > > 
> > > 1)  The test (fwiw) was passing.  Then in the last patch it failed. Now it's
> > > passing again (this time properly).  From a testing perspective, there was a
> > > regression in this test in the previous patch.  Ideally, that would not be
> > > so.  Maybe xfail this test in the previous patch, even if that xfail is
> > > short-lived?
> > The expected results file was wrong.  The test was passing because it was
> > looking for the wrong output.  This patch provides the correct implementation
> > *and* ensures that the test is looking for the correct output.
> 
> Yes, but what I'm saying is that the previous patch would introduce a new
> test failure.  So from a tester's viewpoint, the previous patch introduced a
> regression.  It's preferable to have all tests pass with each patch.  If
> that previous patch exposed a test problem, then it should mark it with
> xfail.  Then this patch resolves the xfail.

Yes, but again this test was not even valid to begin with due to the entirely
incorrect expected results.

> > > 3)  The test is weak in that it checks only speculations in a single
> > > clause.  How about spreading the speculative data over multiple clauses?
> > But the whole point of the test *is* to exercise the semantics of an overflow
> > occuring in a clause.
> > 
> > But yes, we do need more (and better) tests for the various speculation buffer
> > sizing effects vs the amount of data recorded in a speculation.  It seems like
> > they used to exist or were planned to be written (see the comment block in the
> > err.BufSizeVariations1.d test).
> > 
> > This particular patch is addressing a specific problem - but I do agree we need
> > to add more tests.
> 
> Maybe I was mistaken about something.  Let's say multiple clauses all
> contribute to the same speculation.  Individually, each one is below the
> specsize limit, but combined they exceed the limit.  Is there a problem? 
> What should happen?

That will be handled (and there is actually code for that both on the producer
and consumer side).  This is why the running count of data stored in the
speculation is stored in dctx->dmst->specsize.