[DTrace-devel] [PATCH 4/4] spec: correct semantics of specsize overflow

Thu May 11 22:41:09 UTC 2023

On 5/11/23 16:30, Kris Van Hees wrote:

> On Thu, May 11, 2023 at 02:44:47PM -0400, Eugene Loh via DTrace-devel wrote:
>> On 5/9/23 03:19, Kris Van Hees via DTrace-devel wrote:
>>
>>> diff --git a/bpf/speculation.c b/bpf/speculation.c
>>> @@ -105,29 +105,32 @@ noinline uint32_t dt_speculation(void)
>>> + * We return the speculation entry in the map or 0.
>> How about NULL instead of 0?  Also, since the function is so short and well
>> documented, the lead comment block might be unnecessary.
> NULL is not known as a symbol in these C source files that are compiled into
> BPF, so 0 is the way to go.  Unless we define NULL ourselves somewhere, but
> since we haven't yet, I go with what we have been doing.

Interesting.  NULL works for me.  In fact it's used in other bpf/*.c 
files.  In fact, hang on...

>>> -noinline int32_t
>>> +noinline dt_bpf_specs_t *
>>>    dt_speculation_speculate(uint32_t id)
>>>    {
>>>    	dt_bpf_specs_t *spec;
>>>    	if ((spec = bpf_map_lookup_elem(&specs, &id)) == NULL)
>>> -		return -1;
>>> +		return 0;
>> How about NULL instead of 0?
> Same.

... there is NULL right there.  It's right next to the proposed change.

>>>    	/*
>>>    	 * Spec already being drained: do not continue to emit new
>>>    	 * data into it.
>>>    	 */
>>>    	if (spec->draining)
>>> -		return -1;
>>> +		return 0;
>> How about NULL instead of 0?
> Same.
>
>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>> @@ -1407,10 +1424,48 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>> +ok:
>>> +	/*
>>> +	 * If this clause contains a speculate() statement, we need to
>>> +	 * generate code to verify that the amount of data recorded for this
>>> +	 * speculation (incl. what was just added) does not exceed the
>>> +	 * specsize limit.  If it does, further execution of this clause
>>> +	 * should be aborted.
>>> +	 */
>>> +	if (cflags & DT_CLSFLAG_SPECULATE) {
>>> +		uint_t	lbl_ok = dt_irlist_label(dlp);
>>> +		size_t	specsize = dtp->dt_options[DTRACEOPT_SPECSIZE];
>>> +
>>> +		/*
>>> +		 * if (*((uint32_t *)&buf[DBUF_SPECID]) != 0) {
>>> +		 *     if (dctx->dmst->specsize + off + size >
>> dctx->dmst->specsize + off + size
>> ->
>> dctx->dmst->specsize + (off + size)
> What does that gain us?

Semantically, I think a+b+c is (a+b)+c.  By writing a+(b+c) it more 
closely matches what we actually do.  Not a big difference, but the two 
extra chars are not a big cost either.

>>> +		emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_0, BPF_REG_9, DBUF_SPECID));
>>> +		emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, lbl_ok));
>>> +
>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_STK_DCTX));
>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DCTX_MST));
>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DMST_SPECSIZE));
>>> +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, off + size));
>>> +		emit(dlp,  BPF_BRANCH_IMM(BPF_JLE, BPF_REG_0, specsize, lbl_ok));
>> How about changing
>>      if (reg <= specsize) goto lbl_ok
>> to
>>      if (reg > specsize) goto pcb_exitlbl
>>
>> That would be simpler and map better to epilogue and your check_fault()
>> patch.
>>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_0, 0));
>>> +		emit(dlp,  BPF_RETURN());
>>> +		dt_regset_free(drp, BPF_REG_0);
>>> +
>>> +		emitl(dlp, lbl_ok,
>>> +			   BPF_NOP());
>> Are the regset_free() and lbl_ok out of order?  E.g., let's say you jump to
>> lbl_ok.  Then, if there were any reg fill code, you would miss it.
> Won't matter anymore after the change you suggested above.  But yes.

It would still matter since there are two jumps to this target. Even if 
one disappears, there is still another jump left.
>>>    	 *		goto exit;
>>> -	 *	*((uint32_t *)&buf[4]) = spec;	// mov [%r9 + 4], %dn_reg
>>> +	 *	*((uint32_t *)&buf[DBUF_SPECID]) = specid;
>>> +	 *				// mov [%r9 + DBUF_SPECID], %dn_reg
>>> +	 *	dctx->dmst->specsize = spec->size;
>>>    	 *	exit:			// nop
>>>    	 */
>>>
>>> diff --git a/test/unittest/speculation/tst.SpecSizeVariations4.d b/test/unittest/speculation/tst.SpecSizeVariations4.d
>> Good, but:
>>
>> 1)  The test (fwiw) was passing.  Then in the last patch it failed. Now it's
>> passing again (this time properly).  From a testing perspective, there was a
>> regression in this test in the previous patch.  Ideally, that would not be
>> so.  Maybe xfail this test in the previous patch, even if that xfail is
>> short-lived?
> The expected results file was wrong.  The test was passing because it was
> looking for the wrong output.  This patch provides the correct implementation
> *and* ensures that the test is looking for the correct output.

Yes, but what I'm saying is that the previous patch would introduce a 
new test failure.  So from a tester's viewpoint, the previous patch 
introduced a regression.  It's preferable to have all tests pass with 
each patch.  If that previous patch exposed a test problem, then it 
should mark it with xfail.  Then this patch resolves the xfail.

>> 3)  The test is weak in that it checks only speculations in a single
>> clause.  How about spreading the speculative data over multiple clauses?
> But the whole point of the test *is* to exercise the semantics of an overflow
> occuring in a clause.
>
> But yes, we do need more (and better) tests for the various speculation buffer
> sizing effects vs the amount of data recorded in a speculation.  It seems like
> they used to exist or were planned to be written (see the comment block in the
> err.BufSizeVariations1.d test).
>
> This particular patch is addressing a specific problem - but I do agree we need
> to add more tests.

Maybe I was mistaken about something.  Let's say multiple clauses all 
contribute to the same speculation.  Individually, each one is below the 
specsize limit, but combined they exceed the limit.  Is there a 
problem?  What should happen?