[DTrace-devel] [PATCH v3 15/16 was 12/12] Implement the io provider

Kris Van Hees kris.van.hees at oracle.com
Fri Jan 19 16:30:05 UTC 2024


On Thu, Jan 18, 2024 at 04:14:16PM -0500, Eugene Loh via DTrace-devel wrote:
> I would not want to bet my life on this patch (it's pretty complicated...
> the patch, I mean), but
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> 
> My main concern is the testing.  This provider is kind of complicated, with
> lots of different code paths, and so deserves robust tests.  My attempts
> with local2 and nfs2 fail on occasion. I sent you (personally, off list)
> some test patches (5-patch series), I think on Dec 7.  Those patches can be
> squashed back into this patch.  The reason for the patch series is so that
> it'll be easier for you to see what's going on, but I think the patches
> should be squashed back into here.  Actually, I'll just set up a branch with
> the patches on it rather than messing with all that over email.  The branch
> will have a few more patches beyond the Dec 7 series.
> 
> I still find that tests occasionally fail.  It looks like something is
> getting corrupted... might be as simple as garbled output, but might be
> something deeper than that.  I don't know.  I try another test with io:::
> {@[probename]=count()} and if it runs a little while it gets garbage probe
> names.  Stuff like "doart" (some combination of "done" and "start").  I
> cannot reproduce these problems with other providers, but I am still not
> sure that the problems are related to the io provider.  Under investigation.

I definitely look forward to seeing results and analysis to see what might be
going on.  E.g. I am not seeing the corrupted probename issue you see, even if
I let it run for 30 minutes.

> Nonetheless, the io probes seem to be good "nearly all the time" and so this
> looks like a meaningful first release of the io provider.

Yes, and I expect that whatever the problem is, it is more likely related to
underlying code that we depend on rather than the provider itself.  But we'll
see.

> Some extremely minor comments:
> 
> On 1/12/24 19:24, Kris Van Hees via DTrace-devel wrote:
> > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> > +	dtp->dt_maxlvaralloc = 1;	/* force the lvars map to be created */
> 
> I assume we cannot do this conditionally, based on whether the io provider
> needs this?

The challenge is that the trampoline code for the underlying probe is created
before we generate the trampolines for the dependent probes.  So, when that
first trampoline is created we won't know yet whether we need lvars or not.
If no other code in the script needs lvars, without this, no lvars map would
be created and thus the member in the dctx that holds the ref to the lvars
map would not have a valid value.

A future enhancement *might* be to generate the sub-programs for the dependent
probes first, so that their trampolines are created as functions.  And then
when the trampolines are created for the actual probe programs we need to
attach to probes, linking will pull in the sub-programs and more importantly,
there would be information at that time about the need for lvars and other
quantities that are determined based on compilations.  But that is future work,
and I do not know yet how realistic that is right now.

> > diff --git a/libdtrace/dt_prov_io.c b/libdtrace/dt_prov_io.c
> > +/* List of provider-specific variables. */
> > +static dt_ident_t v_bio = { "-io-bio", DT_IDENT_SCALAR,
> > +			    DT_IDFLG_LOCAL | DT_IDFLG_WRITE, 0, DT_ATTR_STABCMN,			    DT_VERS_2_0, &dt_idops_type, "struct bio" };
> 
> Missing line break.

Oops - fixed.

> > +static dt_ident_t v_biop = { "-io-bio-ptr", DT_IDENT_SCALAR,
> > +			     DT_IDFLG_TLS | DT_IDFLG_WRITE, 0, DT_ATTR_STABCMN,
> > +			     DT_VERS_2_0, &dt_idops_type, "struct bio *" };
> > +
> > +/*
> > + * Generate BPF instructions to dereference the pointer in %r3 (after applying
> > + * an optional addend) and read a value of the given 'width'.  The result isu
> 
> s/isu/is/

Thanks.

> > + * stored in register 'reg' (where BPF_REG_0 <= reg <= BPF_REG_5).
> > + *
> > + * Registers %r0-%r5 will be clobbered.  Register 'reg' holds the value.
> > + */
> > +static void deref_r3(dt_irlist_t *dlp, uint_t exitlbl, int addend, int width,
> > +		     int reg)
> > +{
> > +	assert(reg >= BPF_REG_0 && reg <= BPF_REG_5);
> > +
> > +	/* Use slot 0 as temporary storage. */
> > +	emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP));
> > +	emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0)));
> > +
> > +	/* Specify the width of the scalar. */
> > +	emit(dlp, BPF_MOV_IMM(BPF_REG_2, width));
> > +
> > +	/* The source address is already in %r3, but add addend, if any. */
> > +	if (addend)
> > +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, addend));
> > +
> > +	/* Perform the copy and check for success. */
> > +	emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read));
> > +	emit(dlp, BPF_BRANCH_IMM(BPF_JSLT, BPF_REG_0, 0, exitlbl));
> > +
> > +	/* Load the result into the specified register. */
> > +	width = bpf_ldst_size(width, 0);
> > +	emit(dlp, BPF_LOAD(width, reg, BPF_REG_FP, DT_TRAMP_SP_SLOT(0)));
> > +}
> > +
> > +static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
> > +{
> > +
> > +	/* Handle the non-XFS wait-done flavor. */
> > +	if (strcmp(prp->desc->prb, "wait-done") == 0) {
> > +		/*
> > +		 * We need instrument submit_bio_wait(struct bio *):
> 
> need instrument
> ->
> need to instrument

Thanks.

> > +		 *   - on entry, store bio in a TLS var
> > +		 *   - on return, get bio (and store 0 to delete the TLS var)
> 
> Can omit "store 0 to" since that's a detail inside of del_var().

Ah yes, I should have caught that when I made the change to explicit delete.

> > +		 * We use a TLS var to distinguish among possible concurrent
> > +		 * submit_bio_wait() on the CPU.
> > +		 */
> > +		dt_cg_tramp_decl_var(pcb, &v_biop);
> > +		if (strcmp(uprp->desc->prb, "entry") == 0) {
> > +			dt_cg_tramp_get_var(pcb, "self->-io-bio-ptr", 1, BPF_REG_3);
> > +			emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(0)));
> > +			emit(dlp, BPF_STORE(BPF_DW, BPF_REG_3, 0, BPF_REG_0));
> > +			return 1;
> > +		} else {
> > +			dt_cg_tramp_get_var(pcb, "self->-io-bio-ptr", 0, BPF_REG_0);
> > +			emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
> > +			dt_cg_tramp_del_var(pcb, "self->-io-bio-ptr");
> > +		}
> > +
> > +	}
> > +
> > +done:
> > +	/*
> > +	 * Note: DTrace does not currently support the use of fileinfo_t with
> > +	 * io probes.  In Oracle Linux, there is no information about the file
> > +	 * where the I/O request originated at the point where the io probes
> > +	 * fire.
> > +	 */
> > +	emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0));
> > +
> > +	return 0;
> > +}
> > diff --git a/test/unittest/io/tst.nfs2.sh b/test/unittest/io/tst.nfs2.sh
> > +
> > +mkdir $exdir
> > +  # what is the fsid?
> > +  exportfs -i -v -o "rw,sync,no_root_squash,insecure,fsid=8434437287" 127.0.0.1:$exdir > /dev/null
> > +    mkdir $iodir
> > +        mount -t nfs -o nfsvers=3 127.0.0.1:$exdir $iodir
> > +            $rundt "dd if=/dev/urandom of=$tempfile count=$filesize bs=1 status=none" -o log.write
> > +            myinode=`stat $tempfile  | awk '/	Inode: / {print $4}'`
> > +        umount $iodir
> > +        # fliush caches and remount to force IO
> 
> s/fliush/flush/

Thanks.

> > +	echo 3 > /proc/sys/vm/drop_caches
> > +        mount -t nfs -o nfsvers=3 127.0.0.1:$exdir $iodir
> > +            $rundt "sum $tempfile"                                                    -o log.read
> > +            rm -f $tempfile
> > +        umount $iodir
> > +    rmdir $iodir
> > +  exportfs -u 127.0.0.1:$exdir
> > +rmdir $exdir
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list