[DTrace-devel] [PATCH] io: adjust io provider for NFS tracepoint variants

Eugene Loh eugene.loh at oracle.com
Wed Jan 8 06:01:28 UTC 2025


You already have a R-b, it seems the code is right (woo hoo! cool), and 
we're trying to get a release out the door.  But here are my "late to 
the party" comments.  Feel free to ignore.

The names v1 and v2 strike me as funny;  they just make up new numbers.  
Not a big deal, but ideally the names might more closely reflect the 
actual version changes we're tracking.

Breaking the function out into a new "v1" version is some unneeded 
copy-and-paste code bloat, I think.  The function is kind of large and 
the only difference between v1 and v2 is I guess in the two, relatively 
short, "start" sections.  So, keeping this stuff as one function -- with 
extra logic in the two "start" sections -- might be more compact and 
clearer than having two rather similar functions.

On 1/7/25 17:44, Kris Van Hees wrote:
> Kernels prior to 5.6.0 pass 3 arguments (derived from the NFS hdr)
> to the nfs_initiate_read raw tracepoint, whereas kernels as of 5.6.0
> pass just the NFS hdr.

Is the same true of write?  If so, then maybe say so and point out that 
what we're really doing is changing the handling of nfs "start" while 
leaving nfs "done" alone...  that would correspond more closely to what 
is happening in the code, which talks of start and done.

> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
>   libdtrace/dt_prov_io.c | 127 +++++++++++++++++++++++++++++++++++------
>   1 file changed, 108 insertions(+), 19 deletions(-)
>
> diff --git a/libdtrace/dt_prov_io.c b/libdtrace/dt_prov_io.c
> @@ -155,12 +147,109 @@ static void deref_r3(dt_irlist_t *dlp, uint_t exitlbl, int addend, int width,
> +/*
> + * For NFS events, we have to construct a fake struct bio, which we have to
> + * populate from the inode (arg0) and hdr->good_bytes (arg2) arguments the
> + * underlying probe provides.
> + */
> +static void io_nfs_args_v1(dt_pcb_t *pcb, dt_irlist_t *dlp, uint_t exitlbl,
> +			   const char *prb, const char *uprb)
> +{
> +	int	off;
> +	size_t	siz;
> +
> +	/*
> +	 * Determine the various sizes and offsets we want.
> +	 *
> +	 *     // Access these fields relative to &bio.
> +	 *     struct bio bio = {
> +	 *         .bi_opf = ...,
> +	 *         .bi_iter.bi_size = ...,      // struct bvec_iter bi_iter
> +	 *         .bi_iter.bi_sector = ...,
> +	 *         .bi_bdev = 0,		// -or- .bi_disk = 0
> +	 *     };
> +	 *
> +	 *     // Access these fields relative to hdr.
> +	 *     struct nfs_pgio_header *hdr;
> +	 *     ... = hdr->res.count;            // struct nfs_pgio_res  res
> +	 */
> +
> +	/*
> +	 * Declare the -io-bio variable and store its address in %r6.
> +	 */
> +	dt_cg_tramp_decl_var(pcb, &v_bio);
> +	dt_cg_tramp_get_var(pcb, "this->-io-bio", 1, BPF_REG_6);
> +
> +	/* Fill in bi_opf */
> +	off = dt_cg_ctf_offsetof("struct bio", "bi_opf", &siz, 0);
> +	siz = bpf_ldst_size(siz, 1);
> +	if (strstr(uprb, "read"))
> +		emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, REQ_OP_READ));
> +	else
> +		emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, REQ_OP_WRITE));
> +
> +	/*
> +	 * bio.bi_iter.bi_size = hdr->foo.count;

If "v1" is broken out, then this hdr->foo.count no longer makes sense.  
Plus, the comment is more or less superseded by the following two 
comment lines.

Or, mimic how the comments are handled below for inode.

> +	 *
> +	 * For the 'start' probe, count is arg2
> +	 * For the 'done' probe, count is hdr->res.count (hdr in arg1)
> +	 */
> +	if (strcmp(prb, "start") == 0) {
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2)));
> +	} else {
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
> +		off = dt_cg_ctf_offsetof("struct nfs_pgio_header", "res", NULL, 0)
> +		    + dt_cg_ctf_offsetof("struct nfs_pgio_res", "count", &siz, 0);
> +		deref_r3(dlp, exitlbl, off, siz, BPF_REG_0);
> +	}
> +
> +	off = dt_cg_ctf_offsetof("struct bio", "bi_iter", NULL, 0)
> +	    + dt_cg_ctf_offsetof("struct bvec_iter", "bi_size", &siz, 0);
> +	siz = bpf_ldst_size(siz, 1);
> +	emit(dlp, BPF_STORE(siz, BPF_REG_6, off, BPF_REG_0));
> +
> +	/*
> +	 * bio.bi_iter.bi_sector = inode;
> +	 */
> +	if (strcmp(prb, "start") == 0) {
> +		/* inode is arg0 */
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0)));
> +	} else {
> +		/* use hdr->inode, hdr is arg1 */
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
> +
> +		off = dt_cg_ctf_offsetof("struct nfs_pgio_header", "inode", &siz, 0);
> +		deref_r3(dlp, exitlbl, off, siz, BPF_REG_3);
> +	}
> +
> +	off = dt_cg_ctf_offsetof("struct nfs_inode", "fileid", &siz, 0)
> +	    - dt_cg_ctf_offsetof("struct nfs_inode", "vfs_inode", NULL, 0);
> +	deref_r3(dlp, exitlbl, off, siz, BPF_REG_0);
> +
> +	off = dt_cg_ctf_offsetof("struct bio", "bi_iter", NULL, 0)
> +	    + dt_cg_ctf_offsetof("struct bvec_iter", "bi_sector", &siz, 0);
> +	siz = bpf_ldst_size(siz, 1);
> +	emit(dlp, BPF_STORE(siz, BPF_REG_6, off, BPF_REG_0));
> +
> +	/*
> +	 * bio.bi_bdev = 0;
> +	 */
> +	off = dt_cg_ctf_offsetof("struct bio", "bi_bdev", &siz, 1);
> +	if (off == -1)
> +		off = dt_cg_ctf_offsetof("struct bio", "bi_disk", &siz, 0);
> +	siz = bpf_ldst_size(siz, 1);
> +	emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, 0));
> +
> +	/* Store a pointer to the fake bio in arg0. */
> +	emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_6));
> +}



More information about the DTrace-devel mailing list