[DTrace-devel] [PATCH] io: adjust io provider for NFS tracepoint variants
Eugene Loh
eugene.loh at oracle.com
Wed Jan 8 06:01:28 UTC 2025
You already have a R-b, it seems the code is right (woo hoo! cool), and
we're trying to get a release out the door. But here are my "late to
the party" comments. Feel free to ignore.
The names v1 and v2 strike me as funny; they just make up new numbers.
Not a big deal, but ideally the names might more closely reflect the
actual version changes we're tracking.
Breaking the function out into a new "v1" version is some unneeded
copy-and-paste code bloat, I think. The function is kind of large and
the only difference between v1 and v2 is I guess in the two, relatively
short, "start" sections. So, keeping this stuff as one function -- with
extra logic in the two "start" sections -- might be more compact and
clearer than having two rather similar functions.
On 1/7/25 17:44, Kris Van Hees wrote:
> Kernels prior to 5.6.0 pass 3 arguments (derived from the NFS hdr)
> to the nfs_initiate_read raw tracepoint, whereas kernels as of 5.6.0
> pass just the NFS hdr.
Is the same true of write? If so, then maybe say so and point out that
what we're really doing is changing the handling of nfs "start" while
leaving nfs "done" alone... that would correspond more closely to what
is happening in the code, which talks of start and done.
> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
> libdtrace/dt_prov_io.c | 127 +++++++++++++++++++++++++++++++++++------
> 1 file changed, 108 insertions(+), 19 deletions(-)
>
> diff --git a/libdtrace/dt_prov_io.c b/libdtrace/dt_prov_io.c
> @@ -155,12 +147,109 @@ static void deref_r3(dt_irlist_t *dlp, uint_t exitlbl, int addend, int width,
> +/*
> + * For NFS events, we have to construct a fake struct bio, which we have to
> + * populate from the inode (arg0) and hdr->good_bytes (arg2) arguments the
> + * underlying probe provides.
> + */
> +static void io_nfs_args_v1(dt_pcb_t *pcb, dt_irlist_t *dlp, uint_t exitlbl,
> + const char *prb, const char *uprb)
> +{
> + int off;
> + size_t siz;
> +
> + /*
> + * Determine the various sizes and offsets we want.
> + *
> + * // Access these fields relative to &bio.
> + * struct bio bio = {
> + * .bi_opf = ...,
> + * .bi_iter.bi_size = ..., // struct bvec_iter bi_iter
> + * .bi_iter.bi_sector = ...,
> + * .bi_bdev = 0, // -or- .bi_disk = 0
> + * };
> + *
> + * // Access these fields relative to hdr.
> + * struct nfs_pgio_header *hdr;
> + * ... = hdr->res.count; // struct nfs_pgio_res res
> + */
> +
> + /*
> + * Declare the -io-bio variable and store its address in %r6.
> + */
> + dt_cg_tramp_decl_var(pcb, &v_bio);
> + dt_cg_tramp_get_var(pcb, "this->-io-bio", 1, BPF_REG_6);
> +
> + /* Fill in bi_opf */
> + off = dt_cg_ctf_offsetof("struct bio", "bi_opf", &siz, 0);
> + siz = bpf_ldst_size(siz, 1);
> + if (strstr(uprb, "read"))
> + emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, REQ_OP_READ));
> + else
> + emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, REQ_OP_WRITE));
> +
> + /*
> + * bio.bi_iter.bi_size = hdr->foo.count;
If "v1" is broken out, then this hdr->foo.count no longer makes sense.
Plus, the comment is more or less superseded by the following two
comment lines.
Or, mimic how the comments are handled below for inode.
> + *
> + * For the 'start' probe, count is arg2
> + * For the 'done' probe, count is hdr->res.count (hdr in arg1)
> + */
> + if (strcmp(prb, "start") == 0) {
> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2)));
> + } else {
> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
> + off = dt_cg_ctf_offsetof("struct nfs_pgio_header", "res", NULL, 0)
> + + dt_cg_ctf_offsetof("struct nfs_pgio_res", "count", &siz, 0);
> + deref_r3(dlp, exitlbl, off, siz, BPF_REG_0);
> + }
> +
> + off = dt_cg_ctf_offsetof("struct bio", "bi_iter", NULL, 0)
> + + dt_cg_ctf_offsetof("struct bvec_iter", "bi_size", &siz, 0);
> + siz = bpf_ldst_size(siz, 1);
> + emit(dlp, BPF_STORE(siz, BPF_REG_6, off, BPF_REG_0));
> +
> + /*
> + * bio.bi_iter.bi_sector = inode;
> + */
> + if (strcmp(prb, "start") == 0) {
> + /* inode is arg0 */
> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0)));
> + } else {
> + /* use hdr->inode, hdr is arg1 */
> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
> +
> + off = dt_cg_ctf_offsetof("struct nfs_pgio_header", "inode", &siz, 0);
> + deref_r3(dlp, exitlbl, off, siz, BPF_REG_3);
> + }
> +
> + off = dt_cg_ctf_offsetof("struct nfs_inode", "fileid", &siz, 0)
> + - dt_cg_ctf_offsetof("struct nfs_inode", "vfs_inode", NULL, 0);
> + deref_r3(dlp, exitlbl, off, siz, BPF_REG_0);
> +
> + off = dt_cg_ctf_offsetof("struct bio", "bi_iter", NULL, 0)
> + + dt_cg_ctf_offsetof("struct bvec_iter", "bi_sector", &siz, 0);
> + siz = bpf_ldst_size(siz, 1);
> + emit(dlp, BPF_STORE(siz, BPF_REG_6, off, BPF_REG_0));
> +
> + /*
> + * bio.bi_bdev = 0;
> + */
> + off = dt_cg_ctf_offsetof("struct bio", "bi_bdev", &siz, 1);
> + if (off == -1)
> + off = dt_cg_ctf_offsetof("struct bio", "bi_disk", &siz, 0);
> + siz = bpf_ldst_size(siz, 1);
> + emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, 0));
> +
> + /* Store a pointer to the fake bio in arg0. */
> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_6));
> +}
More information about the DTrace-devel
mailing list