From kris.van.hees at oracle.com  Tue Apr  1 05:36:16 2025
From: kris.van.hees at oracle.com (Kris Van Hees)
Date: Tue, 1 Apr 2025 01:36:16 -0400
Subject: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
In-Reply-To: <20250331214501.24126-1-eugene.loh@oracle.com>
References: <20250331214501.24126-1-eugene.loh@oracle.com>
Message-ID:

This is not the way to go about this. If, in order to implement the cpuinfo_t
argument to sched probes, a regular BPF array map is needed so that cpuinfo
data can be accessed for any given CPU id, then the existing map should be
replaced with the new one, and its use updated to access the new one. That
way you can also keep the name of the map, etc...

Introducing this new map with exactly the same data, and then hoping to
deprecate the old one later is making things more messy.

On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
> From: Eugene Loh
>
> The cpuinfo BPF map is a per-CPU map that has CPU information
> on each CPU for that CPU.
>
> Add a cpuinfos BPF map that allows any CPU to access information
> for any other CPU.
>
> For now, we retain the older per-CPU map. If desired, a future
> patch can migrate existing uses of the per-CPU map to the new
> map, decommissioning the old one. This would include map set up:
>
> *) libdtrace/dt_dlibs.c: DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
>
> *) libdtrace/dt_impl.h: int dt_cpumap_fd;
>
> *) libdtrace/dt_bpf.c: dtp->dt_cpumap_fd = ...
> libdtrace/dt_bpf.c: CREATE_MAP(cpuinfo)
>
> and map use:
>
> *) bpf/get_agg.c
> *) bpf/get_bvar.c
> *) libdtrace/dt_cg.c
> *) libdtrace/dt_prov_lockstat.c
>
> Signed-off-by: Eugene Loh
> ---
>  libdtrace/dt_bpf.c   | 13 +++++++++++++
>  libdtrace/dt_dlibs.c |  1 +
>  libdtrace/dt_impl.h  |  1 +
>  3 files changed, 15 insertions(+)
>
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index 6d42a96c7..8da51d6b9 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
>  	if (dtp->dt_cpumap_fd == -1)
>  		return -1;
>
> +	dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
> +					 BPF_MAP_TYPE_HASH,
> +					 sizeof(uint32_t),
> +					 sizeof(dt_bpf_cpuinfo_t), ncpus);
> +	if (dtp->dt_cpusmap_fd == -1)
> +		return -1;
> +
>  	rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> +
> +	for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
> +		key = ci->cpu_id;
> +		rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
> +	}
> +
>  	dt_free(dtp, data);
>  	if (rc == -1)
>  		return dt_bpf_error(dtp,
> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> index 21df22a8a..0f19f3566 100644
> --- a/libdtrace/dt_dlibs.c
> +++ b/libdtrace/dt_dlibs.c
> @@ -61,6 +61,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
>  	DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> +	DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 68fb8ec53..a5e42801c 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -390,6 +390,7 @@ struct dtrace_hdl {
>  	int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
>  	int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
>  	int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> +	int dt_cpusmap_fd; /* file
descriptor for the 'cpuinfos' BPF map */
>  	int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
>  	int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
>  	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> --
> 2.43.5
>
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel

From eugene.loh at oracle.com  Tue Apr  1 22:54:29 2025
From: eugene.loh at oracle.com (Eugene Loh)
Date: Tue, 1 Apr 2025 18:54:29 -0400
Subject: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
In-Reply-To:
References: <20250331214501.24126-1-eugene.loh@oracle.com>
Message-ID:

Here is a proposal. First, two observations:

1. (As Alan pointed out to me in a facepalm moment), one can write a simple
D script to check enqueue_task_*()'s rq->cpu against the current CPU. He
and I both find that the two CPUs are generally -- but not always -- the
same. So, the strictly correct thing to do is use the rq->cpu value, even
though you can just use the current CPU and be correct "99%" of the time.

2. A BPF program can access per-cpu-array values on other CPUs. Well, I
guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
percpu map"). That's in 5.18. That is, UEK9.

So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
runtime test whether bpf_map_lookup_percpu_elem() is available. If so, do
that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
function. If not, use a simpler on-CPU lookup, which should be right "99%"
of the time. (I have a simple patch that uses the current CPU. Pretty
simple.)

On 4/1/25 01:36, Kris Van Hees wrote:
> This is not the way to go about this.
If, in order to implement the cpuinfo_t
> argument to sched probes, a regular BPF array map is needed so that cpuinfo
> data can be accessed for any given CPU id, then the existing map should be
> replaced with the new one, and its use updated to access the new one. That
> way you can also keep the name of the map, etc...
>
> Introducing this new map with exactly the same data, and then hoping to
> deprecate the old one later is making things more messy.
>
> On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh
>>
>> The cpuinfo BPF map is a per-CPU map that has CPU information
>> on each CPU for that CPU.
>>
>> Add a cpuinfos BPF map that allows any CPU to access information
>> for any other CPU.
>>
>> For now, we retain the older per-CPU map. If desired, a future
>> patch can migrate existing uses of the per-CPU map to the new
>> map, decommissioning the old one. This would include map set up:
>>
>> *) libdtrace/dt_dlibs.c: DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
>>
>> *) libdtrace/dt_impl.h: int dt_cpumap_fd;
>>
>> *) libdtrace/dt_bpf.c: dtp->dt_cpumap_fd = ...
>> libdtrace/dt_bpf.c: CREATE_MAP(cpuinfo)
>>
>> and map use:
>>
>> *) bpf/get_agg.c
>> *) bpf/get_bvar.c
>> *) libdtrace/dt_cg.c
>> *) libdtrace/dt_prov_lockstat.c
>>
>> Signed-off-by: Eugene Loh
>> ---
>>  libdtrace/dt_bpf.c   | 13 +++++++++++++
>>  libdtrace/dt_dlibs.c |  1 +
>>  libdtrace/dt_impl.h  |  1 +
>>  3 files changed, 15 insertions(+)
>>
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> index 6d42a96c7..8da51d6b9 100644
>> --- a/libdtrace/dt_bpf.c
>> +++ b/libdtrace/dt_bpf.c
>> @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
>>  	if (dtp->dt_cpumap_fd == -1)
>>  		return -1;
>>
>> +	dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
>> +					 BPF_MAP_TYPE_HASH,
>> +					 sizeof(uint32_t),
>> +					 sizeof(dt_bpf_cpuinfo_t), ncpus);
>> +	if (dtp->dt_cpusmap_fd == -1)
>> +		return -1;
>> +
>>  	rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
>> +
>> +	for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
>> +		key = ci->cpu_id;
>> +		rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
>> +	}
>> +
>>  	dt_free(dtp, data);
>>  	if (rc == -1)
>>  		return dt_bpf_error(dtp,
>> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
>> index 21df22a8a..0f19f3566 100644
>> --- a/libdtrace/dt_dlibs.c
>> +++ b/libdtrace/dt_dlibs.c
>> @@ -61,6 +61,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
>>  	DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
>>  	DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
>>  	DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
>> +	DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
>>  	DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
>>  	DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
>>  	DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index 68fb8ec53..a5e42801c 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -390,6 +390,7 @@ struct dtrace_hdl {
>>  	int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
>>  	int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
>>  	int dt_cpumap_fd; /* file descriptor for the
'cpuinfo' BPF map */
>> +	int dt_cpusmap_fd; /* file descriptor for the 'cpuinfos' BPF map */
>>  	int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
>>  	int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
>>  	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
>> --
>> 2.43.5
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel

From kris.van.hees at oracle.com  Tue Apr  1 23:03:55 2025
From: kris.van.hees at oracle.com (Kris Van Hees)
Date: Tue, 1 Apr 2025 19:03:55 -0400
Subject: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
In-Reply-To:
References: <20250331214501.24126-1-eugene.loh@oracle.com>
Message-ID:

On Tue, Apr 01, 2025 at 06:54:29PM -0400, Eugene Loh wrote:
> Here is a proposal. First, two observations:
>
> 1. (As Alan pointed out to me in a facepalm moment), one can write a simple
> D script to check enqueue_task_*()'s rq->cpu against the current CPU. He
> and I both find that the two CPUs are generally -- but not always -- the
> same. So, the strictly correct thing to do is use the rq->cpu value, even
> though you can just use the current CPU and be correct "99%" of the time.

Sort of what I expected. But nice to see it confirmed.

> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map"). That's in 5.18. That is, UEK9.

We could get that backported I bet but that doesn't help upstream. So, not
really worth asking for a backport I think.

> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available. If so, do
> that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
> function. If not, use a simpler on-CPU lookup, which should be right "99%"
> of the time.
(I have a simple patch that uses the current CPU. Pretty
> simple.)

But... being correct 95% of the time doesn't quite cut it. I could see some
quite useful cases for using these probes to specifically capture the times
when it does *not* originate from the same CPU. Settling for 95% correctness
seems like an odd tradeoff to me. Especially since we can get it right 100% of
the time without too much trouble. Just convert the cpuinfo map to a regular
array map, and instead of indexing it with a 0 key all the time, index it with
the value returned by bpf_get_smp_processor_id(). Then we have code that will
work on all kernels - no special casing. And I do not see there being much
performance difference by going this route.

> On 4/1/25 01:36, Kris Van Hees wrote:
>
> > This is not the way to go about this. If, in order to implement the cpuinfo_t
> > argument to sched probes, a regular BPF array map is needed so that cpuinfo
> > data can be accessed for any given CPU id, then the existing map should be
> > replaced with the new one, and its use updated to access the new one. That
> > way you can also keep the name of the map, etc...
> >
> > Introducing this new map with exactly the same data, and then hoping to
> > deprecate the old one later is making things more messy.
> >
> > On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
> > > From: Eugene Loh
> > >
> > > The cpuinfo BPF map is a per-CPU map that has CPU information
> > > on each CPU for that CPU.
> > >
> > > Add a cpuinfos BPF map that allows any CPU to access information
> > > for any other CPU.
> > >
> > > For now, we retain the older per-CPU map. If desired, a future
> > > patch can migrate existing uses of the per-CPU map to the new
> > > map, decommissioning the old one.
This would include map set up:
> > >
> > > *) libdtrace/dt_dlibs.c: DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > >
> > > *) libdtrace/dt_impl.h: int dt_cpumap_fd;
> > >
> > > *) libdtrace/dt_bpf.c: dtp->dt_cpumap_fd = ...
> > > libdtrace/dt_bpf.c: CREATE_MAP(cpuinfo)
> > >
> > > and map use:
> > >
> > > *) bpf/get_agg.c
> > > *) bpf/get_bvar.c
> > > *) libdtrace/dt_cg.c
> > > *) libdtrace/dt_prov_lockstat.c
> > >
> > > Signed-off-by: Eugene Loh
> > > ---
> > >  libdtrace/dt_bpf.c   | 13 +++++++++++++
> > >  libdtrace/dt_dlibs.c |  1 +
> > >  libdtrace/dt_impl.h  |  1 +
> > >  3 files changed, 15 insertions(+)
> > >
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > index 6d42a96c7..8da51d6b9 100644
> > > --- a/libdtrace/dt_bpf.c
> > > +++ b/libdtrace/dt_bpf.c
> > > @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> > >  	if (dtp->dt_cpumap_fd == -1)
> > >  		return -1;
> > > +	dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
> > > +					 BPF_MAP_TYPE_HASH,
> > > +					 sizeof(uint32_t),
> > > +					 sizeof(dt_bpf_cpuinfo_t), ncpus);
> > > +	if (dtp->dt_cpusmap_fd == -1)
> > > +		return -1;
> > > +
> > >  	rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> > > +
> > > +	for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
> > > +		key = ci->cpu_id;
> > > +		rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
> > > +	}
> > > +
> > >  	dt_free(dtp, data);
> > >  	if (rc == -1)
> > >  		return dt_bpf_error(dtp,
> > > diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> > > index 21df22a8a..0f19f3566 100644
> > > --- a/libdtrace/dt_dlibs.c
> > > +++ b/libdtrace/dt_dlibs.c
> > > @@ -61,6 +61,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
> > >  	DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
> > >  	DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
> > >  	DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > > +	DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
> > >  	DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
> > >  	DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
> > >  	DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> > >
diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > index 68fb8ec53..a5e42801c 100644
> > > --- a/libdtrace/dt_impl.h
> > > +++ b/libdtrace/dt_impl.h
> > > @@ -390,6 +390,7 @@ struct dtrace_hdl {
> > >  	int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> > >  	int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> > >  	int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> > > +	int dt_cpusmap_fd; /* file descriptor for the 'cpuinfos' BPF map */
> > >  	int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> > >  	int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
> > >  	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> > > --
> > > 2.43.5
> > >
> > >
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel at oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel

From alan.maguire at oracle.com  Wed Apr  2 09:37:48 2025
From: alan.maguire at oracle.com (Alan Maguire)
Date: Wed, 2 Apr 2025 10:37:48 +0100
Subject: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
In-Reply-To:
References: <20250331214501.24126-1-eugene.loh@oracle.com>
Message-ID:

On 01/04/2025 23:54, Eugene Loh wrote:
> Here is a proposal. First, two observations:
>
> 1. (As Alan pointed out to me in a facepalm moment), one can write a
> simple D script to check enqueue_task_*()'s rq->cpu against the current
> CPU. He and I both find that the two CPUs are generally -- but not
> always -- the same. So, the strictly correct thing to do is use the
> rq->cpu value, even though you can just use the current CPU and be correct
> "99%" of the time.
>
> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map"). That's in 5.18. That is, UEK9.
>

Nice find; this commit looks relatively standalone so you could file a
bug to request backport to UEK7U3 if it'd help. No guarantees of course
but it's not too distant from 5.15 and we've backported helpers before
and managed to deal with kABI issues.

> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available. If so,
> do that cross-CPU lookup -- the 2/2 patch I posted -- but using the new
> helper function. If not, use a simpler on-CPU lookup, which should be
> right "99%" of the time. (I have a simple patch that uses the current
> CPU. Pretty simple.)

For what it's worth, I think it'd probably be more valuable to preserve
an accurate CPU id and worry less about the other fields in the
cpuinfo_t; i.e. when tracing, I mostly care about accurate cpu id info
and never look at the other data in a cpuinfo_t. So if it wasn't
possible to retrieve accurate cpuinfo_t info via a cross-cpu lookup via
the 5.19 helper, it might be better to fake up a cpuinfo_t with a
correct cpu id and other fields unset. I'm probably missing it, but I
don't see where those fields are populated currently; tried this a few
times and they look to be unset for me aside from cpu id:

# dtrace -n 'BEGIN { print((cpuinfo_t *)curcpu); } '
dtrace: description 'BEGIN ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  5      1                           :BEGIN 0xffffe05e7fb61b00 = *
                                            (cpuinfo_t) {
                                             .cpu_id = (processorid_t)5,
                                            }

^C

# dtrace -n 'BEGIN { print((cpuinfo_t *)curcpu); } '
dtrace: description 'BEGIN ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  7      1                           :BEGIN 0xffffe05e7fbe1b00 = *
                                            (cpuinfo_t) {
                                             .cpu_id = (processorid_t)7,
                                            }

^C

If those other fields are unset, maybe there would be a way to invoke a
translator to create a cpuinfo_t from just the cpu id?
Not sure about
the mechanics here, but my worry would be that it could be exactly the
times where we are on cpu x and enqueueing on cpu y we might be
interested in, and if that info wasn't preserved we might miss something
valuable about how the system was behaving. Anyway thanks for fixing up
the sched provider, it's really useful!

Alan

From eugene.loh at oracle.com  Wed Apr  2 19:18:41 2025
From: eugene.loh at oracle.com (Eugene Loh)
Date: Wed, 2 Apr 2025 15:18:41 -0400
Subject: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
In-Reply-To:
References: <20250331214501.24126-1-eugene.loh@oracle.com>
Message-ID:

On 4/2/25 05:37, Alan Maguire wrote:
> On 01/04/2025 23:54, Eugene Loh wrote:
>> Here is a proposal. First, two observations:
>>
>> 1. (As Alan pointed out to me in a facepalm moment), one can write a
>> simple D script to check enqueue_task_*()'s rq->cpu against the current
>> CPU. He and I both find that the two CPUs are generally -- but not
>> always -- the same. So, the strictly correct thing to do is use the
>> rq->cpu value, even though you can just use the current CPU and be correct
>> "99%" of the time.
>>
>> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
>> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
>> percpu map"). That's in 5.18. That is, UEK9.
>>
> Nice find; this commit looks relatively standalone so you could file a
> bug to request backport to UEK7U3 if it'd help. No guarantees of course
> but it's not too distant from 5.15 and we've backported helpers before
> and managed to deal with kABI issues.

Kris was leaning toward not relying on this helper since it's not ubiquitous,
and I agree. In particular, it's not too hard just to have a global map
that one accesses by cpuid (self or other).

>> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
>> runtime test whether bpf_map_lookup_percpu_elem() is available.
If so,
>> do that cross-CPU lookup -- the 2/2 patch I posted -- but using the new
>> helper function. If not, use a simpler on-CPU lookup, which should be
>> right "99%" of the time. (I have a simple patch that uses the current
>> CPU. Pretty simple.)
> For what it's worth, I think it'd probably be more valuable to preserve
> an accurate CPU id

Kris agrees and I'm on board.

> and worry less about the other fields in the
> cpuinfo_t; i.e. when tracing, I mostly care about accurate cpu id info
> and never look at the other data in a cpuinfo_t. So if it wasn't
> possible to retrieve accurate cpuinfo_t info via a cross-cpu lookup via
> the 5.19 helper, it might be better to fake up a cpuinfo_t with a
> correct cpu id and other fields unset. I'm probably missing it, but I
> don't see where those fields are populated currently;

dt_conf.c sets cpu_id and cpu_chip.
https://docs.oracle.com/en/operating-systems/oracle-linux/dtrace-guide/dtrace-ref-DTraceProviders.html#dt_schedargs_prov
says cpu_pset and cpu_lgrp are unsupported.

Anyhow, I have a patch (will probably post today) that changes the
per-cpu array to a global map and so we should be able to get the CPUs
right.

> tried this a few
> times and they look to be unset for me aside from cpu id:
>
> # dtrace -n 'BEGIN { print((cpuinfo_t *)curcpu); } '
> dtrace: description 'BEGIN ' matched 1 probe
> CPU     ID                    FUNCTION:NAME
>   5      1                           :BEGIN 0xffffe05e7fb61b00 = *
>                                             (cpuinfo_t) {
>                                              .cpu_id = (processorid_t)5,
>                                             }
>
> ^C
>
> # dtrace -n 'BEGIN { print((cpuinfo_t *)curcpu); } '
> dtrace: description 'BEGIN ' matched 1 probe
> CPU     ID                    FUNCTION:NAME
>   7      1                           :BEGIN 0xffffe05e7fbe1b00 = *
>                                             (cpuinfo_t) {
>                                              .cpu_id = (processorid_t)7,
>                                             }
>
> ^C
>
> If those other fields are unset, maybe there would be a way to invoke a
> translator to create a cpuinfo_t from just the cpu id?
Not sure about
> the mechanics here, but my worry would be that it could be exactly the
> times where we are on cpu x and enqueueing on cpu y we might be
> interested in, and if that info wasn't preserved we might miss something
> valuable about how the system was behaving. Anyway thanks for fixing up
> the sched provider, it's really useful!
>
> Alan

From eugene.loh at oracle.com  Wed Apr  2 23:50:50 2025
From: eugene.loh at oracle.com (Eugene Loh)
Date: Wed, 2 Apr 2025 19:50:50 -0400
Subject: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
In-Reply-To:
References: <20250331214501.24126-1-eugene.loh@oracle.com>
Message-ID:

On 4/1/25 19:03, Kris Van Hees wrote:
> On Tue, Apr 01, 2025 at 06:54:29PM -0400, Eugene Loh wrote:
>> Here is a proposal. First, two observations:
>>
>> 1. (As Alan pointed out to me in a facepalm moment), one can write a simple
>> D script to check enqueue_task_*()'s rq->cpu against the current CPU. He
>> and I both find that the two CPUs are generally -- but not always -- the
>> same. So, the strictly correct thing to do is use the rq->cpu value, even
>> though you can just use the current CPU and be correct "99%" of the time.
> Sort of what I expected. But nice to see it confirmed.
>
>> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
>> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
>> percpu map"). That's in 5.18. That is, UEK9.
> We could get that backported I bet but that doesn't help upstream. So, not
> really worth asking for a backport I think.
>
>> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
>> runtime test whether bpf_map_lookup_percpu_elem() is available. If so, do
>> that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
>> function. If not, use a simpler on-CPU lookup, which should be right "99%"
>> of the time. (I have a simple patch that uses the current CPU. Pretty
>> simple.)
> But...
being correct 95% of the time doesn't quite cut it. I could see some quite
> useful cases for using these probes to specifically capture the times when it
> does *not* originate from the same CPU. Settling for 95% correctness seems
> like an odd tradeoff to me. Especially since we can get it right 100% of the
> time without too much trouble. Just convert the cpuinfo map to a regular
> array map, and instead of indexing it with a 0 key all the time, index it with
> the value returned by bpf_get_smp_processor_id(). Then we have code that will
> work on all kernels - no special casing. And I do not see there being much
> performance difference by going this route.

This patch is withdrawn.

From eugene.loh at oracle.com  Thu Apr  3 05:02:52 2025
From: eugene.loh at oracle.com (eugene.loh at oracle.com)
Date: Thu, 3 Apr 2025 01:02:52 -0400
Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs
Message-ID: <20250403050252.15239-1-eugene.loh@oracle.com>

From: Eugene Loh

The sched provider trampoline for enqueue and dequeue probes had
pending FIXMEs for providing a cpuinfo_t* for the cpu associated
with the run queue. Implement the missing code.

Since the cpu associated with the run queue might be different
from the cpu where we are running, it becomes necessary to access
the cpuinfo for some random cpu. With Linux 5.18, there is a BPF
helper function bpf_map_lookup_percpu_elem() that allows such lookups
on per-cpu arrays. To support older kernels, however, we change
the cpuinfo BPF map from per-cpu to global. Also, it is a hash
table rather than an array in case cpus are not numbered
consecutively.
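
The reason for keying a global table by CPU id, rather than indexing an array, can be pictured with a small user-space model. This is only a sketch under stated assumptions, not DTrace or BPF code: `toy_cpuinfo_t`, `toy_map_t`, `toy_map_update()`, `toy_map_lookup()` and `toy_populate()` are hypothetical stand-ins for the cpuinfo map and its update/lookup calls, and the sparse CPU numbering used with it is an assumed example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dt_bpf_cpuinfo_t: just an id and one counter. */
typedef struct toy_cpuinfo {
	uint32_t cpu_id;
	uint64_t buf_drops;
} toy_cpuinfo_t;

#define TOY_SLOTS 16	/* power of two; assumed larger than the CPU count */

/* Tiny open-addressing hash table; must start out zero-initialized. */
typedef struct toy_map {
	int used[TOY_SLOTS];
	uint32_t key[TOY_SLOTS];
	toy_cpuinfo_t val[TOY_SLOTS];
} toy_map_t;

/* Insert or overwrite the entry for key. Returns 0, or -1 if the table is full. */
static int toy_map_update(toy_map_t *m, uint32_t key, const toy_cpuinfo_t *v)
{
	for (size_t n = 0; n < TOY_SLOTS; n++) {
		size_t slot = (key + n) & (TOY_SLOTS - 1);

		if (!m->used[slot] || m->key[slot] == key) {
			m->used[slot] = 1;
			m->key[slot] = key;
			m->val[slot] = *v;
			return 0;
		}
	}
	return -1;
}

/* Look up the entry for any CPU id, whichever CPU the caller runs on. */
static toy_cpuinfo_t *toy_map_lookup(toy_map_t *m, uint32_t key)
{
	for (size_t n = 0; n < TOY_SLOTS; n++) {
		size_t slot = (key + n) & (TOY_SLOTS - 1);

		if (!m->used[slot])
			return NULL;	/* no deletions, so an empty slot ends the probe */
		if (m->key[slot] == key)
			return &m->val[slot];
	}
	return NULL;
}

/* One entry per online CPU, keyed by its (possibly non-consecutive) id. */
static int toy_populate(toy_map_t *m, const uint32_t *ids, size_t ncpus)
{
	assert(ncpus <= TOY_SLOTS);
	for (size_t i = 0; i < ncpus; i++) {
		toy_cpuinfo_t ci = { .cpu_id = ids[i], .buf_drops = 0 };

		if (toy_map_update(m, ids[i], &ci) == -1)
			return -1;
	}
	return 0;
}
```

With ids such as 0, 1, 4, 5, a lookup for id 4 succeeds even though only four entries exist, while a lookup for an id that is not online returns NULL; a dense array of four entries indexed by id would not cover id 4.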

Signed-off-by: Eugene Loh
---
 bpf/get_agg.c                     |  2 +-
 bpf/get_bvar.c                    |  2 +-
 libdtrace/dt_bpf.c                | 34 ++++++--------
 libdtrace/dt_cg.c                 |  5 ++-
 libdtrace/dt_prov_lockstat.c      |  4 +-
 libdtrace/dt_prov_sched.c         | 74 +++++++++++++++++++++++++------
 libdtrace/dt_work.c               | 20 +++------
 test/unittest/sched/tst.enqueue.d |  1 -
 8 files changed, 89 insertions(+), 53 deletions(-)

diff --git a/bpf/get_agg.c b/bpf/get_agg.c
index c0eb825f0..e70caa6ef 100644
--- a/bpf/get_agg.c
+++ b/bpf/get_agg.c
@@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo;
  */
 noinline uint64_t *dt_no_agg(void)
 {
-	uint32_t key = 0;
+	uint32_t key = bpf_get_smp_processor_id();
 	dt_bpf_cpuinfo_t *ci;
 
 	ci = bpf_map_lookup_elem(&cpuinfo, &key);
diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c
index d372b3445..d81c3605f 100644
--- a/bpf/get_bvar.c
+++ b/bpf/get_bvar.c
@@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx)
 
 noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx)
 {
-	uint32_t key = 0;
+	uint32_t key = bpf_get_smp_processor_id();
 	void *val = bpf_map_lookup_elem(&cpuinfo, &key);
 
 	if (val == NULL) {
diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index 6d42a96c7..d6722cbd1 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -761,37 +761,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
 static int
 gmap_create_cpuinfo(dtrace_hdl_t *dtp)
 {
-	int i, rc;
+	int i;
 	uint32_t key = 0;
 	dtrace_conf_t *conf = &dtp->dt_conf;
 	size_t ncpus = conf->num_online_cpus;
-	dt_bpf_cpuinfo_t *data;
+	dt_bpf_cpuinfo_t data;
 	cpuinfo_t *ci;
 
-	/*
-	 * num_possible_cpus <= num_online_cpus: see dt_conf_init.
-	 */
-	data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus,
-			 sizeof(dt_bpf_cpuinfo_t));
-	if (data == NULL)
-		return dt_set_errno(dtp, EDT_NOMEM);
-
-	for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++)
-		memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t));
-
 	dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo",
-					BPF_MAP_TYPE_PERCPU_ARRAY,
+					BPF_MAP_TYPE_HASH,
 					sizeof(uint32_t),
-					sizeof(dt_bpf_cpuinfo_t), 1);
+					sizeof(dt_bpf_cpuinfo_t), ncpus);
 	if (dtp->dt_cpumap_fd == -1)
 		return -1;
 
-	rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
-	dt_free(dtp, data);
-	if (rc == -1)
-		return dt_bpf_error(dtp,
-				    "cannot update BPF map 'cpuinfo': %s\n",
-				    strerror(errno));
+	memset(&data, 0, sizeof(data));
+	for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) {
+		memcpy(&data.ci, ci, sizeof(cpuinfo_t));
+		key = ci->cpu_id;
+		if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1)
+			return dt_bpf_error(dtp,
+					    "cannot update BPF map 'cpuinfo': %s\n",
+					    strerror(errno));
+	}
 
 	return 0;
 }
diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
index 6dcf4cd3d..d83b1c2ce 100644
--- a/libdtrace/dt_cg.c
+++ b/libdtrace/dt_cg.c
@@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb)
 	} else {
 		idp = dt_dlib_get_map(dtp, "cpuinfo");
 		assert(idp != NULL);
+
+		emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id));
+
 		dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
 		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP));
-		emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0));
+		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0));
 		emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
 		emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
 		emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1));
diff --git a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c
index c73edf9be..8b2cf4da2 100644
--- a/libdtrace/dt_prov_lockstat.c
+++ b/libdtrace/dt_prov_lockstat.c
@@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl)
 {
 	dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo");
 
+	emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id));
+
 	assert(idp != NULL);
 	dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
 	emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP));
 	emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE));
-	emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0));
+	emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0));
 	emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
 	emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl));
 	emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0));
diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c
index 3a218f3cb..a548e679f 100644
--- a/libdtrace/dt_prov_sched.c
+++ b/libdtrace/dt_prov_sched.c
@@ -84,6 +84,40 @@ static int populate(dtrace_hdl_t *dtp)
 			     probe_args, probes);
 }
 
+/*
+ * Get a pointer to the cpuinfo_t structure for the CPU associated
+ * with the runqueue that is in arg0.
+ *
+ * Clobbers %r1 through %r5
+ * Stores pointer to cpuinfo_t struct in %r0
+ */
+static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl)
+{
+	dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo");
+
+	assert(idp != NULL);
+
+	/* Put the runqueue pointer from mst->arg0 into %r3. */
+	emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0)));
+
+	/* Turn it into a pointer to its cpu member. */
+	emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1)));
+
+	/* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */
+	emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int)));
+	emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP));
+	emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0)));
+	emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel));
+	emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl));
+
+	/* Now look up the corresponding cpuinfo_t. */
+	dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
+	emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP));
+	emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0)));
+	emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
+	emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl));
+}
+
 /*
  * Generate a BPF trampoline for a SDT probe.
  *
@@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp)
  */
 static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 {
+	dtrace_hdl_t *dtp = pcb->pcb_hdl;
 	dt_irlist_t *dlp = &pcb->pcb_ir;
 	dt_probe_t *prp = pcb->pcb_probe;
 
 	if (strcmp(prp->desc->prb, "dequeue") == 0) {
-		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1)));
-		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
 		/*
-		 * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU
-		 * associated with the runqueue.
+		 * Get the runqueue from arg0 and place its cpuinfo_t* into %r0.
+		 */
+		get_cpuinfo(dtp, dlp, exitlbl);
+
+		/*
+		 * Copy arg1 into arg0.
 		 */
-		emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0));
+		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
+		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3));
+
+		/* Store the cpuinfo_t* in %r0 into arg1. */
+		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0));
 	} else if (strcmp(prp->desc->prb, "enqueue") == 0) {
+		/*
+		 * Get the runqueue from arg0 and place its cpuinfo_t* into %r0.
+		 */
+		get_cpuinfo(dtp, dlp, exitlbl);
+
+		/*
+		 * Copy arg1 into arg0.
+		 */
+		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
+		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3));
+
+		/* Store the cpuinfo_t* in %r0 into arg1. */
+		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0));
+
 /*
  * This is ugly but necessary... enqueue_task() takes a flags argument and the
  * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the
@@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
  * outside the kernel source tree.
  */
 #define ENQUEUE_HEAD 0x10
-
-		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1)));
-		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
-		/*
-		 * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU
-		 * associated with the runqueue.
-		 */
-		emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0));
-
 		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2)));
 		emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD));
 		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0));
diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
index 498d5332a..2167ed299 100644
--- a/libdtrace/dt_work.c
+++ b/libdtrace/dt_work.c
@@ -37,35 +37,29 @@ END_probe(void)
 int
 dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what)
 {
-	dt_bpf_cpuinfo_t *ci;
-	uint32_t cikey = 0;
+	dt_bpf_cpuinfo_t ci;
+	uint32_t cikey = cpu;
 	uint64_t cnt;
 	int rval = 0;
 
 	assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION);
 
-	ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus,
-		       sizeof(dt_bpf_cpuinfo_t));
-	if (ci == NULL)
-		return dt_set_errno(dtp, EDT_NOMEM);
-
-	if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) {
+	if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) {
 		rval = dt_set_errno(dtp, EDT_BPF);
 		goto fail;
 	}
 
 	if (what == DTRACEDROP_PRINCIPAL) {
-		cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf;
-		dtp->dt_drops[cpu].buf = ci[cpu].buf_drops;
+		cnt = ci.buf_drops - dtp->dt_drops[cpu].buf;
+		dtp->dt_drops[cpu].buf = ci.buf_drops;
 	} else {
-		cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg;
-		dtp->dt_drops[cpu].agg = ci[cpu].agg_drops;
+		cnt = ci.agg_drops - dtp->dt_drops[cpu].agg;
+		dtp->dt_drops[cpu].agg = ci.agg_drops;
 	}
 
 	rval = dt_handle_cpudrop(dtp, cpu, what, cnt);
fail: - dt_free(dtp, ci); return rval; } diff --git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d index f445ac843..28dcace8c 100644 --- a/test/unittest/sched/tst.enqueue.d +++ b/test/unittest/sched/tst.enqueue.d @@ -4,7 +4,6 @@ * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. */ -/* @@xfail: dtv2 */ #pragma D option switchrate=100hz #pragma D option destructive -- 2.43.5 From eugene.loh at oracle.com Sun Apr 6 05:19:17 2025 From: eugene.loh at oracle.com (eugene.loh at oracle.com) Date: Sun, 6 Apr 2025 01:19:17 -0400 Subject: [DTrace-devel] [PATCH] test: Get cpc expected branches and instructions counts from perf Message-ID: <20250406051917.29640-1-eugene.loh@oracle.com> From: Eugene Loh For a number of the cpc tests, we get expected counts from perf. For branches and instructions, however, we can determine the expected counts more directly since there is one branch and a fixed number of instructions per iteration. Thus, we can derive an expected cpc count simply by knowing the number of iterations. For some compilers, however, there is apparently some loop unrolling even at low, default levels of optimization. So, revert to the perf count to estimate the expected cpc count even for the branches and instructions tests.
Signed-off-by: Eugene Loh --- test/unittest/cpc/tst.branches.sh | 2 +- test/unittest/cpc/tst.instructions.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/test/unittest/cpc/tst.branches.sh b/test/unittest/cpc/tst.branches.sh index 87250c371..442d65332 100755 --- a/test/unittest/cpc/tst.branches.sh +++ b/test/unittest/cpc/tst.branches.sh @@ -53,7 +53,7 @@ fi actual=$(($period * `cat tmp.txt`)) # determine expected count (one branch per interation) -expect=$niters +expect=`$utils/perf_count_event.sh branches workload_user $niters` # check $utils/check_result.sh $actual $expect $(($expect / 4)) diff --git a/test/unittest/cpc/tst.instructions.sh b/test/unittest/cpc/tst.instructions.sh index a7fad3e78..34112c6e7 100755 --- a/test/unittest/cpc/tst.instructions.sh +++ b/test/unittest/cpc/tst.instructions.sh @@ -61,7 +61,7 @@ fi actual=$(($period * `cat tmp.txt`)) # determine expected count -expect=$(($niters * $ninstructions_per_iter)) +expect=`$utils/perf_count_event.sh instructions workload_user $niters` # check $utils/check_result.sh $actual $expect $(($expect / 4)) -- 2.43.5 From kris.van.hees at oracle.com Tue Apr 8 13:45:11 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Tue, 08 Apr 2025 09:45:11 -0400 Subject: [DTrace-devel] [PATCH] spec: add support for building on OL10 Message-ID: Building on OL10 requires a few adjustments. This patch also removes some dead portions of the spec file concerning the former translator generation mechanism. LTO is also disabled for building DTrace. Signed-off-by: Kris Van Hees --- dtrace.spec | 85 +++++++++++++++-------------------------------------- 1 file changed, 23 insertions(+), 62 deletions(-) diff --git a/dtrace.spec b/dtrace.spec index 902ad7d8..776432e0 100644 --- a/dtrace.spec +++ b/dtrace.spec @@ -13,43 +13,32 @@ # "--without libctf" to the rpmbuild command to bypass libctf. 
%define with_libctf %{?_without_libctf: 0} %{?!_without_libctf: 1} -# Kernel lists -# -# Translators are automatically generated by M4 macros from selected kernels. -# Only major.minor version impacts produced data so there is no need to add -# each specific kernel to the list. -# -# A list of kernels used during translator processing is in the dtrace_kernels -# macro. A selected kernel whose headers are used during compilation is in the -# build_kernel macro. -# -# You can override both from the rpmbuild command line if required. To build an -# RPM from locally-installed headers, define local_build on the command line. -# To build translators against locally-installed kernel headers in directories -# under /usr/src/kernels, define local_kernels on the command line (in addition -# to dtrace_kernels). - -%if "%{?dist}" == ".el9" -%{!?build_kernel: %define build_kernel 5.15.0-0.16.2%{?dist}uek} -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel}} -%endif -%if "%{?dist}" == ".el8" -%{!?build_kernel: %define build_kernel 5.4.17-2102.206.1%{?dist}uek} -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel} 5.15.0-0.16.2%{?dist}uek} -%endif %if "%{?dist}" == ".el7" -%{!?build_kernel: %define build_kernel 5.4.17-2018%{?dist}uek} -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel}} %define with_libctf 0 %endif -# ARM64 doesn't yet have a 32-bit glibc, so all support for 32-on-64 must be -# disabled. -%ifnarch aarch64 -%define glibc32 glibc-devel(%{__isa_name}-32) libgcc(%{__isa_name}-32) +# OL10 requires an explicit BPF toolset version. +%if "%{?dist}" == ".el10" +%define bpfv -14 +%define bpfc BPFC=bpf-unknown-none-gcc-14 %else +%define bpfv %{nil} +%define bpfc %{nil} +%endif + +# By default, do not build with 32-on-64 support. %define glibc32 %{nil} + +# Enable it for non-ARM64 builds excpet for OL10. +# ARM64 does not support 32-on-64 either. 
+%ifnarch aarch64 +%if "%{?dist}" != ".el10" +%define glibc32 glibc-devel(%{__isa_name}-32) libgcc(%{__isa_name}-32) %endif +%endif + +# Build DTrace without LTO. +%global _lto_cflags %{nil} BuildRequires: rpm Name: dtrace @@ -69,11 +58,7 @@ BuildRequires: fuse3-devel >= 3.2.0 %define maybe_use_fuse2 %{nil} %endif %{?systemd_requires} -BuildRequires: kernel%{variant}-devel = %{build_kernel} -%if "%{?dist}" == ".el8" -BuildRequires: kernel%{variant}-devel = 5.15.0-0.16.2%{?dist}uek -%endif -BuildRequires: gcc-bpf-unknown-none +BuildRequires: gcc-bpf-unknown-none%{bpfv} BuildRequires: binutils-bpf-unknown-none %ifnarch aarch64 Requires: binutils @@ -88,32 +73,11 @@ BuildRequires: libdtrace-ctf-devel >= 1.1.0 %endif Summary: DTrace user interface. Version: 2.0.2 -Release: 1%{?dist} +Release: 5%{?dist} Source: dtrace-%{version}.tar.bz2 BuildRoot: %{_tmppath}/%{name}-%{version}-build ExclusiveArch: x86_64 aarch64 -# Substitute in kernel-version-specific requirements. - -%{lua: - local srcdirexp = "" - dtrace_kernels = rpm.expand("%{dtrace_kernels}") - local_kernels = rpm.expand("%{local_kernels}") - for k in string.gmatch(dtrace_kernels, "[^ ]+") do - if local_kernels == "" then - print(rpm.expand("BuildRequires: kernel%{variant}-devel = " .. k .. "\n")) - end - srcdirexp = srcdirexp .. " " .. k .. "*" - end - rpm.define("srcdirexp " .. srcdirexp) -} - -# Expand kernel versions to full directory names -%global kerneldirs %(cd /usr/src/kernels; \ - for ver in %{srcdirexp}; do printf "%s " $ver; done) -%global bldkerneldir %(cd /usr/src/kernels; \ - for ver in %{build_kernel}*; do printf "%s" $ver; done) - %description DTrace user interface and dtrace(1) command. @@ -148,7 +112,7 @@ Summary: DTrace testsuite. 
Requires: make glibc-devel(%{__isa_name}-64) libgcc(%{__isa_name}-64) Requires: dtrace-headers >= 2.0.0 module-init-tools Requires: %{name}-devel = %{version}-%{release} perl gcc java -Requires: java-1.8.0-openjdk-devel perl-IO-Socket-IP xfsprogs +Requires: java-devel perl-IO-Socket-IP xfsprogs Requires: exportfs vim-minimal %{name}%{?_isa} = %{version}-%{release} Requires: coreutils wireshark %{glibc32} Requires: perf time bc nfs-utils @@ -168,8 +132,7 @@ it always tests the installed DTrace. %build make -j $(getconf _NPROCESSORS_ONLN) VERSION=%{version} \ - KERNELMODDIR=/usr/src/kernels KERNELSRCNAME= KERNELBLDNAME= \ - KERNELS="%{kerneldirs}" %{maybe_use_fuse2} + %{bpfc} %{maybe_use_fuse2} # Force off debuginfo splitting. We have no debuginfo in dtrace proper, # and the testsuite requires debuginfo for proper operation. @@ -183,8 +146,6 @@ make -j $(getconf _NPROCESSORS_ONLN) VERSION=%{version} \ mkdir -p $RPM_BUILD_ROOT/usr/sbin make DESTDIR=$RPM_BUILD_ROOT VERSION=%{version} \ - KERNELMODDIR=/usr/src/kernels KERNELSRCNAME= KERNELBLDNAME= \ - KERNELS="%{kerneldirs}" \ HDRPREFIX="$RPM_BUILD_ROOT/usr/include" \ install install-test -- 2.42.0 From eugene.loh at oracle.com Tue Apr 8 16:44:45 2025 From: eugene.loh at oracle.com (Eugene Loh) Date: Tue, 8 Apr 2025 12:44:45 -0400 Subject: [DTrace-devel] [PATCH] spec: add support for building on OL10 In-Reply-To: References: Message-ID: Reviewed-by: Eugene Loh though I don't really know what's going on here. A few questions/comments: Should the change in Java devel dependency be mentioned in the commit message (or be in its own little patch)? The comment "dead portions of the spec file concerning the former translator generation mechanism" makes me wonder why all that is in this patch rather than being in its own separate patch. Also, does that change mean changes in other files as well, or is it really confined to the .spec file? And...
On 4/8/25 09:45, Kris Van Hees via DTrace-devel wrote: > Building on OL10 requires a few adjustments. This patch also removes > some dead portions of the spec file concerning the former translator > generation mechanism. > > LTO is also disabled for building DTrace. > > Signed-off-by: Kris Van Hees > --- > dtrace.spec | 85 +++++++++++++++-------------------------------------- > 1 file changed, 23 insertions(+), 62 deletions(-) > > diff --git a/dtrace.spec b/dtrace.spec > index 902ad7d8..776432e0 100644 > --- a/dtrace.spec > +++ b/dtrace.spec > @@ -13,43 +13,32 @@ > # "--without libctf" to the rpmbuild command to bypass libctf. > %define with_libctf %{?_without_libctf: 0} %{?!_without_libctf: 1} > > -# Kernel lists > -# > -# Translators are automatically generated by M4 macros from selected kernels. > -# Only major.minor version impacts produced data so there is no need to add > -# each specific kernel to the list. > -# > -# A list of kernels used during translator processing is in the dtrace_kernels > -# macro. A selected kernel whose headers are used during compilation is in the > -# build_kernel macro. > -# > -# You can override both from the rpmbuild command line if required. To build an > -# RPM from locally-installed headers, define local_build on the command line. > -# To build translators against locally-installed kernel headers in directories > -# under /usr/src/kernels, define local_kernels on the command line (in addition > -# to dtrace_kernels). 
> - > -%if "%{?dist}" == ".el9" > -%{!?build_kernel: %define build_kernel 5.15.0-0.16.2%{?dist}uek} > -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel}} > -%endif > -%if "%{?dist}" == ".el8" > -%{!?build_kernel: %define build_kernel 5.4.17-2102.206.1%{?dist}uek} > -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel} 5.15.0-0.16.2%{?dist}uek} > -%endif > %if "%{?dist}" == ".el7" > -%{!?build_kernel: %define build_kernel 5.4.17-2018%{?dist}uek} > -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel}} > %define with_libctf 0 > %endif > > -# ARM64 doesn't yet have a 32-bit glibc, so all support for 32-on-64 must be > -# disabled. > -%ifnarch aarch64 > -%define glibc32 glibc-devel(%{__isa_name}-32) libgcc(%{__isa_name}-32) > +# OL10 requires an explicit BPF toolset version. > +%if "%{?dist}" == ".el10" > +%define bpfv -14 > +%define bpfc BPFC=bpf-unknown-none-gcc-14 > %else > +%define bpfv %{nil} > +%define bpfc %{nil} > +%endif > + > +# By default, do not build with 32-on-64 support. > %define glibc32 %{nil} > + > +# Enable it for non-ARM64 builds excpet for OL10. Typo: excpet. > +# ARM64 does not support 32-on-64 either. Comment seems unnecessary given the immediately preceding line. > +%ifnarch aarch64 > +%if "%{?dist}" != ".el10" > +%define glibc32 glibc-devel(%{__isa_name}-32) libgcc(%{__isa_name}-32) > %endif > +%endif > + > +# Build DTrace without LTO. 
> +%global _lto_cflags %{nil} > > BuildRequires: rpm > Name: dtrace > @@ -69,11 +58,7 @@ BuildRequires: fuse3-devel >= 3.2.0 > %define maybe_use_fuse2 %{nil} > %endif > %{?systemd_requires} > -BuildRequires: kernel%{variant}-devel = %{build_kernel} > -%if "%{?dist}" == ".el8" > -BuildRequires: kernel%{variant}-devel = 5.15.0-0.16.2%{?dist}uek > -%endif > -BuildRequires: gcc-bpf-unknown-none > +BuildRequires: gcc-bpf-unknown-none%{bpfv} > BuildRequires: binutils-bpf-unknown-none > %ifnarch aarch64 > Requires: binutils > @@ -88,32 +73,11 @@ BuildRequires: libdtrace-ctf-devel >= 1.1.0 > %endif > Summary: DTrace user interface. > Version: 2.0.2 > -Release: 1%{?dist} > +Release: 5%{?dist} > Source: dtrace-%{version}.tar.bz2 > BuildRoot: %{_tmppath}/%{name}-%{version}-build > ExclusiveArch: x86_64 aarch64 > > -# Substitute in kernel-version-specific requirements. > - > -%{lua: > - local srcdirexp = "" > - dtrace_kernels = rpm.expand("%{dtrace_kernels}") > - local_kernels = rpm.expand("%{local_kernels}") > - for k in string.gmatch(dtrace_kernels, "[^ ]+") do > - if local_kernels == "" then > - print(rpm.expand("BuildRequires: kernel%{variant}-devel = " .. k .. "\n")) > - end > - srcdirexp = srcdirexp .. " " .. k .. "*" > - end > - rpm.define("srcdirexp " .. srcdirexp) > -} > - > -# Expand kernel versions to full directory names > -%global kerneldirs %(cd /usr/src/kernels; \ > - for ver in %{srcdirexp}; do printf "%s " $ver; done) > -%global bldkerneldir %(cd /usr/src/kernels; \ > - for ver in %{build_kernel}*; do printf "%s" $ver; done) > - > %description > DTrace user interface and dtrace(1) command. > > @@ -148,7 +112,7 @@ Summary: DTrace testsuite. 
> Requires: make glibc-devel(%{__isa_name}-64) libgcc(%{__isa_name}-64) > Requires: dtrace-headers >= 2.0.0 module-init-tools > Requires: %{name}-devel = %{version}-%{release} perl gcc java > -Requires: java-1.8.0-openjdk-devel perl-IO-Socket-IP xfsprogs > +Requires: java-devel perl-IO-Socket-IP xfsprogs > Requires: exportfs vim-minimal %{name}%{?_isa} = %{version}-%{release} > Requires: coreutils wireshark %{glibc32} > Requires: perf time bc nfs-utils > @@ -168,8 +132,7 @@ it always tests the installed DTrace. > > %build > make -j $(getconf _NPROCESSORS_ONLN) VERSION=%{version} \ > - KERNELMODDIR=/usr/src/kernels KERNELSRCNAME= KERNELBLDNAME= \ > - KERNELS="%{kerneldirs}" %{maybe_use_fuse2} > + %{bpfc} %{maybe_use_fuse2} > > # Force off debuginfo splitting. We have no debuginfo in dtrace proper, > # and the testsuite requires debuginfo for proper operation. > @@ -183,8 +146,6 @@ make -j $(getconf _NPROCESSORS_ONLN) VERSION=%{version} \ > > mkdir -p $RPM_BUILD_ROOT/usr/sbin > make DESTDIR=$RPM_BUILD_ROOT VERSION=%{version} \ > - KERNELMODDIR=/usr/src/kernels KERNELSRCNAME= KERNELBLDNAME= \ > - KERNELS="%{kerneldirs}" \ > HDRPREFIX="$RPM_BUILD_ROOT/usr/include" \ > install install-test > From kris.van.hees at oracle.com Fri Apr 11 18:37:34 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Fri, 11 Apr 2025 14:37:34 -0400 Subject: [DTrace-devel] [PATCH] spec: add support for building on OL10 In-Reply-To: References: Message-ID: On Tue, Apr 08, 2025 at 12:44:45PM -0400, Eugene Loh via DTrace-devel wrote: > Reviewed-by: Eugene Loh > though I don't really know what's going on here. Thanks. > A few questions/comments: > > Should the change in Java devel dependency be mentioned in the commit > message (or be in its own little patch)? It could be its own patch but I can just mention it in the commit msg. 
> The comment "dead portions of the spec file concerning the former translator > generation mechanism" makes me wonder why all that is in this patch rather > than being in its own separate patch. Also, does that change mean changes > in other files as well, or is it really confined to the .spec file? This is a resync of the spec file based on what we have used to do builds. Since the spec file is OL specific, we do not always use the same level of granularity of patches. This dead code removal has been pending for a while, but couldn't be merged until now. > And... > > On 4/8/25 09:45, Kris Van Hees via DTrace-devel wrote: > > Building on OL10 requires a few adjustments. This patch also removes > > some dead portions of the spec file concerning the former translator > > generation mechanism. > > > > LTO is also disabled for building DTrace. > > > > Signed-off-by: Kris Van Hees > > --- > > dtrace.spec | 85 +++++++++++++++-------------------------------------- > > 1 file changed, 23 insertions(+), 62 deletions(-) > > > > diff --git a/dtrace.spec b/dtrace.spec > > index 902ad7d8..776432e0 100644 > > --- a/dtrace.spec > > +++ b/dtrace.spec > > @@ -13,43 +13,32 @@ > > # "--without libctf" to the rpmbuild command to bypass libctf. > > %define with_libctf %{?_without_libctf: 0} %{?!_without_libctf: 1} > > -# Kernel lists > > -# > > -# Translators are automatically generated by M4 macros from selected kernels. > > -# Only major.minor version impacts produced data so there is no need to add > > -# each specific kernel to the list. > > -# > > -# A list of kernels used during translator processing is in the dtrace_kernels > > -# macro. A selected kernel whose headers are used during compilation is in the > > -# build_kernel macro. > > -# > > -# You can override both from the rpmbuild command line if required. To build an > > -# RPM from locally-installed headers, define local_build on the command line. 
> > -# To build translators against locally-installed kernel headers in directories > > -# under /usr/src/kernels, define local_kernels on the command line (in addition > > -# to dtrace_kernels). > > - > > -%if "%{?dist}" == ".el9" > > -%{!?build_kernel: %define build_kernel 5.15.0-0.16.2%{?dist}uek} > > -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel}} > > -%endif > > -%if "%{?dist}" == ".el8" > > -%{!?build_kernel: %define build_kernel 5.4.17-2102.206.1%{?dist}uek} > > -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel} 5.15.0-0.16.2%{?dist}uek} > > -%endif > > %if "%{?dist}" == ".el7" > > -%{!?build_kernel: %define build_kernel 5.4.17-2018%{?dist}uek} > > -%{!?dtrace_kernels: %define dtrace_kernels %{build_kernel}} > > %define with_libctf 0 > > %endif > > -# ARM64 doesn't yet have a 32-bit glibc, so all support for 32-on-64 must be > > -# disabled. > > -%ifnarch aarch64 > > -%define glibc32 glibc-devel(%{__isa_name}-32) libgcc(%{__isa_name}-32) > > +# OL10 requires an explicit BPF toolset version. > > +%if "%{?dist}" == ".el10" > > +%define bpfv -14 > > +%define bpfc BPFC=bpf-unknown-none-gcc-14 > > %else > > +%define bpfv %{nil} > > +%define bpfc %{nil} > > +%endif > > + > > +# By default, do not build with 32-on-64 support. > > %define glibc32 %{nil} > > + > > +# Enable it for non-ARM64 builds excpet for OL10. > > Typo: excpet. Thanks. > > +# ARM64 does not support 32-on-64 either. > > Comment seems unnecessary given the immediately preceding line. Will drop. > > +%ifnarch aarch64 > > +%if "%{?dist}" != ".el10" > > +%define glibc32 glibc-devel(%{__isa_name}-32) libgcc(%{__isa_name}-32) > > %endif > > +%endif > > + > > +# Build DTrace without LTO. 
> > +%global _lto_cflags %{nil} > > BuildRequires: rpm > > Name: dtrace > > @@ -69,11 +58,7 @@ BuildRequires: fuse3-devel >= 3.2.0 > > %define maybe_use_fuse2 %{nil} > > %endif > > %{?systemd_requires} > > -BuildRequires: kernel%{variant}-devel = %{build_kernel} > > -%if "%{?dist}" == ".el8" > > -BuildRequires: kernel%{variant}-devel = 5.15.0-0.16.2%{?dist}uek > > -%endif > > -BuildRequires: gcc-bpf-unknown-none > > +BuildRequires: gcc-bpf-unknown-none%{bpfv} > > BuildRequires: binutils-bpf-unknown-none > > %ifnarch aarch64 > > Requires: binutils > > @@ -88,32 +73,11 @@ BuildRequires: libdtrace-ctf-devel >= 1.1.0 > > %endif > > Summary: DTrace user interface. > > Version: 2.0.2 > > -Release: 1%{?dist} > > +Release: 5%{?dist} > > Source: dtrace-%{version}.tar.bz2 > > BuildRoot: %{_tmppath}/%{name}-%{version}-build > > ExclusiveArch: x86_64 aarch64 > > -# Substitute in kernel-version-specific requirements. > > - > > -%{lua: > > - local srcdirexp = "" > > - dtrace_kernels = rpm.expand("%{dtrace_kernels}") > > - local_kernels = rpm.expand("%{local_kernels}") > > - for k in string.gmatch(dtrace_kernels, "[^ ]+") do > > - if local_kernels == "" then > > - print(rpm.expand("BuildRequires: kernel%{variant}-devel = " .. k .. "\n")) > > - end > > - srcdirexp = srcdirexp .. " " .. k .. "*" > > - end > > - rpm.define("srcdirexp " .. srcdirexp) > > -} > > - > > -# Expand kernel versions to full directory names > > -%global kerneldirs %(cd /usr/src/kernels; \ > > - for ver in %{srcdirexp}; do printf "%s " $ver; done) > > -%global bldkerneldir %(cd /usr/src/kernels; \ > > - for ver in %{build_kernel}*; do printf "%s" $ver; done) > > - > > %description > > DTrace user interface and dtrace(1) command. > > @@ -148,7 +112,7 @@ Summary: DTrace testsuite. 
> > Requires: make glibc-devel(%{__isa_name}-64) libgcc(%{__isa_name}-64) > > Requires: dtrace-headers >= 2.0.0 module-init-tools > > Requires: %{name}-devel = %{version}-%{release} perl gcc java > > -Requires: java-1.8.0-openjdk-devel perl-IO-Socket-IP xfsprogs > > +Requires: java-devel perl-IO-Socket-IP xfsprogs > > Requires: exportfs vim-minimal %{name}%{?_isa} = %{version}-%{release} > > Requires: coreutils wireshark %{glibc32} > > Requires: perf time bc nfs-utils > > @@ -168,8 +132,7 @@ it always tests the installed DTrace. > > %build > > make -j $(getconf _NPROCESSORS_ONLN) VERSION=%{version} \ > > - KERNELMODDIR=/usr/src/kernels KERNELSRCNAME= KERNELBLDNAME= \ > > - KERNELS="%{kerneldirs}" %{maybe_use_fuse2} > > + %{bpfc} %{maybe_use_fuse2} > > # Force off debuginfo splitting. We have no debuginfo in dtrace proper, > > # and the testsuite requires debuginfo for proper operation. > > @@ -183,8 +146,6 @@ make -j $(getconf _NPROCESSORS_ONLN) VERSION=%{version} \ > > mkdir -p $RPM_BUILD_ROOT/usr/sbin > > make DESTDIR=$RPM_BUILD_ROOT VERSION=%{version} \ > > - KERNELMODDIR=/usr/src/kernels KERNELSRCNAME= KERNELBLDNAME= \ > > - KERNELS="%{kerneldirs}" \ > > HDRPREFIX="$RPM_BUILD_ROOT/usr/include" \ > > install install-test > > _______________________________________________ > DTrace-devel mailing list > DTrace-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/dtrace-devel From kris.van.hees at oracle.com Fri Apr 11 20:37:21 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Fri, 11 Apr 2025 16:37:21 -0400 Subject: [DTrace-devel] [PATCH] test: Expect USDT argmap to fail on ARM on older kernels In-Reply-To: References: <20250319063230.28171-1-eugene.loh@oracle.com> <20250319063230.28171-4-eugene.loh@oracle.com> Message-ID: On Wed, Mar 19, 2025 at 03:04:55PM -0400, Kris Van Hees wrote: > I am holding off on this patch for the moment, just to get a closer look at > the potential cause (and some info on how old the kernel needs to be for this 
> to fail). I would expect arg access to be the problem rather than the mapping > because the mapping is simply something that happens in BPF code. That should > not be kernel-dependent. Reviewed-by: Kris Van Hees ... because the affected kernels are old enough that it does not really matter much and I do not want to hold up a fix for this test always showing up as failing on those systems. We'll keep looking at the why in the meantime as well. > > On Wed, Mar 19, 2025 at 02:32:29AM -0400, eugene.loh--- via DTrace-devel wrote: > > From: Eugene Loh > > > > Signed-off-by: Eugene Loh > > --- > > test/unittest/usdt/skip_arm_uek6.x | 25 +++++++++++++++++++ > > .../usdt/tst.argmap-typed-partial.aarch64.x | 1 + > > test/unittest/usdt/tst.argmap-typed.aarch64.x | 1 + > > .../tst.multiprov-dupprobe-fire.aarch64.x | 1 + > > .../tst.multiprov-dupprobe-shlibs.aarch64.x | 1 + > > .../usdt/tst.multiprovider-fire.aarch64.x | 1 + > > 6 files changed, 30 insertions(+) > > create mode 100755 test/unittest/usdt/skip_arm_uek6.x > > create mode 120000 test/unittest/usdt/tst.argmap-typed-partial.aarch64.x > > create mode 120000 test/unittest/usdt/tst.argmap-typed.aarch64.x > > create mode 120000 test/unittest/usdt/tst.multiprov-dupprobe-fire.aarch64.x > > create mode 120000 test/unittest/usdt/tst.multiprov-dupprobe-shlibs.aarch64.x > > create mode 120000 test/unittest/usdt/tst.multiprovider-fire.aarch64.x > > > > diff --git a/test/unittest/usdt/skip_arm_uek6.x b/test/unittest/usdt/skip_arm_uek6.x > > new file mode 100755 > > index 000000000..252cbebb5 > > --- /dev/null > > +++ b/test/unittest/usdt/skip_arm_uek6.x > > @@ -0,0 +1,25 @@ > > +#!/bin/bash > > +# Licensed under the Universal Permissive License v 1.0 as shown at > > +# http://oss.oracle.com/licenses/upl. > > +# > > +# @@skip: not run directly by test harness > > +# > > +# Tests that depend on USDT argument translation fail on ARM for UEK6. > > +# They're fine for UEK7. 
It is unclear in exactly which kernel they > > +# start working. > > + > > +if [[ `uname -m` != "aarch64" ]]; then > > + exit 0 > > +fi > > + > > +read MAJOR MINOR <<< `uname -r | grep -Eo '^[0-9]+\.[0-9]+' | tr '.' ' '` > > + > > +if [ $MAJOR -gt 5 ]; then > > + exit 0 > > +fi > > +if [ $MAJOR -eq 5 -a $MINOR -ge 10 ]; then > > + exit 0 > > +fi > > + > > +echo "USDT argmap not working on ARM on older kernels" > > +exit 1 > > diff --git a/test/unittest/usdt/tst.argmap-typed-partial.aarch64.x b/test/unittest/usdt/tst.argmap-typed-partial.aarch64.x > > new file mode 120000 > > index 000000000..8d462f98f > > --- /dev/null > > +++ b/test/unittest/usdt/tst.argmap-typed-partial.aarch64.x > > @@ -0,0 +1 @@ > > +skip_arm_uek6.x > > \ No newline at end of file > > diff --git a/test/unittest/usdt/tst.argmap-typed.aarch64.x b/test/unittest/usdt/tst.argmap-typed.aarch64.x > > new file mode 120000 > > index 000000000..8d462f98f > > --- /dev/null > > +++ b/test/unittest/usdt/tst.argmap-typed.aarch64.x > > @@ -0,0 +1 @@ > > +skip_arm_uek6.x > > \ No newline at end of file > > diff --git a/test/unittest/usdt/tst.multiprov-dupprobe-fire.aarch64.x b/test/unittest/usdt/tst.multiprov-dupprobe-fire.aarch64.x > > new file mode 120000 > > index 000000000..8d462f98f > > --- /dev/null > > +++ b/test/unittest/usdt/tst.multiprov-dupprobe-fire.aarch64.x > > @@ -0,0 +1 @@ > > +skip_arm_uek6.x > > \ No newline at end of file > > diff --git a/test/unittest/usdt/tst.multiprov-dupprobe-shlibs.aarch64.x b/test/unittest/usdt/tst.multiprov-dupprobe-shlibs.aarch64.x > > new file mode 120000 > > index 000000000..8d462f98f > > --- /dev/null > > +++ b/test/unittest/usdt/tst.multiprov-dupprobe-shlibs.aarch64.x > > @@ -0,0 +1 @@ > > +skip_arm_uek6.x > > \ No newline at end of file > > diff --git a/test/unittest/usdt/tst.multiprovider-fire.aarch64.x b/test/unittest/usdt/tst.multiprovider-fire.aarch64.x > > new file mode 120000 > > index 000000000..8d462f98f > > --- /dev/null > > +++ 
b/test/unittest/usdt/tst.multiprovider-fire.aarch64.x > > @@ -0,0 +1 @@ > > +skip_arm_uek6.x > > \ No newline at end of file > > -- > > 2.43.5 From kris.van.hees at oracle.com Fri Apr 11 20:38:50 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Fri, 11 Apr 2025 16:38:50 -0400 Subject: [DTrace-devel] [PATCH 1/4] Remove orphaned dtrace_recdesc_t component dtrd_uarg In-Reply-To: <20250325222521.15224-1-eugene.loh@oracle.com> References: <20250325222521.15224-1-eugene.loh@oracle.com> Message-ID: On Tue, Mar 25, 2025 at 06:25:18PM -0400, eugene.loh at oracle.com wrote: > From: Eugene Loh > > Signed-off-by: Eugene Loh Reviewed-by: Kris Van Hees > --- > include/dtrace/metadesc.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/include/dtrace/metadesc.h b/include/dtrace/metadesc.h > index 8a4add255..b0f789932 100644 > --- a/include/dtrace/metadesc.h > +++ b/include/dtrace/metadesc.h > @@ -2,7 +2,7 @@ > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. > * > - * Copyright (c) 2009, 2024, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2009, 2025, Oracle and/or its affiliates. All rights reserved. 
> */ > > /* > @@ -39,7 +39,6 @@ typedef struct dtrace_recdesc { > uint16_t dtrd_alignment; /* required alignment */ > void *dtrd_format; /* format, if any */ > uint64_t dtrd_arg; /* action argument */ > - uint64_t dtrd_uarg; /* user argument */ > } dtrace_recdesc_t; > > typedef struct dtrace_datadesc { > -- > 2.43.5 > From kris.van.hees at oracle.com Fri Apr 11 20:48:11 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Fri, 11 Apr 2025 16:48:11 -0400 Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs In-Reply-To: <20250403050252.15239-1-eugene.loh@oracle.com> References: <20250403050252.15239-1-eugene.loh@oracle.com> Message-ID: Partial comments below (still looking at the provider changes)... On Thu, Apr 03, 2025 at 01:02:52AM -0400, eugene.loh at oracle.com wrote: > From: Eugene Loh > > The sched provider trampoline for enqueue and dequeue probes had > pending FIXMEs for providing a cpuinfo_t* for the cpu associated > with the run queue. Implement the missing code. > > Since the cpu associated with the run queue might be different from > the cpu where we are running, it becomes necessary to access the > cpuinfo for some random cpu. With Linux 5.18, there is a BPF > helper function map_lookup_percpu_elem() that allows such lookups > on per-cpu arrays. To support older kernels, however, we change > the cpuinfo BPF map from per-cpu to global. Also, it is a hash > table rather than an array in case cpus are not numbered consecutively. I agree with all the above. Good solution. 
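For illustration only (this is not code from the patch): a per-CPU array hands each CPU its own slot, whereas a global map keyed by CPU id lets any CPU fetch the entry for any other CPU, and a hash copes with ids that are not numbered consecutively. The sketch below models that behaviour in plain userspace C under stated assumptions. The names demo_cpuinfo_t, demo_map_update(), demo_map_lookup() and demo_run(), the toy open-addressing table, and the CPU ids 0, 1, 4, 5 are all hypothetical stand-ins; none of them are libdtrace or BPF interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dt_bpf_cpuinfo_t; the fields are illustrative. */
typedef struct {
	uint32_t	cpu_id;
	uint64_t	buf_drops;
} demo_cpuinfo_t;

#define DEMO_NSLOTS	8	/* capacity of the toy map */

typedef struct {
	int		used;
	uint32_t	key;
	demo_cpuinfo_t	val;
} demo_slot_t;

static demo_slot_t	demo_map[DEMO_NSLOTS];

/* Insert or overwrite the value for key; returns 0, or -1 if the map is full. */
static int demo_map_update(uint32_t key, const demo_cpuinfo_t *val)
{
	size_t	n;

	for (n = 0; n < DEMO_NSLOTS; n++) {
		demo_slot_t	*s = &demo_map[(key + n) % DEMO_NSLOTS];

		if (!s->used || s->key == key) {
			s->used = 1;
			s->key = key;
			s->val = *val;
			return 0;
		}
	}
	return -1;
}

/* Return the value stored for key, or NULL if there is none. */
static demo_cpuinfo_t *demo_map_lookup(uint32_t key)
{
	size_t	n;

	for (n = 0; n < DEMO_NSLOTS; n++) {
		demo_slot_t	*s = &demo_map[(key + n) % DEMO_NSLOTS];

		if (!s->used)
			return NULL;
		if (s->key == key)
			return &s->val;
	}
	return NULL;
}

/*
 * Populate the map for a made-up set of online CPUs with a gap in the
 * numbering, then look up a CPU other than the one doing the lookup.
 */
int demo_run(void)
{
	static const uint32_t	online[] = { 0, 1, 4, 5 };
	demo_cpuinfo_t		ci;
	size_t			i;

	for (i = 0; i < sizeof(online) / sizeof(online[0]); i++) {
		ci.cpu_id = online[i];
		ci.buf_drops = 10 * online[i];
		if (demo_map_update(online[i], &ci) == -1)
			return -1;
	}

	/* Any caller may ask about CPU 4, whichever CPU it runs on. */
	assert(demo_map_lookup(4) != NULL);
	assert(demo_map_lookup(4)->buf_drops == 40);

	/* CPU 2 is not in the made-up online set, so it has no entry. */
	assert(demo_map_lookup(2) == NULL);

	return 0;
}
```

Calling demo_run() from a small main() exercises the sketch. The point it tries to make is only the keying scheme: the lookup key is the CPU id taken from the data being examined (here a constant, in the patch the cpu member of the runqueue), not an implicit "current CPU" slot.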
> > Signed-off-by: Eugene Loh > --- > bpf/get_agg.c | 2 +- > bpf/get_bvar.c | 2 +- > libdtrace/dt_bpf.c | 34 ++++++-------- > libdtrace/dt_cg.c | 5 ++- > libdtrace/dt_prov_lockstat.c | 4 +- > libdtrace/dt_prov_sched.c | 74 +++++++++++++++++++++++++------ > libdtrace/dt_work.c | 20 +++------ > test/unittest/sched/tst.enqueue.d | 1 - > 8 files changed, 89 insertions(+), 53 deletions(-) > > diff --git a/bpf/get_agg.c b/bpf/get_agg.c > index c0eb825f0..e70caa6ef 100644 > --- a/bpf/get_agg.c > +++ b/bpf/get_agg.c > @@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo; > */ > noinline uint64_t *dt_no_agg(void) > { > - uint32_t key = 0; > + uint32_t key = bpf_get_smp_processor_id(); > dt_bpf_cpuinfo_t *ci; > > ci = bpf_map_lookup_elem(&cpuinfo, &key); > diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c > index d372b3445..d81c3605f 100644 > --- a/bpf/get_bvar.c > +++ b/bpf/get_bvar.c > @@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx) > > noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx) > { > - uint32_t key = 0; > + uint32_t key = bpf_get_smp_processor_id(); > void *val = bpf_map_lookup_elem(&cpuinfo, &key); > > if (val == NULL) { > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c > index 6d42a96c7..d6722cbd1 100644 > --- a/libdtrace/dt_bpf.c > +++ b/libdtrace/dt_bpf.c > @@ -761,37 +761,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp) > static int > gmap_create_cpuinfo(dtrace_hdl_t *dtp) > { > - int i, rc; > + int i; > uint32_t key = 0; > dtrace_conf_t *conf = &dtp->dt_conf; > size_t ncpus = conf->num_online_cpus; > - dt_bpf_cpuinfo_t *data; > + dt_bpf_cpuinfo_t data; Not sure about this, because (see below)... > cpuinfo_t *ci; > > - /* > - * num_possible_cpus <= num_online_cpus: see dt_conf_init. 
> - */ > - data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > - sizeof(dt_bpf_cpuinfo_t)); > - if (data == NULL) > - return dt_set_errno(dtp, EDT_NOMEM); > - > - for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) > - memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t)); > - > dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo", > - BPF_MAP_TYPE_PERCPU_ARRAY, > + BPF_MAP_TYPE_HASH, > sizeof(uint32_t), > - sizeof(dt_bpf_cpuinfo_t), 1); > + sizeof(dt_bpf_cpuinfo_t), ncpus); > if (dtp->dt_cpumap_fd == -1) > return -1; > > - rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data); > - dt_free(dtp, data); > - if (rc == -1) > - return dt_bpf_error(dtp, > - "cannot update BPF map 'cpuinfo': %s\n", > - strerror(errno)); > + memset(&data, 0, sizeof(data)); Do we need this, because (see below).... > + for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) { > + memcpy(&data.ci, ci, sizeof(cpuinfo_t)); Do we need this, because (see below).... > + key = ci->cpu_id; > + if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1) Why can't we simply do: if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, ci) == -1) > + return dt_bpf_error(dtp, > + "cannot update BPF map 'cpuinfo': %s\n", > + strerror(errno)); > + } > > return 0; > } > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c > index 6dcf4cd3d..d83b1c2ce 100644 > --- a/libdtrace/dt_cg.c > +++ b/libdtrace/dt_cg.c > @@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb) > } else { > idp = dt_dlib_get_map(dtp, "cpuinfo"); > assert(idp != NULL); > + > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > + > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP)); > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl)); > emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1)); > diff --git
a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c > index c73edf9be..8b2cf4da2 100644 > --- a/libdtrace/dt_prov_lockstat.c > +++ b/libdtrace/dt_prov_lockstat.c > @@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > { > dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > + > assert(idp != NULL); > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE)); > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0)); > diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c > index 3a218f3cb..a548e679f 100644 > --- a/libdtrace/dt_prov_sched.c > +++ b/libdtrace/dt_prov_sched.c > @@ -84,6 +84,40 @@ static int populate(dtrace_hdl_t *dtp) > probe_args, probes); > } > > +/* > + * Get a pointer to the cpuinfo_t structure for the CPU associated > + * with the runqueue that is in arg0. > + * > + * Clobbers %r1 through %r5 > + * Stores pointer to cpuinfo_t struct in %r0 > + */ > +static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > +{ > + dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > + > + assert(idp != NULL); > + > + /* Put the runqueue pointer from mst->arg0 into %r3. */ > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0))); > + > + /* Turn it into a pointer to its cpu member. 
*/ > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1))); > + > + /* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */ > + emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int))); > + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP)); > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0))); > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel)); > + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl)); > + > + /* Now look up the corresponding cpuinfo_t. */ > + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0))); > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > +} > + > /* > * Generate a BPF trampoline for a SDT probe. > * > @@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp) > */ > static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > { > + dtrace_hdl_t *dtp = pcb->pcb_hdl; > dt_irlist_t *dlp = &pcb->pcb_ir; > dt_probe_t *prp = pcb->pcb_probe; > > if (strcmp(prp->desc->prb, "dequeue") == 0) { > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > /* > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > - * associated with the runqueue. > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > + */ > + get_cpuinfo(dtp, dlp, exitlbl); > + > + /* > + * Copy arg1 into arg0. > */ > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > + > + /* Store the cpuinfo_t* in %r0 into arg1. 
*/ > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > } else if (strcmp(prp->desc->prb, "enqueue") == 0) { > + /* > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > + */ > + get_cpuinfo(dtp, dlp, exitlbl); > + > + /* > + * Copy arg1 into arg0. > + */ > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > + > + /* Store the cpuinfo_t* in %r0 into arg1. */ > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > + > /* > * This is ugly but necessary... enqueue_task() takes a flags argument and the > * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the > @@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > * outside the kernel source tree. > */ > #define ENQUEUE_HEAD 0x10 > - > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > - /* > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > - * associated with the runqueue. 
> - */ > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > - > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2))); > emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD)); > emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0)); > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c > index 498d5332a..2167ed299 100644 > --- a/libdtrace/dt_work.c > +++ b/libdtrace/dt_work.c > @@ -37,35 +37,29 @@ END_probe(void) > int > dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what) > { > - dt_bpf_cpuinfo_t *ci; > - uint32_t cikey = 0; > + dt_bpf_cpuinfo_t ci; > + uint32_t cikey = cpu; > uint64_t cnt; > int rval = 0; > > assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION); > > - ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > - sizeof(dt_bpf_cpuinfo_t)); > - if (ci == NULL) > - return dt_set_errno(dtp, EDT_NOMEM); > - > - if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) { > + if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) { > rval = dt_set_errno(dtp, EDT_BPF); > goto fail; > } > > if (what == DTRACEDROP_PRINCIPAL) { > - cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf; > - dtp->dt_drops[cpu].buf = ci[cpu].buf_drops; > + cnt = ci.buf_drops - dtp->dt_drops[cpu].buf; > + dtp->dt_drops[cpu].buf = ci.buf_drops; > } else { > - cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg; > - dtp->dt_drops[cpu].agg = ci[cpu].agg_drops; > + cnt = ci.agg_drops - dtp->dt_drops[cpu].agg; > + dtp->dt_drops[cpu].agg = ci.agg_drops; > } > > rval = dt_handle_cpudrop(dtp, cpu, what, cnt); > > fail: > - dt_free(dtp, ci); > return rval; > } > > diff --git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d > index f445ac843..28dcace8c 100644 > --- a/test/unittest/sched/tst.enqueue.d > +++ b/test/unittest/sched/tst.enqueue.d > @@ -4,7 +4,6 @@ > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. 
> */ > -/* @@xfail: dtv2 */ > > #pragma D option switchrate=100hz > #pragma D option destructive > -- > 2.43.5 > From eugene.loh at oracle.com Fri Apr 11 21:20:00 2025 From: eugene.loh at oracle.com (Eugene Loh) Date: Fri, 11 Apr 2025 17:20:00 -0400 Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs In-Reply-To: References: <20250403050252.15239-1-eugene.loh@oracle.com> Message-ID: On 4/11/25 16:48, Kris Van Hees wrote: > Partial comments below (still looking at the provider changes)... > > On Thu, Apr 03, 2025 at 01:02:52AM -0400, eugene.loh at oracle.com wrote: >> From: Eugene Loh >> >> The sched provider trampoline for enqueue and dequeue probes had >> pending FIXMEs for providing a cpuinfo_t* for the cpu associated >> with the run queue. Implement the missing code. >> >> Since the cpu associated with the run queue might be different from >> the cpu where we are running, it becomes necessary to access the >> cpuinfo for some random cpu. With Linux 5.18, there is a BPF >> helper function map_lookup_percpu_elem() that allows such lookups >> on per-cpu arrays. To support older kernels, however, we change >> the cpuinfo BPF map from per-cpu to global. Also, it is a hash >> table rather than an array in case cpus are not numbered consecutively. > I agree with all the above. Good solution. 
> >> Signed-off-by: Eugene Loh >> --- >> bpf/get_agg.c | 2 +- >> bpf/get_bvar.c | 2 +- >> libdtrace/dt_bpf.c | 34 ++++++-------- >> libdtrace/dt_cg.c | 5 ++- >> libdtrace/dt_prov_lockstat.c | 4 +- >> libdtrace/dt_prov_sched.c | 74 +++++++++++++++++++++++++------ >> libdtrace/dt_work.c | 20 +++------ >> test/unittest/sched/tst.enqueue.d | 1 - >> 8 files changed, 89 insertions(+), 53 deletions(-) >> >> diff --git a/bpf/get_agg.c b/bpf/get_agg.c >> index c0eb825f0..e70caa6ef 100644 >> --- a/bpf/get_agg.c >> +++ b/bpf/get_agg.c >> @@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo; >> */ >> noinline uint64_t *dt_no_agg(void) >> { >> - uint32_t key = 0; >> + uint32_t key = bpf_get_smp_processor_id(); >> dt_bpf_cpuinfo_t *ci; >> >> ci = bpf_map_lookup_elem(&cpuinfo, &key); >> diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c >> index d372b3445..d81c3605f 100644 >> --- a/bpf/get_bvar.c >> +++ b/bpf/get_bvar.c >> @@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx) >> >> noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx) >> { >> - uint32_t key = 0; >> + uint32_t key = bpf_get_smp_processor_id(); >> void *val = bpf_map_lookup_elem(&cpuinfo, &key); >> >> if (val == NULL) { >> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c >> index 6d42a96c7..d6722cbd1 100644 >> --- a/libdtrace/dt_bpf.c >> +++ b/libdtrace/dt_bpf.c >> @@ -761,37 +761,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp) >> static int >> gmap_create_cpuinfo(dtrace_hdl_t *dtp) >> { >> - int i, rc; >> + int i; >> uint32_t key = 0; >> dtrace_conf_t *conf = &dtp->dt_conf; >> size_t ncpus = conf->num_online_cpus; >> - dt_bpf_cpuinfo_t *data; >> + dt_bpf_cpuinfo_t data; > Not sure about this, because (see below)... > >> cpuinfo_t *ci; >> >> - /* >> - * num_possible_cpus <= num_online_cpus: see dt_conf_init. 
>> - */ >> - data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, >> - sizeof(dt_bpf_cpuinfo_t)); >> - if (data == NULL) >> - return dt_set_errno(dtp, EDT_NOMEM); >> - >> - for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) >> - memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t)); >> - >> dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo", >> - BPF_MAP_TYPE_PERCPU_ARRAY, >> + BPF_MAP_TYPE_HASH, >> sizeof(uint32_t), >> - sizeof(dt_bpf_cpuinfo_t), 1); >> + sizeof(dt_bpf_cpuinfo_t), ncpus); >> if (dtp->dt_cpumap_fd == -1) >> return -1; >> >> - rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data); >> - dt_free(dtp, data); >> - if (rc == -1) >> - return dt_bpf_error(dtp, >> - "cannot update BPF map 'cpuinfo': %s\n", >> - strerror(errno)); >> + memset(&data, 0, sizeof(data)); > Do we need this, because (see below).... > >> + for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) { >> + memcpy(&data.ci, ci, sizeof(cpuinfo_t)); > Do we need this, because (see below).... > >> + key = ci->cpu_id; >> + if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1) > Why can't we simply do: > > if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, ci) == -1)

I think the problem is that the BPF map has elements of size sizeof(dt_bpf_cpuinfo_t). Meanwhile, ci has size sizeof(cpuinfo_t), which is smaller. So if we do an update like that, the map entry will contain stray data in the fields that we want initialized to 0.
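A small stand-alone sketch of the size mismatch, under stated assumptions: demo_cpuinfo_t is a made-up stand-in for cpuinfo_t (its real fields are not shown in this thread), and demo_bpf_cpuinfo_t mirrors only what the patch itself touches in dt_bpf_cpuinfo_t (a ci member plus the buf_drops and agg_drops counters). The helper builds a full-size, zeroed value before copying the CPU info in, which is roughly what the memset()/memcpy() pair in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpuinfo_t; the field names are illustrative. */
typedef struct demo_cpuinfo {
	int32_t		cpu_id;
	int32_t		cpu_chip;
} demo_cpuinfo_t;

/* Stand-in for the map value: CPU info followed by drop counters. */
typedef struct demo_bpf_cpuinfo {
	demo_cpuinfo_t	ci;
	uint64_t	buf_drops;
	uint64_t	agg_drops;
} demo_bpf_cpuinfo_t;

/*
 * Build a value of the full map-value size.  The counters are zeroed
 * first, then only the leading CPU info portion is copied from 'ci'.
 */
static void demo_make_value(demo_bpf_cpuinfo_t *val, const demo_cpuinfo_t *ci)
{
	assert(val != NULL && ci != NULL);

	memset(val, 0, sizeof(*val));
	memcpy(&val->ci, ci, sizeof(*ci));
}
```

Passing a pointer to the smaller structure directly as the map value would instead have the update read a full value's worth of bytes starting at ci, so the counter fields would hold whatever happens to follow it in memory.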
>> + return dt_bpf_error(dtp, >> + "cannot update BPF map 'cpuinfo': %s\n", >> + strerror(errno)); >> + } >> >> return 0; >> } >> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c >> index 6dcf4cd3d..d83b1c2ce 100644 >> --- a/libdtrace/dt_cg.c >> +++ b/libdtrace/dt_cg.c >> @@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb) >> } else { >> idp = dt_dlib_get_map(dtp, "cpuinfo"); >> assert(idp != NULL); >> + >> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); >> + >> dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); >> emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP)); >> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); >> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); >> emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); >> emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl)); >> emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1)); >> diff --git a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c >> index c73edf9be..8b2cf4da2 100644 >> --- a/libdtrace/dt_prov_lockstat.c >> +++ b/libdtrace/dt_prov_lockstat.c >> @@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) >> { >> dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); >> >> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); >> + >> assert(idp != NULL); >> dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); >> emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); >> emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE)); >> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); >> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); >> emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); >> emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); >> emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0)); >> diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c >> index 3a218f3cb..a548e679f 100644 >> --- a/libdtrace/dt_prov_sched.c >> +++ b/libdtrace/dt_prov_sched.c >> @@ -84,6 +84,40 
@@ static int populate(dtrace_hdl_t *dtp) >> probe_args, probes); >> } >> >> +/* >> + * Get a pointer to the cpuinfo_t structure for the CPU associated >> + * with the runqueue that is in arg0. >> + * >> + * Clobbers %r1 through %r5 >> + * Stores pointer to cpuinfo_t struct in %r0 >> + */ >> +static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) >> +{ >> + dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); >> + >> + assert(idp != NULL); >> + >> + /* Put the runqueue pointer from mst->arg0 into %r3. */ >> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0))); >> + >> + /* Turn it into a pointer to its cpu member. */ >> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1))); >> + >> + /* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */ >> + emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int))); >> + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP)); >> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0))); >> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel)); >> + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl)); >> + >> + /* Now look up the corresponding cpuinfo_t. */ >> + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); >> + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); >> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0))); >> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); >> + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); >> +} >> + >> /* >> * Generate a BPF trampoline for a SDT probe. 
>> * >> @@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp) >> */ >> static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) >> { >> + dtrace_hdl_t *dtp = pcb->pcb_hdl; >> dt_irlist_t *dlp = &pcb->pcb_ir; >> dt_probe_t *prp = pcb->pcb_probe; >> >> if (strcmp(prp->desc->prb, "dequeue") == 0) { >> - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); >> - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); >> /* >> - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU >> - * associated with the runqueue. >> + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. >> + */ >> + get_cpuinfo(dtp, dlp, exitlbl); >> + >> + /* >> + * Copy arg1 into arg0. >> */ >> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); >> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); >> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); >> + >> + /* Store the cpuinfo_t* in %r0 into arg1. */ >> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); >> } else if (strcmp(prp->desc->prb, "enqueue") == 0) { >> + /* >> + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. >> + */ >> + get_cpuinfo(dtp, dlp, exitlbl); >> + >> + /* >> + * Copy arg1 into arg0. >> + */ >> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); >> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); >> + >> + /* Store the cpuinfo_t* in %r0 into arg1. */ >> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); >> + >> /* >> * This is ugly but necessary... enqueue_task() takes a flags argument and the >> * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the >> @@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) >> * outside the kernel source tree. 
>> */ >> #define ENQUEUE_HEAD 0x10 >> - >> - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); >> - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); >> - /* >> - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU >> - * associated with the runqueue. >> - */ >> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); >> - >> emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2))); >> emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD)); >> emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0)); >> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c >> index 498d5332a..2167ed299 100644 >> --- a/libdtrace/dt_work.c >> +++ b/libdtrace/dt_work.c >> @@ -37,35 +37,29 @@ END_probe(void) >> int >> dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what) >> { >> - dt_bpf_cpuinfo_t *ci; >> - uint32_t cikey = 0; >> + dt_bpf_cpuinfo_t ci; >> + uint32_t cikey = cpu; >> uint64_t cnt; >> int rval = 0; >> >> assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION); >> >> - ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, >> - sizeof(dt_bpf_cpuinfo_t)); >> - if (ci == NULL) >> - return dt_set_errno(dtp, EDT_NOMEM); >> - >> - if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) { >> + if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) { >> rval = dt_set_errno(dtp, EDT_BPF); >> goto fail; >> } >> >> if (what == DTRACEDROP_PRINCIPAL) { >> - cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf; >> - dtp->dt_drops[cpu].buf = ci[cpu].buf_drops; >> + cnt = ci.buf_drops - dtp->dt_drops[cpu].buf; >> + dtp->dt_drops[cpu].buf = ci.buf_drops; >> } else { >> - cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg; >> - dtp->dt_drops[cpu].agg = ci[cpu].agg_drops; >> + cnt = ci.agg_drops - dtp->dt_drops[cpu].agg; >> + dtp->dt_drops[cpu].agg = ci.agg_drops; >> } >> >> rval = dt_handle_cpudrop(dtp, cpu, what, cnt); >> >> fail: >> - dt_free(dtp, ci); >> return rval; >> } >> >> diff 
--git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d >> index f445ac843..28dcace8c 100644 >> --- a/test/unittest/sched/tst.enqueue.d >> +++ b/test/unittest/sched/tst.enqueue.d >> @@ -4,7 +4,6 @@ >> * Licensed under the Universal Permissive License v 1.0 as shown at >> * http://oss.oracle.com/licenses/upl. >> */ >> -/* @@xfail: dtv2 */ >> >> #pragma D option switchrate=100hz >> #pragma D option destructive >> -- >> 2.43.5 >> From eugene.loh at oracle.com Mon Apr 14 20:19:58 2025 From: eugene.loh at oracle.com (eugene.loh at oracle.com) Date: Mon, 14 Apr 2025 16:19:58 -0400 Subject: [DTrace-devel] [PATCH 1/2] test: Remove orphaned tst.lockstat.r Message-ID: <20250414201959.31327-1-eugene.loh@oracle.com> From: Eugene Loh In commit ded09d05a ("test: rework main lockstat test"), a lockstat test was renamed. Its .r results file was thereby orphaned. Remove the orphaned copy. Signed-off-by: Eugene Loh --- test/unittest/lockstat/tst.lockstat.r | 13 ------------- 1 file changed, 13 deletions(-) delete mode 100644 test/unittest/lockstat/tst.lockstat.r diff --git a/test/unittest/lockstat/tst.lockstat.r b/test/unittest/lockstat/tst.lockstat.r deleted file mode 100644 index 5bdc40c67..000000000 --- a/test/unittest/lockstat/tst.lockstat.r +++ /dev/null @@ -1,13 +0,0 @@ -Minimum lockstat events seen - -lockstat:::adaptive-spin - yes -lockstat:::adaptive-block - yes -lockstat:::adaptive-acquire - yes -lockstat:::adaptive-release - yes -lockstat:::rw-spin - yes -lockstat:::rw-acquire - yes -lockstat:::rw-release - yes -lockstat:::spin-spin - yes -lockstat:::spin-acquire - yes -lockstat:::spin-release - yes - -- 2.43.5 From eugene.loh at oracle.com Mon Apr 14 20:19:59 2025 From: eugene.loh at oracle.com (eugene.loh at oracle.com) Date: Mon, 14 Apr 2025 16:19:59 -0400 Subject: [DTrace-devel] [PATCH 2/2] test: Skip pid-0 tests on oversubscribed systems In-Reply-To: <20250414201959.31327-1-eugene.loh@oracle.com> References: 
<20250414201959.31327-1-eugene.loh@oracle.com> Message-ID: <20250414201959.31327-2-eugene.loh@oracle.com> From: Eugene Loh A number of tests check "tick-n /pid==0/" probes. The problem with this is that a tick-n probe runs on a specific CPU. If that CPU is fully subscribed, then pid 0 (swapper) will not run. Thus, the test will take a long time, only to time out. Change these tests to use profile-n instead of tick-n probes, improving chances that the test probe will fire on a less subscribed CPU. Therefore, also change the .r.p post-processing file so that it uses only one output line (in case two CPUs manage to write output). Finally, add skip files in case pid 0 does not fire on any CPU. Signed-off-by: Eugene Loh --- test/unittest/ustack/tst.kthread.d | 4 ++-- test/unittest/ustack/tst.kthread.x | 5 +++++ test/unittest/ustack/tst.uaddr-pid0.d | 4 ++-- test/unittest/ustack/tst.uaddr-pid0.r.p | 4 ++-- test/unittest/ustack/tst.uaddr-pid0.x | 5 +++++ test/unittest/ustack/tst.ufunc-pid0.d | 4 ++-- test/unittest/ustack/tst.ufunc-pid0.r.p | 4 ++-- test/unittest/ustack/tst.ufunc-pid0.x | 5 +++++ test/unittest/ustack/tst.usym-pid0.d | 4 ++-- test/unittest/ustack/tst.usym-pid0.r.p | 4 ++-- test/unittest/ustack/tst.usym-pid0.x | 5 +++++ 11 files changed, 34 insertions(+), 14 deletions(-) create mode 100755 test/unittest/ustack/tst.kthread.x create mode 100755 test/unittest/ustack/tst.uaddr-pid0.x create mode 100755 test/unittest/ustack/tst.ufunc-pid0.x create mode 100755 test/unittest/ustack/tst.usym-pid0.x diff --git a/test/unittest/ustack/tst.kthread.d b/test/unittest/ustack/tst.kthread.d index c6252b742..83ae6f7c6 100644 --- a/test/unittest/ustack/tst.kthread.d +++ b/test/unittest/ustack/tst.kthread.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2013, 2025, Oracle and/or its affiliates. All rights reserved. 
* Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. */ @@ -16,4 +16,4 @@ #pragma D option quiet -tick-100msec / pid == 0 / { ustack(); exit(0); } +profile-100msec / pid == 0 / { ustack(); exit(0); } diff --git a/test/unittest/ustack/tst.kthread.x b/test/unittest/ustack/tst.kthread.x new file mode 100755 index 000000000..b5fe7177a --- /dev/null +++ b/test/unittest/ustack/tst.kthread.x @@ -0,0 +1,5 @@ +#!/bin/sh + +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' +exit $? diff --git a/test/unittest/ustack/tst.uaddr-pid0.d b/test/unittest/ustack/tst.uaddr-pid0.d index 263a7ca94..ab54eea40 100644 --- a/test/unittest/ustack/tst.uaddr-pid0.d +++ b/test/unittest/ustack/tst.uaddr-pid0.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. */ @@ -9,7 +9,7 @@ #pragma D option quiet -tick-1 +profile-1 /pid == $target/ { uaddr(ucaller); diff --git a/test/unittest/ustack/tst.uaddr-pid0.r.p b/test/unittest/ustack/tst.uaddr-pid0.r.p index 9203dc824..78ab8e59d 100755 --- a/test/unittest/ustack/tst.uaddr-pid0.r.p +++ b/test/unittest/ustack/tst.uaddr-pid0.r.p @@ -1,4 +1,4 @@ #!/usr/bin/gawk -f -# remove trailing blanks -{ sub(" *$", ""); print } +# remove trailing blanks, use only one line +{ sub(" *$", ""); print; exit } diff --git a/test/unittest/ustack/tst.uaddr-pid0.x b/test/unittest/ustack/tst.uaddr-pid0.x new file mode 100755 index 000000000..b5fe7177a --- /dev/null +++ b/test/unittest/ustack/tst.uaddr-pid0.x @@ -0,0 +1,5 @@ +#!/bin/sh + +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' +exit $? 
diff --git a/test/unittest/ustack/tst.ufunc-pid0.d b/test/unittest/ustack/tst.ufunc-pid0.d index f076782aa..cd34275f1 100644 --- a/test/unittest/ustack/tst.ufunc-pid0.d +++ b/test/unittest/ustack/tst.ufunc-pid0.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. */ @@ -9,7 +9,7 @@ #pragma D option quiet -tick-1 +profile-1 /pid == $target/ { ufunc(ucaller); diff --git a/test/unittest/ustack/tst.ufunc-pid0.r.p b/test/unittest/ustack/tst.ufunc-pid0.r.p index 9203dc824..78ab8e59d 100755 --- a/test/unittest/ustack/tst.ufunc-pid0.r.p +++ b/test/unittest/ustack/tst.ufunc-pid0.r.p @@ -1,4 +1,4 @@ #!/usr/bin/gawk -f -# remove trailing blanks -{ sub(" *$", ""); print } +# remove trailing blanks, use only one line +{ sub(" *$", ""); print; exit } diff --git a/test/unittest/ustack/tst.ufunc-pid0.x b/test/unittest/ustack/tst.ufunc-pid0.x new file mode 100755 index 000000000..b5fe7177a --- /dev/null +++ b/test/unittest/ustack/tst.ufunc-pid0.x @@ -0,0 +1,5 @@ +#!/bin/sh + +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' +exit $? diff --git a/test/unittest/ustack/tst.usym-pid0.d b/test/unittest/ustack/tst.usym-pid0.d index d2f5ec5de..9aceab355 100644 --- a/test/unittest/ustack/tst.usym-pid0.d +++ b/test/unittest/ustack/tst.usym-pid0.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. 
*/ @@ -9,7 +9,7 @@ #pragma D option quiet -tick-1 +profile-1 /pid == $target/ { usym(ucaller); diff --git a/test/unittest/ustack/tst.usym-pid0.r.p b/test/unittest/ustack/tst.usym-pid0.r.p index 9203dc824..78ab8e59d 100755 --- a/test/unittest/ustack/tst.usym-pid0.r.p +++ b/test/unittest/ustack/tst.usym-pid0.r.p @@ -1,4 +1,4 @@ #!/usr/bin/gawk -f -# remove trailing blanks -{ sub(" *$", ""); print } +# remove trailing blanks, use only one line +{ sub(" *$", ""); print; exit } diff --git a/test/unittest/ustack/tst.usym-pid0.x b/test/unittest/ustack/tst.usym-pid0.x new file mode 100755 index 000000000..b5fe7177a --- /dev/null +++ b/test/unittest/ustack/tst.usym-pid0.x @@ -0,0 +1,5 @@ +#!/bin/sh + +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' +exit $? -- 2.43.5 From eugene.loh at oracle.com Mon Apr 14 22:40:57 2025 From: eugene.loh at oracle.com (eugene.loh at oracle.com) Date: Mon, 14 Apr 2025 18:40:57 -0400 Subject: [DTrace-devel] [PATCH] test: Test fds[] member fi_fs Message-ID: <20250414224057.3287-1-eugene.loh@oracle.com> From: Eugene Loh Signed-off-by: Eugene Loh --- test/unittest/io/tst.fds.aarch64.r | 5 +++++ test/unittest/io/tst.fds.d | 1 + test/unittest/io/tst.fds.r | 5 +++++ test/unittest/io/tst.fds.sparc64.r | 5 +++++ test/unittest/io/tst.fds.x86_64.r | 5 +++++ 5 files changed, 21 insertions(+) diff --git a/test/unittest/io/tst.fds.aarch64.r b/test/unittest/io/tst.fds.aarch64.r index e1160e5df..762f122e2 100644 --- a/test/unittest/io/tst.fds.aarch64.r +++ b/test/unittest/io/tst.fds.aarch64.r @@ -1,29 +1,34 @@ fds[0] fi_dirname = . +fds[0] fi_fs = proc fds[0] fi_mount = fds[0] fi_name = mem fds[0] fi_offset = 0 fds[0] fi_oflags = 20000 fds[0] fi_pathname = fds[1] fi_dirname = . +fds[1] fi_fs = proc fds[1] fi_mount = fds[1] fi_name = mem fds[1] fi_offset = 0 fds[1] fi_oflags = 20001 fds[1] fi_pathname = fds[2] fi_dirname = . 
+fds[2] fi_fs = proc fds[2] fi_mount = fds[2] fi_name = mem fds[2] fi_offset = 0 fds[2] fi_oflags = 20002 fds[2] fi_pathname = fds[3] fi_dirname = . +fds[3] fi_fs = proc fds[3] fi_mount = fds[3] fi_name = mem fds[3] fi_offset = 0 fds[3] fi_oflags = 121c02 fds[3] fi_pathname = fds[4] fi_dirname = . +fds[4] fi_fs = proc fds[4] fi_mount = fds[4] fi_name = mem fds[4] fi_offset = 123 diff --git a/test/unittest/io/tst.fds.d b/test/unittest/io/tst.fds.d index 06caefe4d..2ae2a33b2 100644 --- a/test/unittest/io/tst.fds.d +++ b/test/unittest/io/tst.fds.d @@ -32,6 +32,7 @@ syscall::ioctl:entry printf("fds[%d] fi_name = %s\n", arg0, fds[arg0].fi_name); printf("fds[%d] fi_dirname = %s\n", arg0, fds[arg0].fi_dirname); printf("fds[%d] fi_pathname = %s\n", arg0, fds[arg0].fi_pathname); + printf("fds[%d] fi_fs = %s\n", arg0, fds[arg0].fi_fs); printf("fds[%d] fi_mount = %s\n", arg0, fds[arg0].fi_mount); printf("fds[%d] fi_offset = %d\n", arg0, fds[arg0].fi_offset); printf("fds[%d] fi_oflags = %x\n", arg0, fds[arg0].fi_oflags); diff --git a/test/unittest/io/tst.fds.r b/test/unittest/io/tst.fds.r index d7c12b86a..b5fa2df95 100644 --- a/test/unittest/io/tst.fds.r +++ b/test/unittest/io/tst.fds.r @@ -1,29 +1,34 @@ fds[0] fi_dirname = /proc/# +fds[0] fi_fs = proc fds[0] fi_mount = fds[0] fi_name = mem fds[0] fi_offset = 0 fds[0] fi_oflags = Please customize for arch fds[0] fi_pathname = /proc/#/mem fds[1] fi_dirname = /proc/# +fds[1] fi_fs = proc fds[1] fi_mount = fds[1] fi_name = mem fds[1] fi_offset = 0 fds[1] fi_oflags = Please customize for arch fds[1] fi_pathname = /proc/#/mem fds[2] fi_dirname = /proc/# +fds[2] fi_fs = proc fds[2] fi_mount = fds[2] fi_name = mem fds[2] fi_offset = 0 fds[2] fi_oflags = Please customize for arch fds[2] fi_pathname = /proc/#/mem fds[3] fi_dirname = /proc/# +fds[3] fi_fs = proc fds[3] fi_mount = fds[3] fi_name = mem fds[3] fi_offset = 0 fds[3] fi_oflags = Please customize for arch fds[3] fi_pathname = /proc/#/mem fds[4] fi_dirname = /proc/# +fds[4] 
fi_fs = proc fds[4] fi_mount = fds[4] fi_name = mem fds[4] fi_offset = 123 diff --git a/test/unittest/io/tst.fds.sparc64.r b/test/unittest/io/tst.fds.sparc64.r index 71cc83095..89579e176 100644 --- a/test/unittest/io/tst.fds.sparc64.r +++ b/test/unittest/io/tst.fds.sparc64.r @@ -1,29 +1,34 @@ fds[0] fi_dirname = . +fds[0] fi_fs = proc fds[0] fi_mount = fds[0] fi_name = mem fds[0] fi_offset = 0 fds[0] fi_oflags = 40000 fds[0] fi_pathname = fds[1] fi_dirname = . +fds[1] fi_fs = proc fds[1] fi_mount = fds[1] fi_name = mem fds[1] fi_offset = 0 fds[1] fi_oflags = 40001 fds[1] fi_pathname = fds[2] fi_dirname = . +fds[2] fi_fs = proc fds[2] fi_mount = fds[2] fi_name = mem fds[2] fi_offset = 0 fds[2] fi_oflags = 40002 fds[2] fi_pathname = fds[3] fi_dirname = . +fds[3] fi_fs = proc fds[3] fi_mount = fds[3] fi_name = mem fds[3] fi_offset = 0 fds[3] fi_oflags = 84600e fds[3] fi_pathname = fds[4] fi_dirname = . +fds[4] fi_fs = proc fds[4] fi_mount = fds[4] fi_name = mem fds[4] fi_offset = 123 diff --git a/test/unittest/io/tst.fds.x86_64.r b/test/unittest/io/tst.fds.x86_64.r index 799e85623..34172b2a2 100644 --- a/test/unittest/io/tst.fds.x86_64.r +++ b/test/unittest/io/tst.fds.x86_64.r @@ -1,29 +1,34 @@ fds[0] fi_dirname = . +fds[0] fi_fs = proc fds[0] fi_mount = fds[0] fi_name = mem fds[0] fi_offset = 0 fds[0] fi_oflags = 8000 fds[0] fi_pathname = fds[1] fi_dirname = . +fds[1] fi_fs = proc fds[1] fi_mount = fds[1] fi_name = mem fds[1] fi_offset = 0 fds[1] fi_oflags = 8001 fds[1] fi_pathname = fds[2] fi_dirname = . +fds[2] fi_fs = proc fds[2] fi_mount = fds[2] fi_name = mem fds[2] fi_offset = 0 fds[2] fi_oflags = 8002 fds[2] fi_pathname = fds[3] fi_dirname = . +fds[3] fi_fs = proc fds[3] fi_mount = fds[3] fi_name = mem fds[3] fi_offset = 0 fds[3] fi_oflags = 109c02 fds[3] fi_pathname = fds[4] fi_dirname = . 
+fds[4] fi_fs = proc fds[4] fi_mount = fds[4] fi_name = mem fds[4] fi_offset = 123 -- 2.43.5 From sam at gentoo.org Mon Apr 14 23:33:53 2025 From: sam at gentoo.org (Sam James) Date: Tue, 15 Apr 2025 00:33:53 +0100 Subject: [DTrace-devel] [PATCH] test: Account for pid:::entry ucaller being correct In-Reply-To: <20250319063230.28171-1-eugene.loh@oracle.com> References: <20250319063230.28171-1-eugene.loh@oracle.com> Message-ID: <87v7r6jp26.fsf@gentoo.org> "eugene.loh--- via DTrace-devel" writes: > From: Eugene Loh > > In commit f38bdf9ea ("test: Account for pid:::entry ustack() being correct") > we accounted for x86-specific heuristics introduced in Linux 6.11 that > dealt with pid:::entry uprobes firing so early in the function preamble > that the frame pointer is not yet set and the caller is not (yet) > correctly identified. > > Update a related test to account for the same effect with ucaller. > > Signed-off-by: Eugene Loh > --- > test/unittest/vars/tst.ucaller.r.p | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > create mode 100755 test/unittest/vars/tst.ucaller.r.p > > diff --git a/test/unittest/vars/tst.ucaller.r.p b/test/unittest/vars/tst.ucaller.r.p > new file mode 100755 > index 000000000..8e03f110d > --- /dev/null > +++ b/test/unittest/vars/tst.ucaller.r.p > @@ -0,0 +1,28 @@ > +#!/bin/sh > + > +# A pid entry probe places a uprobe on the first instruction of a function. > +# Unfortunately, this is so early in the function preamble that the function > +# frame pointer has not yet been established and the actual caller of the > +# traced function is missed. > +# > +# In Linux 6.11, x86-specific heuristics are introduced to fix this problem. > +# See commit cfa7f3d > +# ("perf,x86: avoid missing caller address in stack traces captured in uprobe") > +# for both a description of the problem and an explanation of the heuristics. 
> +# > +# Add post processing to these test results to allow for both cases: > +# caller frame is missing or not missing. > + > +if [ $(uname -m) == "x86_64" ]; then > + read MAJOR MINOR <<< `uname -r | grep -Eo '^[0-9]+\.[0-9]+' | tr '.' ' '` > + <<< is a bashism, but the shebang here is POSIX shell. Please just change it to bash IMO. > + if [ $MAJOR -ge 6 ]; then > + if [ $MAJOR -gt 6 -o $MINOR -ge 11 ]; then > + awk '{ sub("myfunc_w", "myfunc_v"); print; }' Please use gawk instead (see previous commits involving that). > + exit 0 > + fi > + fi > +fi > + > +# Otherwise, just pass the output through. > +cat From kris.van.hees at oracle.com Tue Apr 15 11:59:06 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Tue, 15 Apr 2025 07:59:06 -0400 Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs In-Reply-To: References: <20250403050252.15239-1-eugene.loh@oracle.com> Message-ID: On Fri, Apr 11, 2025 at 05:20:00PM -0400, Eugene Loh wrote: > On 4/11/25 16:48, Kris Van Hees wrote: > > > Partial comments below (still looking at the provider changes)... > > > > On Thu, Apr 03, 2025 at 01:02:52AM -0400, eugene.loh at oracle.com wrote: > > > From: Eugene Loh > > > > > > The sched provider trampoline for enqueue and dequeue probes had > > > pending FIXMEs for providing a cpuinfo_t* for the cpu associated > > > with the run queue. Implement the missing code. > > > > > > Since the cpu associated with the run queue might be different from > > > the cpu where we are running, it becomes necessary to access the > > > cpuinfo for some random cpu. With Linux 5.18, there is a BPF > > > helper function map_lookup_percpu_elem() that allows such lookups > > > on per-cpu arrays. To support older kernels, however, we change > > > the cpuinfo BPF map from per-cpu to global. Also, it is a hash > > > table rather than an array in case cpus are not numbered consecutively. > > I agree with all the above. Good solution. 
> > > > > Signed-off-by: Eugene Loh > > > --- > > > bpf/get_agg.c | 2 +- > > > bpf/get_bvar.c | 2 +- > > > libdtrace/dt_bpf.c | 34 ++++++-------- > > > libdtrace/dt_cg.c | 5 ++- > > > libdtrace/dt_prov_lockstat.c | 4 +- > > > libdtrace/dt_prov_sched.c | 74 +++++++++++++++++++++++++------ > > > libdtrace/dt_work.c | 20 +++------ > > > test/unittest/sched/tst.enqueue.d | 1 - > > > 8 files changed, 89 insertions(+), 53 deletions(-) > > > > > > diff --git a/bpf/get_agg.c b/bpf/get_agg.c > > > index c0eb825f0..e70caa6ef 100644 > > > --- a/bpf/get_agg.c > > > +++ b/bpf/get_agg.c > > > @@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo; > > > */ > > > noinline uint64_t *dt_no_agg(void) > > > { > > > - uint32_t key = 0; > > > + uint32_t key = bpf_get_smp_processor_id(); > > > dt_bpf_cpuinfo_t *ci; > > > ci = bpf_map_lookup_elem(&cpuinfo, &key); > > > diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c > > > index d372b3445..d81c3605f 100644 > > > --- a/bpf/get_bvar.c > > > +++ b/bpf/get_bvar.c > > > @@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx) > > > noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx) > > > { > > > - uint32_t key = 0; > > > + uint32_t key = bpf_get_smp_processor_id(); > > > void *val = bpf_map_lookup_elem(&cpuinfo, &key); > > > if (val == NULL) { > > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c > > > index 6d42a96c7..d6722cbd1 100644 > > > --- a/libdtrace/dt_bpf.c > > > +++ b/libdtrace/dt_bpf.c > > > @@ -761,37 +761,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp) > > > static int > > > gmap_create_cpuinfo(dtrace_hdl_t *dtp) > > > { > > > - int i, rc; > > > + int i; > > > uint32_t key = 0; > > > dtrace_conf_t *conf = &dtp->dt_conf; > > > size_t ncpus = conf->num_online_cpus; > > > - dt_bpf_cpuinfo_t *data; > > > + dt_bpf_cpuinfo_t data; > > Not sure about this, because (see below)... > > > > > cpuinfo_t *ci; > > > - /* > > > - * num_possible_cpus <= num_online_cpus: see dt_conf_init. 
> > > - */ > > > - data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > > > - sizeof(dt_bpf_cpuinfo_t)); > > > - if (data == NULL) > > > - return dt_set_errno(dtp, EDT_NOMEM); > > > - > > > - for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) > > > - memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t)); > > > - > > > dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo", > > > - BPF_MAP_TYPE_PERCPU_ARRAY, > > > + BPF_MAP_TYPE_HASH, > > > sizeof(uint32_t), > > > - sizeof(dt_bpf_cpuinfo_t), 1); > > > + sizeof(dt_bpf_cpuinfo_t), ncpus); > > > if (dtp->dt_cpumap_fd == -1) > > > return -1; > > > - rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data); > > > - dt_free(dtp, data); > > > - if (rc == -1) > > > - return dt_bpf_error(dtp, > > > - "cannot update BPF map 'cpuinfo': %s\n", > > > - strerror(errno)); > > > + memset(&data, 0, sizeof(data)); > > Do we need this, because (see below).... > > > > > + for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) { > > > + memcpy(&data.ci, ci, sizeof(cpuinfo_t)); > > Do we need this, because (see below).... > > > > > + key = ci->cpu_id; > > > + if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1) > > Why can't we simply do: > > > > if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, ci) == -1) > > I think the problem is that the BPF map has elements with size > sizeof(dt_bpf_cpuinfo_t). Meanwhile, ci has size sizeof(cpuinfo_t), which > is smaller. So if we do an update like that, the map will have stuff where > we want it to be initialized to 0. Yes, but I am 99% certain that BPF maps are allocated and initialized with zeros because doing otherwise would be a major security risk for the kernel. So you can count on that (should verify first to make certain but honestly it needs to be or else it could leak data which is a big no-no).
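(Aside, as a minimal sketch rather than the actual libdtrace code: the value-size question can be illustrated in plain C. The sketch_* types and functions below are hypothetical stand-ins for cpuinfo_t, dt_bpf_cpuinfo_t, and a map update. The one assumption carried over from the kernel is that, as far as I know, BPF_MAP_UPDATE_ELEM copies the map's full value size from the caller's buffer, so the buffer handed to the update should be at least that large. Zeroing a full-size wrapper and copying the smaller struct into it satisfies that and leaves the drop counters at 0.)

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpuinfo_t (the real one has more members). */
typedef struct {
    int cpu_id;
    int cpu_chip;
} sketch_cpuinfo_t;

/* Hypothetical stand-in for dt_bpf_cpuinfo_t: cpuinfo plus counters. */
typedef struct {
    sketch_cpuinfo_t ci;
    uint64_t buf_drops;
    uint64_t agg_drops;
} sketch_bpf_cpuinfo_t;

/*
 * Stand-in for a map update: copies value_size bytes from the caller's
 * buffer into the slot, regardless of what the slot held before.
 */
static void
sketch_map_update(void *slot, const void *value, size_t value_size)
{
    memcpy(slot, value, value_size);
}

/* Build a full-size, zero-padded value from the smaller cpuinfo. */
static void
sketch_fill_value(sketch_bpf_cpuinfo_t *val, const sketch_cpuinfo_t *ci)
{
    memset(val, 0, sizeof(*val));
    memcpy(&val->ci, ci, sizeof(*ci));
}

/* Returns 0 if the stored element has the cpu id and zeroed counters. */
static int
sketch_demo(void)
{
    sketch_cpuinfo_t ci = { .cpu_id = 3, .cpu_chip = 1 };
    sketch_bpf_cpuinfo_t val, slot;

    memset(&slot, 0xff, sizeof(slot));  /* pretend the slot held junk */
    sketch_fill_value(&val, &ci);
    sketch_map_update(&slot, &val, sizeof(val));

    assert(slot.ci.cpu_id == 3);
    return (slot.buf_drops == 0 && slot.agg_drops == 0) ? 0 : -1;
}
```

Calling sketch_demo() exercises the pattern once; passing a pointer to the bare sketch_cpuinfo_t with sizeof(sketch_bpf_cpuinfo_t) instead would read past the end of the smaller object, which is the case the zeroed wrapper avoids.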
> > > + return dt_bpf_error(dtp, > > > + "cannot update BPF map 'cpuinfo': %s\n", > > > + strerror(errno)); > > > + } > > > return 0; > > > } > > > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c > > > index 6dcf4cd3d..d83b1c2ce 100644 > > > --- a/libdtrace/dt_cg.c > > > +++ b/libdtrace/dt_cg.c > > > @@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb) > > > } else { > > > idp = dt_dlib_get_map(dtp, "cpuinfo"); > > > assert(idp != NULL); > > > + > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > > > + > > > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > > > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP)); > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > > > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > > > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl)); > > > emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1)); > > > diff --git a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c > > > index c73edf9be..8b2cf4da2 100644 > > > --- a/libdtrace/dt_prov_lockstat.c > > > +++ b/libdtrace/dt_prov_lockstat.c > > > @@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > > > { > > > dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > > > + > > > assert(idp != NULL); > > > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > > > emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > > > emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE)); > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > > > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > > > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > > > emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0)); > > > diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c > > 
> index 3a218f3cb..a548e679f 100644 > > > --- a/libdtrace/dt_prov_sched.c > > > +++ b/libdtrace/dt_prov_sched.c > > > @@ -84,6 +84,40 @@ static int populate(dtrace_hdl_t *dtp) > > > probe_args, probes); > > > } > > > +/* > > > + * Get a pointer to the cpuinfo_t structure for the CPU associated > > > + * with the runqueue that is in arg0. > > > + * > > > + * Clobbers %r1 through %r5 > > > + * Stores pointer to cpuinfo_t struct in %r0 > > > + */ > > > +static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > > > +{ > > > + dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > > > + > > > + assert(idp != NULL); > > > + > > > + /* Put the runqueue pointer from mst->arg0 into %r3. */ > > > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0))); > > > + > > > + /* Turn it into a pointer to its cpu member. */ > > > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1))); > > > + > > > + /* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */ > > > + emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int))); > > > + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP)); > > > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0))); > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel)); > > > + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl)); > > > + > > > + /* Now look up the corresponding cpuinfo_t. */ > > > + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > > > + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > > > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0))); > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > > > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > > > +} > > > + > > > /* > > > * Generate a BPF trampoline for a SDT probe. 
> > > * > > > @@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp) > > > */ > > > static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > { > > > + dtrace_hdl_t *dtp = pcb->pcb_hdl; > > > dt_irlist_t *dlp = &pcb->pcb_ir; > > > dt_probe_t *prp = pcb->pcb_probe; > > > if (strcmp(prp->desc->prb, "dequeue") == 0) { > > > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > > > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > > > /* > > > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > > > - * associated with the runqueue. > > > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > > > + */ > > > + get_cpuinfo(dtp, dlp, exitlbl); > > > + > > > + /* > > > + * Copy arg1 into arg0. > > > */ > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > > > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > > > + > > > + /* Store the cpuinfo_t* in %r0 into arg1. */ > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > > > } else if (strcmp(prp->desc->prb, "enqueue") == 0) { > > > + /* > > > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > > > + */ > > > + get_cpuinfo(dtp, dlp, exitlbl); > > > + > > > + /* > > > + * Copy arg1 into arg0. > > > + */ > > > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > > > + > > > + /* Store the cpuinfo_t* in %r0 into arg1. */ > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > > > + > > > /* > > > * This is ugly but necessary... enqueue_task() takes a flags argument and the > > > * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the > > > @@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > * outside the kernel source tree. 
> > > */ > > > #define ENQUEUE_HEAD 0x10 > > > - > > > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > > > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > > > - /* > > > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > > > - * associated with the runqueue. > > > - */ > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > > > - > > > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2))); > > > emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD)); > > > emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0)); > > > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c > > > index 498d5332a..2167ed299 100644 > > > --- a/libdtrace/dt_work.c > > > +++ b/libdtrace/dt_work.c > > > @@ -37,35 +37,29 @@ END_probe(void) > > > int > > > dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what) > > > { > > > - dt_bpf_cpuinfo_t *ci; > > > - uint32_t cikey = 0; > > > + dt_bpf_cpuinfo_t ci; > > > + uint32_t cikey = cpu; > > > uint64_t cnt; > > > int rval = 0; > > > assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION); > > > - ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > > > - sizeof(dt_bpf_cpuinfo_t)); > > > - if (ci == NULL) > > > - return dt_set_errno(dtp, EDT_NOMEM); > > > - > > > - if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) { > > > + if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) { > > > rval = dt_set_errno(dtp, EDT_BPF); > > > goto fail; > > > } > > > if (what == DTRACEDROP_PRINCIPAL) { > > > - cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf; > > > - dtp->dt_drops[cpu].buf = ci[cpu].buf_drops; > > > + cnt = ci.buf_drops - dtp->dt_drops[cpu].buf; > > > + dtp->dt_drops[cpu].buf = ci.buf_drops; > > > } else { > > > - cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg; > > > - dtp->dt_drops[cpu].agg = ci[cpu].agg_drops; > > > + cnt = ci.agg_drops - dtp->dt_drops[cpu].agg; > > > + dtp->dt_drops[cpu].agg = 
ci.agg_drops; > > > } > > > rval = dt_handle_cpudrop(dtp, cpu, what, cnt); > > > fail: > > > - dt_free(dtp, ci); > > > return rval; > > > } > > > diff --git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d > > > index f445ac843..28dcace8c 100644 > > > --- a/test/unittest/sched/tst.enqueue.d > > > +++ b/test/unittest/sched/tst.enqueue.d > > > @@ -4,7 +4,6 @@ > > > * Licensed under the Universal Permissive License v 1.0 as shown at > > > * http://oss.oracle.com/licenses/upl. > > > */ > > > -/* @@xfail: dtv2 */ > > > #pragma D option switchrate=100hz > > > #pragma D option destructive > > > -- > > > 2.43.5 > > > From kris.van.hees at oracle.com Tue Apr 15 15:09:28 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Tue, 15 Apr 2025 11:09:28 -0400 Subject: [DTrace-devel] [PATCH] test: Test fds[] member fi_fs In-Reply-To: <20250414224057.3287-1-eugene.loh@oracle.com> References: <20250414224057.3287-1-eugene.loh@oracle.com> Message-ID: On Mon, Apr 14, 2025 at 06:40:57PM -0400, eugene.loh--- via DTrace-devel wrote: > From: Eugene Loh > > Signed-off-by: Eugene Loh Reviewed-by: Kris Van Hees > --- > test/unittest/io/tst.fds.aarch64.r | 5 +++++ > test/unittest/io/tst.fds.d | 1 + > test/unittest/io/tst.fds.r | 5 +++++ > test/unittest/io/tst.fds.sparc64.r | 5 +++++ > test/unittest/io/tst.fds.x86_64.r | 5 +++++ > 5 files changed, 21 insertions(+) > > diff --git a/test/unittest/io/tst.fds.aarch64.r b/test/unittest/io/tst.fds.aarch64.r > index e1160e5df..762f122e2 100644 > --- a/test/unittest/io/tst.fds.aarch64.r > +++ b/test/unittest/io/tst.fds.aarch64.r > @@ -1,29 +1,34 @@ > > fds[0] fi_dirname = . > +fds[0] fi_fs = proc > fds[0] fi_mount = > fds[0] fi_name = mem > fds[0] fi_offset = 0 > fds[0] fi_oflags = 20000 > fds[0] fi_pathname = > fds[1] fi_dirname = . > +fds[1] fi_fs = proc > fds[1] fi_mount = > fds[1] fi_name = mem > fds[1] fi_offset = 0 > fds[1] fi_oflags = 20001 > fds[1] fi_pathname = > fds[2] fi_dirname = . 
> +fds[2] fi_fs = proc > fds[2] fi_mount = > fds[2] fi_name = mem > fds[2] fi_offset = 0 > fds[2] fi_oflags = 20002 > fds[2] fi_pathname = > fds[3] fi_dirname = . > +fds[3] fi_fs = proc > fds[3] fi_mount = > fds[3] fi_name = mem > fds[3] fi_offset = 0 > fds[3] fi_oflags = 121c02 > fds[3] fi_pathname = > fds[4] fi_dirname = . > +fds[4] fi_fs = proc > fds[4] fi_mount = > fds[4] fi_name = mem > fds[4] fi_offset = 123 > diff --git a/test/unittest/io/tst.fds.d b/test/unittest/io/tst.fds.d > index 06caefe4d..2ae2a33b2 100644 > --- a/test/unittest/io/tst.fds.d > +++ b/test/unittest/io/tst.fds.d > @@ -32,6 +32,7 @@ syscall::ioctl:entry > printf("fds[%d] fi_name = %s\n", arg0, fds[arg0].fi_name); > printf("fds[%d] fi_dirname = %s\n", arg0, fds[arg0].fi_dirname); > printf("fds[%d] fi_pathname = %s\n", arg0, fds[arg0].fi_pathname); > + printf("fds[%d] fi_fs = %s\n", arg0, fds[arg0].fi_fs); > printf("fds[%d] fi_mount = %s\n", arg0, fds[arg0].fi_mount); > printf("fds[%d] fi_offset = %d\n", arg0, fds[arg0].fi_offset); > printf("fds[%d] fi_oflags = %x\n", arg0, fds[arg0].fi_oflags); > diff --git a/test/unittest/io/tst.fds.r b/test/unittest/io/tst.fds.r > index d7c12b86a..b5fa2df95 100644 > --- a/test/unittest/io/tst.fds.r > +++ b/test/unittest/io/tst.fds.r > @@ -1,29 +1,34 @@ > > fds[0] fi_dirname = /proc/# > +fds[0] fi_fs = proc > fds[0] fi_mount = > fds[0] fi_name = mem > fds[0] fi_offset = 0 > fds[0] fi_oflags = Please customize for arch > fds[0] fi_pathname = /proc/#/mem > fds[1] fi_dirname = /proc/# > +fds[1] fi_fs = proc > fds[1] fi_mount = > fds[1] fi_name = mem > fds[1] fi_offset = 0 > fds[1] fi_oflags = Please customize for arch > fds[1] fi_pathname = /proc/#/mem > fds[2] fi_dirname = /proc/# > +fds[2] fi_fs = proc > fds[2] fi_mount = > fds[2] fi_name = mem > fds[2] fi_offset = 0 > fds[2] fi_oflags = Please customize for arch > fds[2] fi_pathname = /proc/#/mem > fds[3] fi_dirname = /proc/# > +fds[3] fi_fs = proc > fds[3] fi_mount = > fds[3] fi_name = mem > fds[3] 
fi_offset = 0 > fds[3] fi_oflags = Please customize for arch > fds[3] fi_pathname = /proc/#/mem > fds[4] fi_dirname = /proc/# > +fds[4] fi_fs = proc > fds[4] fi_mount = > fds[4] fi_name = mem > fds[4] fi_offset = 123 > diff --git a/test/unittest/io/tst.fds.sparc64.r b/test/unittest/io/tst.fds.sparc64.r > index 71cc83095..89579e176 100644 > --- a/test/unittest/io/tst.fds.sparc64.r > +++ b/test/unittest/io/tst.fds.sparc64.r > @@ -1,29 +1,34 @@ > > fds[0] fi_dirname = . > +fds[0] fi_fs = proc > fds[0] fi_mount = > fds[0] fi_name = mem > fds[0] fi_offset = 0 > fds[0] fi_oflags = 40000 > fds[0] fi_pathname = > fds[1] fi_dirname = . > +fds[1] fi_fs = proc > fds[1] fi_mount = > fds[1] fi_name = mem > fds[1] fi_offset = 0 > fds[1] fi_oflags = 40001 > fds[1] fi_pathname = > fds[2] fi_dirname = . > +fds[2] fi_fs = proc > fds[2] fi_mount = > fds[2] fi_name = mem > fds[2] fi_offset = 0 > fds[2] fi_oflags = 40002 > fds[2] fi_pathname = > fds[3] fi_dirname = . > +fds[3] fi_fs = proc > fds[3] fi_mount = > fds[3] fi_name = mem > fds[3] fi_offset = 0 > fds[3] fi_oflags = 84600e > fds[3] fi_pathname = > fds[4] fi_dirname = . > +fds[4] fi_fs = proc > fds[4] fi_mount = > fds[4] fi_name = mem > fds[4] fi_offset = 123 > diff --git a/test/unittest/io/tst.fds.x86_64.r b/test/unittest/io/tst.fds.x86_64.r > index 799e85623..34172b2a2 100644 > --- a/test/unittest/io/tst.fds.x86_64.r > +++ b/test/unittest/io/tst.fds.x86_64.r > @@ -1,29 +1,34 @@ > > fds[0] fi_dirname = . > +fds[0] fi_fs = proc > fds[0] fi_mount = > fds[0] fi_name = mem > fds[0] fi_offset = 0 > fds[0] fi_oflags = 8000 > fds[0] fi_pathname = > fds[1] fi_dirname = . > +fds[1] fi_fs = proc > fds[1] fi_mount = > fds[1] fi_name = mem > fds[1] fi_offset = 0 > fds[1] fi_oflags = 8001 > fds[1] fi_pathname = > fds[2] fi_dirname = . > +fds[2] fi_fs = proc > fds[2] fi_mount = > fds[2] fi_name = mem > fds[2] fi_offset = 0 > fds[2] fi_oflags = 8002 > fds[2] fi_pathname = > fds[3] fi_dirname = . 
> +fds[3] fi_fs = proc > fds[3] fi_mount = > fds[3] fi_name = mem > fds[3] fi_offset = 0 > fds[3] fi_oflags = 109c02 > fds[3] fi_pathname = > fds[4] fi_dirname = . > +fds[4] fi_fs = proc > fds[4] fi_mount = > fds[4] fi_name = mem > fds[4] fi_offset = 123 > -- > 2.43.5 > > > _______________________________________________ > DTrace-devel mailing list > DTrace-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/dtrace-devel From kris.van.hees at oracle.com Tue Apr 15 15:10:15 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Tue, 15 Apr 2025 11:10:15 -0400 Subject: [DTrace-devel] [PATCH 1/2] test: Remove orphaned tst.lockstat.r In-Reply-To: <20250414201959.31327-1-eugene.loh@oracle.com> References: <20250414201959.31327-1-eugene.loh@oracle.com> Message-ID: On Mon, Apr 14, 2025 at 04:19:58PM -0400, eugene.loh--- via DTrace-devel wrote: > From: Eugene Loh > > In commit ded09d05a ("test: rework main lockstat test"), a lockstat > test was renamed. Its .r results file was thereby orphaned. Remove > the orphaned copy. 
> > Signed-off-by: Eugene Loh Reviewed-by: Kris Van Hees > --- > test/unittest/lockstat/tst.lockstat.r | 13 ------------- > 1 file changed, 13 deletions(-) > delete mode 100644 test/unittest/lockstat/tst.lockstat.r > > diff --git a/test/unittest/lockstat/tst.lockstat.r b/test/unittest/lockstat/tst.lockstat.r > deleted file mode 100644 > index 5bdc40c67..000000000 > --- a/test/unittest/lockstat/tst.lockstat.r > +++ /dev/null > @@ -1,13 +0,0 @@ > -Minimum lockstat events seen > - > -lockstat:::adaptive-spin - yes > -lockstat:::adaptive-block - yes > -lockstat:::adaptive-acquire - yes > -lockstat:::adaptive-release - yes > -lockstat:::rw-spin - yes > -lockstat:::rw-acquire - yes > -lockstat:::rw-release - yes > -lockstat:::spin-spin - yes > -lockstat:::spin-acquire - yes > -lockstat:::spin-release - yes > - > -- > 2.43.5 From kris.van.hees at oracle.com Tue Apr 15 15:32:38 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Tue, 15 Apr 2025 11:32:38 -0400 Subject: [DTrace-devel] [PATCH 2/2] test: Skip pid-0 tests on oversubscribed systems In-Reply-To: <20250414201959.31327-2-eugene.loh@oracle.com> References: <20250414201959.31327-1-eugene.loh@oracle.com> <20250414201959.31327-2-eugene.loh@oracle.com> Message-ID: On Mon, Apr 14, 2025 at 04:19:59PM -0400, eugene.loh--- via DTrace-devel wrote: > From: Eugene Loh > > A number of tests check "tick-n /pid==0/" probes. The problem > with this is that a tick-n probe runs on a specific CPU. If that > CPU is fully subscribed, then pid 0 (swapper) will not run. Thus, > the test will take a long time, only to time out. > > Change these tests to use profile-n instead of tick-n probes, > improving chances that the test probe will fire on a less subscribed > CPU.
> > Therefore, also change the .r.p post-processing file so that it uses > only one output line (in case two CPUs manage to write output). > > Finally, add skip files in case pid 0 does not fire on any CPU. > > Signed-off-by: Eugene Loh Reviewed-by: Kris Van Hees ... although I think it would be best to have a single .x file and then create symbolic links for its copies under different names. > --- > test/unittest/ustack/tst.kthread.d | 4 ++-- > test/unittest/ustack/tst.kthread.x | 5 +++++ > test/unittest/ustack/tst.uaddr-pid0.d | 4 ++-- > test/unittest/ustack/tst.uaddr-pid0.r.p | 4 ++-- > test/unittest/ustack/tst.uaddr-pid0.x | 5 +++++ > test/unittest/ustack/tst.ufunc-pid0.d | 4 ++-- > test/unittest/ustack/tst.ufunc-pid0.r.p | 4 ++-- > test/unittest/ustack/tst.ufunc-pid0.x | 5 +++++ > test/unittest/ustack/tst.usym-pid0.d | 4 ++-- > test/unittest/ustack/tst.usym-pid0.r.p | 4 ++-- > test/unittest/ustack/tst.usym-pid0.x | 5 +++++ > 11 files changed, 34 insertions(+), 14 deletions(-) > create mode 100755 test/unittest/ustack/tst.kthread.x > create mode 100755 test/unittest/ustack/tst.uaddr-pid0.x > create mode 100755 test/unittest/ustack/tst.ufunc-pid0.x > create mode 100755 test/unittest/ustack/tst.usym-pid0.x > > diff --git a/test/unittest/ustack/tst.kthread.d b/test/unittest/ustack/tst.kthread.d > index c6252b742..83ae6f7c6 100644 > --- a/test/unittest/ustack/tst.kthread.d > +++ b/test/unittest/ustack/tst.kthread.d > @@ -1,6 +1,6 @@ > /* > * Oracle Linux DTrace. > - * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2013, 2025, Oracle and/or its affiliates. All rights reserved. > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. 
> */ > @@ -16,4 +16,4 @@ > > #pragma D option quiet > > -tick-100msec / pid == 0 / { ustack(); exit(0); } > +profile-100msec / pid == 0 / { ustack(); exit(0); } > diff --git a/test/unittest/ustack/tst.kthread.x b/test/unittest/ustack/tst.kthread.x > new file mode 100755 > index 000000000..b5fe7177a > --- /dev/null > +++ b/test/unittest/ustack/tst.kthread.x > @@ -0,0 +1,5 @@ > +#!/bin/sh > + > +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } > + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' > +exit $? > diff --git a/test/unittest/ustack/tst.uaddr-pid0.d b/test/unittest/ustack/tst.uaddr-pid0.d > index 263a7ca94..ab54eea40 100644 > --- a/test/unittest/ustack/tst.uaddr-pid0.d > +++ b/test/unittest/ustack/tst.uaddr-pid0.d > @@ -1,6 +1,6 @@ > /* > * Oracle Linux DTrace. > - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. > */ > @@ -9,7 +9,7 @@ > > #pragma D option quiet > > -tick-1 > +profile-1 > /pid == $target/ > { > uaddr(ucaller); > diff --git a/test/unittest/ustack/tst.uaddr-pid0.r.p b/test/unittest/ustack/tst.uaddr-pid0.r.p > index 9203dc824..78ab8e59d 100755 > --- a/test/unittest/ustack/tst.uaddr-pid0.r.p > +++ b/test/unittest/ustack/tst.uaddr-pid0.r.p > @@ -1,4 +1,4 @@ > #!/usr/bin/gawk -f > > -# remove trailing blanks > -{ sub(" *$", ""); print } > +# remove trailing blanks, use only one line > +{ sub(" *$", ""); print; exit } > diff --git a/test/unittest/ustack/tst.uaddr-pid0.x b/test/unittest/ustack/tst.uaddr-pid0.x > new file mode 100755 > index 000000000..b5fe7177a > --- /dev/null > +++ b/test/unittest/ustack/tst.uaddr-pid0.x > @@ -0,0 +1,5 @@ > +#!/bin/sh > + > +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } > + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' > +exit $? 
> diff --git a/test/unittest/ustack/tst.ufunc-pid0.d b/test/unittest/ustack/tst.ufunc-pid0.d > index f076782aa..cd34275f1 100644 > --- a/test/unittest/ustack/tst.ufunc-pid0.d > +++ b/test/unittest/ustack/tst.ufunc-pid0.d > @@ -1,6 +1,6 @@ > /* > * Oracle Linux DTrace. > - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. > */ > @@ -9,7 +9,7 @@ > > #pragma D option quiet > > -tick-1 > +profile-1 > /pid == $target/ > { > ufunc(ucaller); > diff --git a/test/unittest/ustack/tst.ufunc-pid0.r.p b/test/unittest/ustack/tst.ufunc-pid0.r.p > index 9203dc824..78ab8e59d 100755 > --- a/test/unittest/ustack/tst.ufunc-pid0.r.p > +++ b/test/unittest/ustack/tst.ufunc-pid0.r.p > @@ -1,4 +1,4 @@ > #!/usr/bin/gawk -f > > -# remove trailing blanks > -{ sub(" *$", ""); print } > +# remove trailing blanks, use only one line > +{ sub(" *$", ""); print; exit } > diff --git a/test/unittest/ustack/tst.ufunc-pid0.x b/test/unittest/ustack/tst.ufunc-pid0.x > new file mode 100755 > index 000000000..b5fe7177a > --- /dev/null > +++ b/test/unittest/ustack/tst.ufunc-pid0.x > @@ -0,0 +1,5 @@ > +#!/bin/sh > + > +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } > + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' > +exit $? > diff --git a/test/unittest/ustack/tst.usym-pid0.d b/test/unittest/ustack/tst.usym-pid0.d > index d2f5ec5de..9aceab355 100644 > --- a/test/unittest/ustack/tst.usym-pid0.d > +++ b/test/unittest/ustack/tst.usym-pid0.d > @@ -1,6 +1,6 @@ > /* > * Oracle Linux DTrace. > - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. 
> */ > @@ -9,7 +9,7 @@ > > #pragma D option quiet > > -tick-1 > +profile-1 > /pid == $target/ > { > usym(ucaller); > diff --git a/test/unittest/ustack/tst.usym-pid0.r.p b/test/unittest/ustack/tst.usym-pid0.r.p > index 9203dc824..78ab8e59d 100755 > --- a/test/unittest/ustack/tst.usym-pid0.r.p > +++ b/test/unittest/ustack/tst.usym-pid0.r.p > @@ -1,4 +1,4 @@ > #!/usr/bin/gawk -f > > -# remove trailing blanks > -{ sub(" *$", ""); print } > +# remove trailing blanks, use only one line > +{ sub(" *$", ""); print; exit } > diff --git a/test/unittest/ustack/tst.usym-pid0.x b/test/unittest/ustack/tst.usym-pid0.x > new file mode 100755 > index 000000000..b5fe7177a > --- /dev/null > +++ b/test/unittest/ustack/tst.usym-pid0.x > @@ -0,0 +1,5 @@ > +#!/bin/sh > + > +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } > + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' > +exit $? > -- > 2.43.5 From eugene.loh at oracle.com Tue Apr 15 17:11:57 2025 From: eugene.loh at oracle.com (Eugene Loh) Date: Tue, 15 Apr 2025 13:11:57 -0400 Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs In-Reply-To: References: <20250403050252.15239-1-eugene.loh@oracle.com> Message-ID: On 4/15/25 07:59, Kris Van Hees wrote: > On Fri, Apr 11, 2025 at 05:20:00PM -0400, Eugene Loh wrote: >> On 4/11/25 16:48, Kris Van Hees wrote: >> >>> Partial comments below (still looking at the provider changes)... >>> >>> On Thu, Apr 03, 2025 at 01:02:52AM -0400, eugene.loh at oracle.com wrote: >>>> From: Eugene Loh >>>> >>>> The sched provider trampoline for enqueue and dequeue probes had >>>> pending FIXMEs for providing a cpuinfo_t* for the cpu associated >>>> with the run queue. Implement the missing code.
>>>> >>>> Since the cpu associated with the run queue might be different from >>>> the cpu where we are running, it becomes necessary to access the >>>> cpuinfo for some random cpu. With Linux 5.18, there is a BPF >>>> helper function map_lookup_percpu_elem() that allows such lookups >>>> on per-cpu arrays. To support older kernels, however, we change >>>> the cpuinfo BPF map from per-cpu to global. Also, it is a hash >>>> table rather than an array in case cpus are not numbered consecutively. >>> I agree with all the above. Good solution. >>> >>>> Signed-off-by: Eugene Loh >>>> --- >>>> bpf/get_agg.c | 2 +- >>>> bpf/get_bvar.c | 2 +- >>>> libdtrace/dt_bpf.c | 34 ++++++-------- >>>> libdtrace/dt_cg.c | 5 ++- >>>> libdtrace/dt_prov_lockstat.c | 4 +- >>>> libdtrace/dt_prov_sched.c | 74 +++++++++++++++++++++++++------ >>>> libdtrace/dt_work.c | 20 +++------ >>>> test/unittest/sched/tst.enqueue.d | 1 - >>>> 8 files changed, 89 insertions(+), 53 deletions(-) >>>> >>>> diff --git a/bpf/get_agg.c b/bpf/get_agg.c >>>> index c0eb825f0..e70caa6ef 100644 >>>> --- a/bpf/get_agg.c >>>> +++ b/bpf/get_agg.c >>>> @@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo; >>>> */ >>>> noinline uint64_t *dt_no_agg(void) >>>> { >>>> - uint32_t key = 0; >>>> + uint32_t key = bpf_get_smp_processor_id(); >>>> dt_bpf_cpuinfo_t *ci; >>>> ci = bpf_map_lookup_elem(&cpuinfo, &key); >>>> diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c >>>> index d372b3445..d81c3605f 100644 >>>> --- a/bpf/get_bvar.c >>>> +++ b/bpf/get_bvar.c >>>> @@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx) >>>> noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx) >>>> { >>>> - uint32_t key = 0; >>>> + uint32_t key = bpf_get_smp_processor_id(); >>>> void *val = bpf_map_lookup_elem(&cpuinfo, &key); >>>> if (val == NULL) { >>>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c >>>> index 6d42a96c7..d6722cbd1 100644 >>>> --- a/libdtrace/dt_bpf.c >>>> +++ b/libdtrace/dt_bpf.c >>>> @@ -761,37 +761,29 @@ 
gmap_create_buffers(dtrace_hdl_t *dtp) >>>> static int >>>> gmap_create_cpuinfo(dtrace_hdl_t *dtp) >>>> { >>>> - int i, rc; >>>> + int i; >>>> uint32_t key = 0; >>>> dtrace_conf_t *conf = &dtp->dt_conf; >>>> size_t ncpus = conf->num_online_cpus; >>>> - dt_bpf_cpuinfo_t *data; >>>> + dt_bpf_cpuinfo_t data; >>> Not sure about this, because (see below)... >>> >>>> cpuinfo_t *ci; >>>> - /* >>>> - * num_possible_cpus <= num_online_cpus: see dt_conf_init. >>>> - */ >>>> - data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, >>>> - sizeof(dt_bpf_cpuinfo_t)); >>>> - if (data == NULL) >>>> - return dt_set_errno(dtp, EDT_NOMEM); >>>> - >>>> - for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) >>>> - memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t)); >>>> - >>>> dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo", >>>> - BPF_MAP_TYPE_PERCPU_ARRAY, >>>> + BPF_MAP_TYPE_HASH, >>>> sizeof(uint32_t), >>>> - sizeof(dt_bpf_cpuinfo_t), 1); >>>> + sizeof(dt_bpf_cpuinfo_t), ncpus); >>>> if (dtp->dt_cpumap_fd == -1) >>>> return -1; >>>> - rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data); >>>> - dt_free(dtp, data); >>>> - if (rc == -1) >>>> - return dt_bpf_error(dtp, >>>> - "cannot update BPF map 'cpuinfo': %s\n", >>>> - strerror(errno)); >>>> + memset(&data, 0, sizeof(data)); >>> Do we need this, because (see below).... >>> >>>> + for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) { >>>> + memcpy(&data.ci, ci, sizeof(cpuinfo_t)); >>> Do we need this, because (see below).... >>> >>>> + key = ci->cpu_id; >>>> + if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1) >>> Why can't we simply do: >>> >>> if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, ci) == -1) >> I think the problem is that the BPF map has elements with size >> sizeof(dt_bpf_cpuinfo_t). Meanwhile, ci has size sizeof(cpuinfo_t), which >> is smaller. So if we do an update like that, the map will have stuff where >> we want it to be initialized to 0.
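The size mismatch described above can be sketched in plain C, outside of BPF. This is only an illustration under stated assumptions: small_info_t and padded_info_t are hypothetical stand-ins for cpuinfo_t and dt_bpf_cpuinfo_t, and toy_map_update() is a made-up helper that mimics just one property of a BPF map update, namely that it takes no length and always copies one full map element from the caller's pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpuinfo_t (the real struct has more fields). */
typedef struct {
	uint32_t	cpu_id;
	uint32_t	chip_id;
} small_info_t;

/* Hypothetical stand-in for dt_bpf_cpuinfo_t: CPU info plus drop counters. */
typedef struct {
	small_info_t	ci;
	uint64_t	buf_drops;
	uint64_t	agg_drops;
} padded_info_t;

#define NSLOTS	8

/* Toy map storage: one full-sized element per key (not a real BPF map). */
static padded_info_t	slots[NSLOTS];

/*
 * Like a BPF map update, this takes no length argument: it always copies
 * sizeof(padded_info_t) bytes starting at 'value'.
 */
static void toy_map_update(uint32_t key, const void *value)
{
	assert(key < NSLOTS);
	memcpy(&slots[key], value, sizeof(padded_info_t));
}

static const padded_info_t *toy_map_lookup(uint32_t key)
{
	assert(key < NSLOTS);
	return &slots[key];
}

/*
 * Build a zero-padded element from the smaller struct first, so that the
 * fixed-size copy in toy_map_update() only ever sees zeroed counters.
 */
static void store_padded(uint32_t key, const small_info_t *src)
{
	padded_info_t	data;

	memset(&data, 0, sizeof(data));
	memcpy(&data.ci, src, sizeof(*src));
	toy_map_update(key, &data);
}
```

Given an array of three small_info_t records, calling toy_map_update(0, &cpus[0]) directly would pull the bytes of cpus[1] and cpus[2] into the two counters, whereas store_padded(0, &cpus[0]) leaves both counters at zero.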
> Yes, but I am 99% certain that BPF maps are allocated and initialized with > zeros because doing otherwise would be a major security risk for the kernel. > So you can count on that (should verify first to make certain but honestly > it needs to be or else it could leak data which is a big no-no). I don't think that helps. We do not feed map_update() a size. We cannot say, "Fill in only the first few bytes of the element." We just point to where the new value is and the BPF function copies as much data in as it needs for the map element. Unless we prepare a zero-padded copy of the data, map_update() will overreach and copy in values that should be zero but are actually data corresponding to other CPUs. >>>> + return dt_bpf_error(dtp, >>>> + "cannot update BPF map 'cpuinfo': %s\n", >>>> + strerror(errno)); >>>> + } >>>> return 0; >>>> } >>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c >>>> index 6dcf4cd3d..d83b1c2ce 100644 >>>> --- a/libdtrace/dt_cg.c >>>> +++ b/libdtrace/dt_cg.c >>>> @@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb) >>>> } else { >>>> idp = dt_dlib_get_map(dtp, "cpuinfo"); >>>> assert(idp != NULL); >>>> + >>>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); >>>> + >>>> dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); >>>> emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP)); >>>> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); >>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); >>>> emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); >>>> emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl)); >>>> emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1)); >>>> diff --git a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c >>>> index c73edf9be..8b2cf4da2 100644 >>>> --- a/libdtrace/dt_prov_lockstat.c >>>> +++ b/libdtrace/dt_prov_lockstat.c >>>> @@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) >>>> { >>>> dt_ident_t *idp = 
dt_dlib_get_map(dtp, "cpuinfo"); >>>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); >>>> + >>>> assert(idp != NULL); >>>> dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); >>>> emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); >>>> emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE)); >>>> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); >>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); >>>> emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); >>>> emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); >>>> emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0)); >>>> diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c >>>> index 3a218f3cb..a548e679f 100644 >>>> --- a/libdtrace/dt_prov_sched.c >>>> +++ b/libdtrace/dt_prov_sched.c >>>> @@ -84,6 +84,40 @@ static int populate(dtrace_hdl_t *dtp) >>>> probe_args, probes); >>>> } >>>> +/* >>>> + * Get a pointer to the cpuinfo_t structure for the CPU associated >>>> + * with the runqueue that is in arg0. >>>> + * >>>> + * Clobbers %r1 through %r5 >>>> + * Stores pointer to cpuinfo_t struct in %r0 >>>> + */ >>>> +static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) >>>> +{ >>>> + dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); >>>> + >>>> + assert(idp != NULL); >>>> + >>>> + /* Put the runqueue pointer from mst->arg0 into %r3. */ >>>> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0))); >>>> + >>>> + /* Turn it into a pointer to its cpu member. 
*/ >>>> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1))); >>>> + >>>> + /* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */ >>>> + emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int))); >>>> + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP)); >>>> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0))); >>>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel)); >>>> + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl)); >>>> + >>>> + /* Now look up the corresponding cpuinfo_t. */ >>>> + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); >>>> + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); >>>> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0))); >>>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); >>>> + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); >>>> +} >>>> + >>>> /* >>>> * Generate a BPF trampoline for a SDT probe. >>>> * >>>> @@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp) >>>> */ >>>> static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) >>>> { >>>> + dtrace_hdl_t *dtp = pcb->pcb_hdl; >>>> dt_irlist_t *dlp = &pcb->pcb_ir; >>>> dt_probe_t *prp = pcb->pcb_probe; >>>> if (strcmp(prp->desc->prb, "dequeue") == 0) { >>>> - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); >>>> - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); >>>> /* >>>> - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU >>>> - * associated with the runqueue. >>>> + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. >>>> + */ >>>> + get_cpuinfo(dtp, dlp, exitlbl); >>>> + >>>> + /* >>>> + * Copy arg1 into arg0. >>>> */ >>>> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); >>>> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); >>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); >>>> + >>>> + /* Store the cpuinfo_t* in %r0 into arg1. 
*/ >>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); >>>> } else if (strcmp(prp->desc->prb, "enqueue") == 0) { >>>> + /* >>>> + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. >>>> + */ >>>> + get_cpuinfo(dtp, dlp, exitlbl); >>>> + >>>> + /* >>>> + * Copy arg1 into arg0. >>>> + */ >>>> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); >>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); >>>> + >>>> + /* Store the cpuinfo_t* in %r0 into arg1. */ >>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); >>>> + >>>> /* >>>> * This is ugly but necessary... enqueue_task() takes a flags argument and the >>>> * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the >>>> @@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) >>>> * outside the kernel source tree. >>>> */ >>>> #define ENQUEUE_HEAD 0x10 >>>> - >>>> - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); >>>> - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); >>>> - /* >>>> - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU >>>> - * associated with the runqueue. 
>>>> - */ >>>> - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); >>>> - >>>> emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2))); >>>> emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD)); >>>> emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0)); >>>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c >>>> index 498d5332a..2167ed299 100644 >>>> --- a/libdtrace/dt_work.c >>>> +++ b/libdtrace/dt_work.c >>>> @@ -37,35 +37,29 @@ END_probe(void) >>>> int >>>> dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what) >>>> { >>>> - dt_bpf_cpuinfo_t *ci; >>>> - uint32_t cikey = 0; >>>> + dt_bpf_cpuinfo_t ci; >>>> + uint32_t cikey = cpu; >>>> uint64_t cnt; >>>> int rval = 0; >>>> assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION); >>>> - ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, >>>> - sizeof(dt_bpf_cpuinfo_t)); >>>> - if (ci == NULL) >>>> - return dt_set_errno(dtp, EDT_NOMEM); >>>> - >>>> - if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) { >>>> + if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) { >>>> rval = dt_set_errno(dtp, EDT_BPF); >>>> goto fail; >>>> } >>>> if (what == DTRACEDROP_PRINCIPAL) { >>>> - cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf; >>>> - dtp->dt_drops[cpu].buf = ci[cpu].buf_drops; >>>> + cnt = ci.buf_drops - dtp->dt_drops[cpu].buf; >>>> + dtp->dt_drops[cpu].buf = ci.buf_drops; >>>> } else { >>>> - cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg; >>>> - dtp->dt_drops[cpu].agg = ci[cpu].agg_drops; >>>> + cnt = ci.agg_drops - dtp->dt_drops[cpu].agg; >>>> + dtp->dt_drops[cpu].agg = ci.agg_drops; >>>> } >>>> rval = dt_handle_cpudrop(dtp, cpu, what, cnt); >>>> fail: >>>> - dt_free(dtp, ci); >>>> return rval; >>>> } >>>> diff --git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d >>>> index f445ac843..28dcace8c 100644 >>>> --- a/test/unittest/sched/tst.enqueue.d >>>> +++ b/test/unittest/sched/tst.enqueue.d >>>> 
@@ -4,7 +4,6 @@ >>>> * Licensed under the Universal Permissive License v 1.0 as shown at >>>> * http://oss.oracle.com/licenses/upl. >>>> */ >>>> -/* @@xfail: dtv2 */ >>>> #pragma D option switchrate=100hz >>>> #pragma D option destructive >>>> -- >>>> 2.43.5 >>>> From eugene.loh at oracle.com Tue Apr 15 20:19:54 2025 From: eugene.loh at oracle.com (eugene.loh at oracle.com) Date: Tue, 15 Apr 2025 16:19:54 -0400 Subject: [DTrace-devel] [PATCH v2 2/2] test: Skip pid-0 tests on oversubscribed systems Message-ID: <20250415201954.5564-1-eugene.loh@oracle.com> From: Eugene Loh A number of tests check "tick-n /pid==0/" probes. The problem with this is that a tick-n probe runs on a specific CPU. If that CPU is fully subscribed, then pid 0 (swapper) will not run. Thus, the test will take a long time, only to time out. Change these tests to use profile-n instead of tick-n probes, improving chances that the test probe will fire on a less subscribed CPU. Therefore, also change the .r.p post-processing file so that it uses only one output line (in case two CPUs manage to write output). Finally, add skip files in case pid 0 does not fire on any CPU. 
Signed-off-by: Eugene Loh Reviewed-by: Kris Van Hees --- test/unittest/ustack/skip_pid0_if_oversubscribed.x | 5 +++++ test/unittest/ustack/tst.kthread.d | 9 +++++++-- test/unittest/ustack/tst.kthread.x | 1 + test/unittest/ustack/tst.uaddr-pid0.d | 5 +++-- test/unittest/ustack/tst.uaddr-pid0.r.p | 4 ++-- test/unittest/ustack/tst.uaddr-pid0.x | 1 + test/unittest/ustack/tst.ufunc-pid0.d | 5 +++-- test/unittest/ustack/tst.ufunc-pid0.r.p | 4 ++-- test/unittest/ustack/tst.ufunc-pid0.x | 1 + test/unittest/ustack/tst.usym-pid0.d | 5 +++-- test/unittest/ustack/tst.usym-pid0.r.p | 4 ++-- test/unittest/ustack/tst.usym-pid0.x | 1 + 12 files changed, 31 insertions(+), 14 deletions(-) create mode 100755 test/unittest/ustack/skip_pid0_if_oversubscribed.x create mode 120000 test/unittest/ustack/tst.kthread.x create mode 120000 test/unittest/ustack/tst.uaddr-pid0.x create mode 120000 test/unittest/ustack/tst.ufunc-pid0.x create mode 120000 test/unittest/ustack/tst.usym-pid0.x diff --git a/test/unittest/ustack/skip_pid0_if_oversubscribed.x b/test/unittest/ustack/skip_pid0_if_oversubscribed.x new file mode 100755 index 000000000..b5fe7177a --- /dev/null +++ b/test/unittest/ustack/skip_pid0_if_oversubscribed.x @@ -0,0 +1,5 @@ +#!/bin/sh + +$dtrace -qn 'profile-100ms /pid == 0/ { exit(0) } + tick-1s { trace("cannot profile pid 0; oversubscribed system?"); exit(2) }' +exit $? diff --git a/test/unittest/ustack/tst.kthread.d b/test/unittest/ustack/tst.kthread.d index c6252b742..0fe5279f3 100644 --- a/test/unittest/ustack/tst.kthread.d +++ b/test/unittest/ustack/tst.kthread.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2013, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. 
*/ @@ -16,4 +16,9 @@ #pragma D option quiet -tick-100msec / pid == 0 / { ustack(); exit(0); } +profile-100ms +/ pid == 0 / +{ + ustack(); + exit(0); +} diff --git a/test/unittest/ustack/tst.kthread.x b/test/unittest/ustack/tst.kthread.x new file mode 120000 index 000000000..1df0f1ccc --- /dev/null +++ b/test/unittest/ustack/tst.kthread.x @@ -0,0 +1 @@ +skip_pid0_if_oversubscribed.x \ No newline at end of file diff --git a/test/unittest/ustack/tst.uaddr-pid0.d b/test/unittest/ustack/tst.uaddr-pid0.d index 263a7ca94..3580f0cde 100644 --- a/test/unittest/ustack/tst.uaddr-pid0.d +++ b/test/unittest/ustack/tst.uaddr-pid0.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. */ @@ -9,9 +9,10 @@ #pragma D option quiet -tick-1 +profile-100ms /pid == $target/ { uaddr(ucaller); + printf("\n"); exit(0); } diff --git a/test/unittest/ustack/tst.uaddr-pid0.r.p b/test/unittest/ustack/tst.uaddr-pid0.r.p index 9203dc824..78ab8e59d 100755 --- a/test/unittest/ustack/tst.uaddr-pid0.r.p +++ b/test/unittest/ustack/tst.uaddr-pid0.r.p @@ -1,4 +1,4 @@ #!/usr/bin/gawk -f -# remove trailing blanks -{ sub(" *$", ""); print } +# remove trailing blanks, use only one line +{ sub(" *$", ""); print; exit } diff --git a/test/unittest/ustack/tst.uaddr-pid0.x b/test/unittest/ustack/tst.uaddr-pid0.x new file mode 120000 index 000000000..1df0f1ccc --- /dev/null +++ b/test/unittest/ustack/tst.uaddr-pid0.x @@ -0,0 +1 @@ +skip_pid0_if_oversubscribed.x \ No newline at end of file diff --git a/test/unittest/ustack/tst.ufunc-pid0.d b/test/unittest/ustack/tst.ufunc-pid0.d index f076782aa..0778c33c1 100644 --- a/test/unittest/ustack/tst.ufunc-pid0.d +++ b/test/unittest/ustack/tst.ufunc-pid0.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. 
- * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. */ @@ -9,9 +9,10 @@ #pragma D option quiet -tick-1 +profile-100ms /pid == $target/ { ufunc(ucaller); + printf("\n"); exit(0); } diff --git a/test/unittest/ustack/tst.ufunc-pid0.r.p b/test/unittest/ustack/tst.ufunc-pid0.r.p index 9203dc824..78ab8e59d 100755 --- a/test/unittest/ustack/tst.ufunc-pid0.r.p +++ b/test/unittest/ustack/tst.ufunc-pid0.r.p @@ -1,4 +1,4 @@ #!/usr/bin/gawk -f -# remove trailing blanks -{ sub(" *$", ""); print } +# remove trailing blanks, use only one line +{ sub(" *$", ""); print; exit } diff --git a/test/unittest/ustack/tst.ufunc-pid0.x b/test/unittest/ustack/tst.ufunc-pid0.x new file mode 120000 index 000000000..1df0f1ccc --- /dev/null +++ b/test/unittest/ustack/tst.ufunc-pid0.x @@ -0,0 +1 @@ +skip_pid0_if_oversubscribed.x \ No newline at end of file diff --git a/test/unittest/ustack/tst.usym-pid0.d b/test/unittest/ustack/tst.usym-pid0.d index d2f5ec5de..7833ab1e2 100644 --- a/test/unittest/ustack/tst.usym-pid0.d +++ b/test/unittest/ustack/tst.usym-pid0.d @@ -1,6 +1,6 @@ /* * Oracle Linux DTrace. - * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. * Licensed under the Universal Permissive License v 1.0 as shown at * http://oss.oracle.com/licenses/upl. 
*/ @@ -9,9 +9,10 @@ #pragma D option quiet -tick-1 +profile-100ms /pid == $target/ { usym(ucaller); + printf("\n"); exit(0); } diff --git a/test/unittest/ustack/tst.usym-pid0.r.p b/test/unittest/ustack/tst.usym-pid0.r.p index 9203dc824..78ab8e59d 100755 --- a/test/unittest/ustack/tst.usym-pid0.r.p +++ b/test/unittest/ustack/tst.usym-pid0.r.p @@ -1,4 +1,4 @@ #!/usr/bin/gawk -f -# remove trailing blanks -{ sub(" *$", ""); print } +# remove trailing blanks, use only one line +{ sub(" *$", ""); print; exit } diff --git a/test/unittest/ustack/tst.usym-pid0.x b/test/unittest/ustack/tst.usym-pid0.x new file mode 120000 index 000000000..1df0f1ccc --- /dev/null +++ b/test/unittest/ustack/tst.usym-pid0.x @@ -0,0 +1 @@ +skip_pid0_if_oversubscribed.x \ No newline at end of file -- 2.43.5 From kris.van.hees at oracle.com Tue Apr 22 15:28:43 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Tue, 22 Apr 2025 11:28:43 -0400 Subject: [DTrace-devel] [RELEASE] DTrace 2.0.2 Message-ID: We are happy to announce the availability of DTrace for Linux 2.0.2! This new version is based on BPF and other Linux kernel tracing features and is implemented entirely as a userspace application. It can be used for tracing on any Linux kernel that provides BPF based tracing and BTF type data, although (as mentioned below) improved functionality depends on two (optional) kernel patches. The functionality is close to being feature-complete in comparison with the kernel module based version of DTrace for Linux (version 1.2.1-1). Development continues in an incremental fashion to make the full feature set of DTrace available using existing kernel features. WHERE TO FIND IT? The new version of DTrace for Linux is available at: https://github.com/oracle/dtrace-utils/tree/2.0-branch-dev The main development branch for DTrace for Linux is at: https://github.com/oracle/dtrace-utils/tree/devel The most recent release tag is 2.0.2. 
FEATURES - Providers: + cpc: CPU Performance Counter probes + dtrace: BEGIN, END, and ERROR probes + fbt: Function Boundary Tracing (FBT) probes (Using fentry/fexit probes where available) + lockstat: Locking related probes + pid: Userspace function boundary tracing and offset-based instruction probes + proc: Process lifecycle related probes + profile: Timer-based profile-* and tick-* probes + rawtp: SDT-style probes for kernel tracepoints with access to raw (untranslated) tracepoint arguments + sched: CPU scheduling probes [partial implementation] + sdt: Statically Defined Tracing (SDT) probes for kernel tracepoints + syscall: System call entry and exit probes + usdt: Userspace Statically Defined Tracing (USDT) probes + [NEW] fbt: return probes based on fexit report return value + [NEW] usdt: probes can now be discovered after tracing started (if wildcards are used in probe specifications or with the -Z option) + [NEW] usdt: typed arguments are now supported, incl. translated types and argument mapping + [NEW] rawfbt: Function Boundary Tracing style provider that always uses kprobes - it can be used to trace . symbols that are generated by compiler optimizations - Aggregations: + Regular and indexed aggregations + Aggregation functions: avg, count, llquantize, lquantize, max, min, quantize, stddev, and sum. 
+ Aggregation actions: clear, denormalize, normalize, printa - Speculative tracing: + Functions: speculation, speculate, commit, and discard - Variables: + Global variables + Thread-Local Storage (TLS) variables + Clause-local variables + Associative arrays for global and TLS variables + Full support for NULL-strings + Built-in: arg0 - arg9, args[], caller, curcpu, curthread, epid, errno, execname, gid, id, pid, ppid, probefunc, probemod, probename, probeprov, stackdepth, tid, timestamp, ucaller, uid, uregs[], ustackdepth, walltimestamp - Actions: + exit, freopen, ftruncate, mod, print, printa, printf, raise, setopt, stack, sym, system, trace, tracemem, uaddr, umod, ustack, usym + [NEW] print: enhanced with argument type info - Subroutines: + alloca, basename, bcopy, cleanpath, copyin, copyinstr, copyinto, copyout, copyoutstr, dirname, d_path [dummy], getmajor, getminor, htonl, htonll, htons, index, inet_ntoa, link_ntop, lltostr, mutex_owned, mutex_owner, mutex_type_adaptive, mutex_type_spin, ntohl, ntohll, ntohs, progenyof, rand, rindex, rw_iswriter, rw_read_held, rw_write_held, strchr, strjoin, strlen, strrchr, strstr, strtok, substr - Runtime features: + Reporting of drop-counters for trace data that could not be recorded for the principal buffer, aggregation buffers, and speculation buffers. + Pre-generated translator files to support kernels from 5.2 to current. - BPF support: + Direct compilation of D source code into BPF programs. + Efficient use of pre-compiled BPF functions for library functions. + A bpflog option to request the BPF verifier log for loaded programs. + BPF program linking of dynamically generated code and pre-compiled code to facilitate code sharing and code re-use. + Improved integrated disassembler for generated BPF code at the clause and program level (-S in combination with the new -xdisasm=# option). + Improved trace data buffer handling based on memory mapped perf event ring-buffers. + BTF type data support. 
- Development and debugging: + Support to run dtrace under valgrind. + Configure script based building is supported. + Improved support for building and using DTrace on upstream kernels. + [NEW] Installation paths for all components are configurable. + [NEW] Header files for USDT (sdt.h, etc) have been moved to avoid conflicts with projects that supply files with the same name. DEPENDENCIES DTrace for Linux depends on libctf (part of newer binutils) -or- libdtrace-ctf. While libctf is preferred, building against libdtrace-ctf is still possible. It can be found at: https://github.com/oracle/libdtrace-ctf DTrace for Linux makes use of BPF library functions that are compiled at build time. It depends on BPF support in GCC and binutils to generate the pre-compiled BPF function library. DTrace for Linux benefits from 2 optional kernel features that are not commonly available in Linux kernels: - CTF data generation at compile time: this provides important datatype information for kernel and kernel module symbols. - Module symbol address range data: this adds address range data about any built-in modules to allow for consistent ways to refer to probes by module and function (or probe) name. DTrace for Linux can be used for tracing without these patches, albeit with some limitations. These additional support features for tracing are available at: https://github.com/oracle/dtrace-linux-kernel/tree/v2/6.7 Please consider joining our development list: dtrace at lists.linux.dev and/or our IRC channel: #linux-dtrace at libera.chat Enjoy! 
From kris.van.hees at oracle.com Thu Apr 24 15:32:24 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Thu, 24 Apr 2025 11:32:24 -0400 Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs In-Reply-To: References: <20250403050252.15239-1-eugene.loh@oracle.com> Message-ID: On Tue, Apr 15, 2025 at 01:11:57PM -0400, Eugene Loh wrote: > On 4/15/25 07:59, Kris Van Hees wrote: > > > On Fri, Apr 11, 2025 at 05:20:00PM -0400, Eugene Loh wrote: > > > On 4/11/25 16:48, Kris Van Hees wrote: > > > > > > > Partial comments below (still looking at the provider changes)... > > > > > > > > On Thu, Apr 03, 2025 at 01:02:52AM -0400, eugene.loh at oracle.com wrote: > > > > > From: Eugene Loh > > > > > > > > > > The sched provider trampoline for enqueue and dequeue probes had > > > > > pending FIXMEs for providing a cpuinfo_t* for the cpu associated > > > > > with the run queue. Implement the missing code. > > > > > > > > > > Since the cpu associated with the run queue might be different from > > > > > the cpu where we are running, it becomes necessary to access the > > > > > cpuinfo for some random cpu. With Linux 5.18, there is a BPF > > > > > helper function map_lookup_percpu_elem() that allows such lookups > > > > > on per-cpu arrays. To support older kernels, however, we change > > > > > the cpuinfo BPF map from per-cpu to global. Also, it is a hash > > > > > table rather than an array in case cpus are not numbered consecutively. > > > > I agree with all the above. Good solution. 
> > > > > > > > > Signed-off-by: Eugene Loh > > > > > --- > > > > > bpf/get_agg.c | 2 +- > > > > > bpf/get_bvar.c | 2 +- > > > > > libdtrace/dt_bpf.c | 34 ++++++-------- > > > > > libdtrace/dt_cg.c | 5 ++- > > > > > libdtrace/dt_prov_lockstat.c | 4 +- > > > > > libdtrace/dt_prov_sched.c | 74 +++++++++++++++++++++++++------ > > > > > libdtrace/dt_work.c | 20 +++------ > > > > > test/unittest/sched/tst.enqueue.d | 1 - > > > > > 8 files changed, 89 insertions(+), 53 deletions(-) > > > > > > > > > > diff --git a/bpf/get_agg.c b/bpf/get_agg.c > > > > > index c0eb825f0..e70caa6ef 100644 > > > > > --- a/bpf/get_agg.c > > > > > +++ b/bpf/get_agg.c > > > > > @@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo; > > > > > */ > > > > > noinline uint64_t *dt_no_agg(void) > > > > > { > > > > > - uint32_t key = 0; > > > > > + uint32_t key = bpf_get_smp_processor_id(); > > > > > dt_bpf_cpuinfo_t *ci; > > > > > ci = bpf_map_lookup_elem(&cpuinfo, &key); > > > > > diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c > > > > > index d372b3445..d81c3605f 100644 > > > > > --- a/bpf/get_bvar.c > > > > > +++ b/bpf/get_bvar.c > > > > > @@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx) > > > > > noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx) > > > > > { > > > > > - uint32_t key = 0; > > > > > + uint32_t key = bpf_get_smp_processor_id(); > > > > > void *val = bpf_map_lookup_elem(&cpuinfo, &key); > > > > > if (val == NULL) { > > > > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c > > > > > index 6d42a96c7..d6722cbd1 100644 > > > > > --- a/libdtrace/dt_bpf.c > > > > > +++ b/libdtrace/dt_bpf.c > > > > > @@ -761,37 +761,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp) > > > > > static int > > > > > gmap_create_cpuinfo(dtrace_hdl_t *dtp) > > > > > { > > > > > - int i, rc; > > > > > + int i; > > > > > uint32_t key = 0; > > > > > dtrace_conf_t *conf = &dtp->dt_conf; > > > > > size_t ncpus = conf->num_online_cpus; > > > > > - dt_bpf_cpuinfo_t *data; > > > > > + 
dt_bpf_cpuinfo_t data; > > > > Not sure about this, because (see below)... > > > > > > > > > cpuinfo_t *ci; > > > > > - /* > > > > > - * num_possible_cpus <= num_online_cpus: see dt_conf_init. > > > > > - */ > > > > > - data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > > > > > - sizeof(dt_bpf_cpuinfo_t)); > > > > > - if (data == NULL) > > > > > - return dt_set_errno(dtp, EDT_NOMEM); > > > > > - > > > > > - for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) > > > > > - memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t)); > > > > > - > > > > > dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo", > > > > > - BPF_MAP_TYPE_PERCPU_ARRAY, > > > > > + BPF_MAP_TYPE_HASH, > > > > > sizeof(uint32_t), > > > > > - sizeof(dt_bpf_cpuinfo_t), 1); > > > > > + sizeof(dt_bpf_cpuinfo_t), ncpus); > > > > > if (dtp->dt_cpumap_fd == -1) > > > > > return -1; > > > > > - rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data); > > > > > - dt_free(dtp, data); > > > > > - if (rc == -1) > > > > > - return dt_bpf_error(dtp, > > > > > - "cannot update BPF map 'cpuinfo': %s\n", > > > > > - strerror(errno)); > > > > > + memset(&data, 0, sizeof(data)); > > > > Do we need this, because (see below).... > > > > > > > > > + for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) { > > > > > + memcpy(&data.ci, ci, sizeof(cpuinfo_t)); > > > > Do we need this, because (see below).... > > > > > > > > > + key = ci->cpu_id; > > > > > + if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1) > > > > Why can't we simply do: > > > > > > > > if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, ci) == -1) > > > I think the problem is that the BPF map has elements with size > > > sizeof(dt_bpf_cpuinfo_t). Meanwhile, ci has size sizeof(cpuinfo_t), which > > > is smaller. So if we do an update like that, the map will have stuff where > > > we want it to be initialized to 0. 
> > Yes, but I am 99% certain that BPF maps are allocated and initialized with > > zeros because doing otherwise would be a major security risk for the kernel. > > So you can count on that (should verify first to make certain but honestly > > it needs to be or else it could leak data which is a big no-no). > > I don't think that helps. We do not feed map_update() a size. We cannot > say, "Fill in only the first few bytes of the element." We just point to > where the new value is and the BPF function copies as much data in as it > needs for the map element. Unless we prepare a zero-padded copy of the > data, map_update() will overreach and copy in values that should be zero but > are actually data corresponding to other CPUs. Ah yes, you are right of course. Never mind. > > > > > > + return dt_bpf_error(dtp, > > > > > + "cannot update BPF map 'cpuinfo': %s\n", > > > > > + strerror(errno)); > > > > > + } > > > > > return 0; > > > > > } > > > > > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c > > > > > index 6dcf4cd3d..d83b1c2ce 100644 > > > > > --- a/libdtrace/dt_cg.c > > > > > +++ b/libdtrace/dt_cg.c > > > > > @@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb) > > > > > } else { > > > > > idp = dt_dlib_get_map(dtp, "cpuinfo"); > > > > > assert(idp != NULL); > > > > > + > > > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > > > > > + > > > > > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > > > > > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP)); > > > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > > > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > > > > > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > > > > > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl)); > > > > > emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1)); > > > > > diff --git a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c > > > > > index c73edf9be..8b2cf4da2 100644 > > > > > --- 
a/libdtrace/dt_prov_lockstat.c > > > > > +++ b/libdtrace/dt_prov_lockstat.c > > > > > @@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > > > > > { > > > > > dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > > > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > > > > > + > > > > > assert(idp != NULL); > > > > > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > > > > > emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > > > > > emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE)); > > > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > > > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > > > > > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > > > > > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > > > > > emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0)); > > > > > diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c > > > > > index 3a218f3cb..a548e679f 100644 > > > > > --- a/libdtrace/dt_prov_sched.c > > > > > +++ b/libdtrace/dt_prov_sched.c > > > > > @@ -84,6 +84,40 @@ static int populate(dtrace_hdl_t *dtp) > > > > > probe_args, probes); > > > > > } > > > > > +/* > > > > > + * Get a pointer to the cpuinfo_t structure for the CPU associated > > > > > + * with the runqueue that is in arg0. > > > > > + * > > > > > + * Clobbers %r1 through %r5 > > > > > + * Stores pointer to cpuinfo_t struct in %r0 > > > > > + */ > > > > > +static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > > > > > +{ > > > > > + dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > > > > > + > > > > > + assert(idp != NULL); > > > > > + > > > > > + /* Put the runqueue pointer from mst->arg0 into %r3. */ > > > > > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0))); > > > > > + > > > > > + /* Turn it into a pointer to its cpu member. 
*/ > > > > > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1))); > > > > > + > > > > > + /* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */ > > > > > + emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int))); > > > > > + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP)); > > > > > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0))); > > > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel)); > > > > > + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl)); > > > > > + > > > > > + /* Now look up the corresponding cpuinfo_t. */ > > > > > + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > > > > > + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > > > > > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0))); > > > > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > > > > > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > > > > > +} > > > > > + > > > > > /* > > > > > * Generate a BPF trampoline for a SDT probe. > > > > > * > > > > > @@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp) > > > > > */ > > > > > static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > > > { > > > > > + dtrace_hdl_t *dtp = pcb->pcb_hdl; > > > > > dt_irlist_t *dlp = &pcb->pcb_ir; > > > > > dt_probe_t *prp = pcb->pcb_probe; > > > > > if (strcmp(prp->desc->prb, "dequeue") == 0) { > > > > > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > > > > > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > > > > > /* > > > > > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > > > > > - * associated with the runqueue. > > > > > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > > > > > + */ > > > > > + get_cpuinfo(dtp, dlp, exitlbl); > > > > > + > > > > > + /* > > > > > + * Copy arg1 into arg0. 
> > > > > */ > > > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > > > > > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > > > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > > > > > + > > > > > + /* Store the cpuinfo_t* in %r0 into arg1. */ > > > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > > > > > } else if (strcmp(prp->desc->prb, "enqueue") == 0) { > > > > > + /* > > > > > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > > > > > + */ > > > > > + get_cpuinfo(dtp, dlp, exitlbl); > > > > > + > > > > > + /* > > > > > + * Copy arg1 into arg0. > > > > > + */ > > > > > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > > > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > > > > > + > > > > > + /* Store the cpuinfo_t* in %r0 into arg1. */ > > > > > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > > > > > + > > > > > /* > > > > > * This is ugly but necessary... enqueue_task() takes a flags argument and the > > > > > * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the > > > > > @@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > > > * outside the kernel source tree. > > > > > */ > > > > > #define ENQUEUE_HEAD 0x10 > > > > > - > > > > > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > > > > > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > > > > > - /* > > > > > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > > > > > - * associated with the runqueue. 
> > > > > - */ > > > > > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > > > > > - > > > > > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2))); > > > > > emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD)); > > > > > emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0)); > > > > > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c > > > > > index 498d5332a..2167ed299 100644 > > > > > --- a/libdtrace/dt_work.c > > > > > +++ b/libdtrace/dt_work.c > > > > > @@ -37,35 +37,29 @@ END_probe(void) > > > > > int > > > > > dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what) > > > > > { > > > > > - dt_bpf_cpuinfo_t *ci; > > > > > - uint32_t cikey = 0; > > > > > + dt_bpf_cpuinfo_t ci; > > > > > + uint32_t cikey = cpu; > > > > > uint64_t cnt; > > > > > int rval = 0; > > > > > assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION); > > > > > - ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > > > > > - sizeof(dt_bpf_cpuinfo_t)); > > > > > - if (ci == NULL) > > > > > - return dt_set_errno(dtp, EDT_NOMEM); > > > > > - > > > > > - if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) { > > > > > + if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) { > > > > > rval = dt_set_errno(dtp, EDT_BPF); > > > > > goto fail; > > > > > } > > > > > if (what == DTRACEDROP_PRINCIPAL) { > > > > > - cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf; > > > > > - dtp->dt_drops[cpu].buf = ci[cpu].buf_drops; > > > > > + cnt = ci.buf_drops - dtp->dt_drops[cpu].buf; > > > > > + dtp->dt_drops[cpu].buf = ci.buf_drops; > > > > > } else { > > > > > - cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg; > > > > > - dtp->dt_drops[cpu].agg = ci[cpu].agg_drops; > > > > > + cnt = ci.agg_drops - dtp->dt_drops[cpu].agg; > > > > > + dtp->dt_drops[cpu].agg = ci.agg_drops; > > > > > } > > > > > rval = dt_handle_cpudrop(dtp, cpu, what, cnt); > > > > > fail: > > > > > - dt_free(dtp, ci); > > > > > return rval; 
> > > > > } > > > > > diff --git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d > > > > > index f445ac843..28dcace8c 100644 > > > > > --- a/test/unittest/sched/tst.enqueue.d > > > > > +++ b/test/unittest/sched/tst.enqueue.d > > > > > @@ -4,7 +4,6 @@ > > > > > * Licensed under the Universal Permissive License v 1.0 as shown at > > > > > * http://oss.oracle.com/licenses/upl. > > > > > */ > > > > > -/* @@xfail: dtv2 */ > > > > > #pragma D option switchrate=100hz > > > > > #pragma D option destructive > > > > > -- > > > > > 2.43.5 > > > > > From kris.van.hees at oracle.com Thu Apr 24 15:34:45 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Thu, 24 Apr 2025 11:34:45 -0400 Subject: [DTrace-devel] [PATCH v3 2/2] Clean up sched provider trampoline FIXMEs In-Reply-To: <20250403050252.15239-1-eugene.loh@oracle.com> References: <20250403050252.15239-1-eugene.loh@oracle.com> Message-ID: Reviewed-by: Kris Van Hees On Thu, Apr 03, 2025 at 01:02:52AM -0400, eugene.loh at oracle.com wrote: > From: Eugene Loh > > The sched provider trampoline for enqueue and dequeue probes had > pending FIXMEs for providing a cpuinfo_t* for the cpu associated > with the run queue. Implement the missing code. > > Since the cpu associated with the run queue might be different from > the cpu where we are running, it becomes necessary to access the > cpuinfo for some random cpu. With Linux 5.18, there is a BPF > helper function map_lookup_percpu_elem() that allows such lookups > on per-cpu arrays. To support older kernels, however, we change > the cpuinfo BPF map from per-cpu to global. Also, it is a hash > table rather than an array in case cpus are not numbered consecutively. 
> > Signed-off-by: Eugene Loh > --- > bpf/get_agg.c | 2 +- > bpf/get_bvar.c | 2 +- > libdtrace/dt_bpf.c | 34 ++++++-------- > libdtrace/dt_cg.c | 5 ++- > libdtrace/dt_prov_lockstat.c | 4 +- > libdtrace/dt_prov_sched.c | 74 +++++++++++++++++++++++++------ > libdtrace/dt_work.c | 20 +++------ > test/unittest/sched/tst.enqueue.d | 1 - > 8 files changed, 89 insertions(+), 53 deletions(-) > > diff --git a/bpf/get_agg.c b/bpf/get_agg.c > index c0eb825f0..e70caa6ef 100644 > --- a/bpf/get_agg.c > +++ b/bpf/get_agg.c > @@ -21,7 +21,7 @@ extern struct bpf_map_def cpuinfo; > */ > noinline uint64_t *dt_no_agg(void) > { > - uint32_t key = 0; > + uint32_t key = bpf_get_smp_processor_id(); > dt_bpf_cpuinfo_t *ci; > > ci = bpf_map_lookup_elem(&cpuinfo, &key); > diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c > index d372b3445..d81c3605f 100644 > --- a/bpf/get_bvar.c > +++ b/bpf/get_bvar.c > @@ -67,7 +67,7 @@ noinline uint64_t dt_bvar_caller(const dt_dctx_t *dctx) > > noinline uint64_t dt_bvar_curcpu(const dt_dctx_t *dctx) > { > - uint32_t key = 0; > + uint32_t key = bpf_get_smp_processor_id(); > void *val = bpf_map_lookup_elem(&cpuinfo, &key); > > if (val == NULL) { > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c > index 6d42a96c7..d6722cbd1 100644 > --- a/libdtrace/dt_bpf.c > +++ b/libdtrace/dt_bpf.c > @@ -761,37 +761,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp) > static int > gmap_create_cpuinfo(dtrace_hdl_t *dtp) > { > - int i, rc; > + int i; > uint32_t key = 0; > dtrace_conf_t *conf = &dtp->dt_conf; > size_t ncpus = conf->num_online_cpus; > - dt_bpf_cpuinfo_t *data; > + dt_bpf_cpuinfo_t data; > cpuinfo_t *ci; > > - /* > - * num_possible_cpus <= num_online_cpus: see dt_conf_init. 
> - */ > - data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > - sizeof(dt_bpf_cpuinfo_t)); > - if (data == NULL) > - return dt_set_errno(dtp, EDT_NOMEM); > - > - for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) > - memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t)); > - > dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo", > - BPF_MAP_TYPE_PERCPU_ARRAY, > + BPF_MAP_TYPE_HASH, > sizeof(uint32_t), > - sizeof(dt_bpf_cpuinfo_t), 1); > + sizeof(dt_bpf_cpuinfo_t), ncpus); > if (dtp->dt_cpumap_fd == -1) > return -1; > > - rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data); > - dt_free(dtp, data); > - if (rc == -1) > - return dt_bpf_error(dtp, > - "cannot update BPF map 'cpuinfo': %s\n", > - strerror(errno)); > + memset(&data, 0, sizeof(data)); > + for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++) { > + memcpy(&data.ci, ci, sizeof(cpuinfo_t)); > + key = ci->cpu_id; > + if (dt_bpf_map_update(dtp->dt_cpumap_fd, &key, &data) == -1) > + return dt_bpf_error(dtp, > + "cannot update BPF map 'cpuinfo': %s\n", > + strerror(errno)); > + } > > return 0; > } > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c > index 6dcf4cd3d..d83b1c2ce 100644 > --- a/libdtrace/dt_cg.c > +++ b/libdtrace/dt_cg.c > @@ -1243,9 +1243,12 @@ dt_cg_epilogue(dt_pcb_t *pcb) > } else { > idp = dt_dlib_get_map(dtp, "cpuinfo"); > assert(idp != NULL); > + > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > + > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP)); > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl)); > emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1)); > diff --git a/libdtrace/dt_prov_lockstat.c b/libdtrace/dt_prov_lockstat.c > index c73edf9be..8b2cf4da2 100644 > --- a/libdtrace/dt_prov_lockstat.c > +++ 
b/libdtrace/dt_prov_lockstat.c > @@ -121,11 +121,13 @@ static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > { > dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_smp_processor_id)); > + > assert(idp != NULL); > dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_BASE)); > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0)); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_2, 0, BPF_REG_0)); > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > emit(dlp, BPF_MOV_REG(BPF_REG_6, BPF_REG_0)); > diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c > index 3a218f3cb..a548e679f 100644 > --- a/libdtrace/dt_prov_sched.c > +++ b/libdtrace/dt_prov_sched.c > @@ -84,6 +84,40 @@ static int populate(dtrace_hdl_t *dtp) > probe_args, probes); > } > > +/* > + * Get a pointer to the cpuinfo_t structure for the CPU associated > + * with the runqueue that is in arg0. > + * > + * Clobbers %r1 through %r5 > + * Stores pointer to cpuinfo_t struct in %r0 > + */ > +static void get_cpuinfo(dtrace_hdl_t *dtp, dt_irlist_t *dlp, uint_t exitlbl) > +{ > + dt_ident_t *idp = dt_dlib_get_map(dtp, "cpuinfo"); > + > + assert(idp != NULL); > + > + /* Put the runqueue pointer from mst->arg0 into %r3. */ > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0))); > + > + /* Turn it into a pointer to its cpu member. 
*/ > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, dt_cg_ctf_offsetof("struct rq", "cpu", NULL, 1))); > + > + /* Call bpf_probe_read_kernel(%fp + DT_TRAMP_SP_SLOT[0], sizeof(int), %r3) */ > + emit(dlp, BPF_MOV_IMM(BPF_REG_2, (int) sizeof(int))); > + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP)); > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(0))); > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_kernel)); > + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, exitlbl)); > + > + /* Now look up the corresponding cpuinfo_t. */ > + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id); > + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_FP)); > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0))); > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem)); > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, exitlbl)); > +} > + > /* > * Generate a BPF trampoline for a SDT probe. > * > @@ -98,18 +132,39 @@ static int populate(dtrace_hdl_t *dtp) > */ > static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > { > + dtrace_hdl_t *dtp = pcb->pcb_hdl; > dt_irlist_t *dlp = &pcb->pcb_ir; > dt_probe_t *prp = pcb->pcb_probe; > > if (strcmp(prp->desc->prb, "dequeue") == 0) { > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > /* > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > - * associated with the runqueue. > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > + */ > + get_cpuinfo(dtp, dlp, exitlbl); > + > + /* > + * Copy arg1 into arg0. > */ > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > + > + /* Store the cpuinfo_t* in %r0 into arg1. 
*/ > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > } else if (strcmp(prp->desc->prb, "enqueue") == 0) { > + /* > + * Get the runqueue from arg0 and place its cpuinfo_t* into %r0. > + */ > + get_cpuinfo(dtp, dlp, exitlbl); > + > + /* > + * Copy arg1 into arg0. > + */ > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1))); > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_3)); > + > + /* Store the cpuinfo_t* in %r0 into arg1. */ > + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0)); > + > /* > * This is ugly but necessary... enqueue_task() takes a flags argument and the > * ENQUEUE_HEAD flag is used to indicate that the task is to be placed at the > @@ -120,15 +175,6 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > * outside the kernel source tree. > */ > #define ENQUEUE_HEAD 0x10 > - > - emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(1))); > - emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0)); > - /* > - * FIXME: arg1 should be a pointer to cpuinfo_t for the CPU > - * associated with the runqueue. 
> - */ > - emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(1), 0)); > - > emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2))); > emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, ENQUEUE_HEAD)); > emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0)); > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c > index 498d5332a..2167ed299 100644 > --- a/libdtrace/dt_work.c > +++ b/libdtrace/dt_work.c > @@ -37,35 +37,29 @@ END_probe(void) > int > dt_check_cpudrops(dtrace_hdl_t *dtp, processorid_t cpu, dtrace_dropkind_t what) > { > - dt_bpf_cpuinfo_t *ci; > - uint32_t cikey = 0; > + dt_bpf_cpuinfo_t ci; > + uint32_t cikey = cpu; > uint64_t cnt; > int rval = 0; > > assert(what == DTRACEDROP_PRINCIPAL || what == DTRACEDROP_AGGREGATION); > > - ci = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus, > - sizeof(dt_bpf_cpuinfo_t)); > - if (ci == NULL) > - return dt_set_errno(dtp, EDT_NOMEM); > - > - if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, ci) == -1) { > + if (dt_bpf_map_lookup(dtp->dt_cpumap_fd, &cikey, &ci) == -1) { > rval = dt_set_errno(dtp, EDT_BPF); > goto fail; > } > > if (what == DTRACEDROP_PRINCIPAL) { > - cnt = ci[cpu].buf_drops - dtp->dt_drops[cpu].buf; > - dtp->dt_drops[cpu].buf = ci[cpu].buf_drops; > + cnt = ci.buf_drops - dtp->dt_drops[cpu].buf; > + dtp->dt_drops[cpu].buf = ci.buf_drops; > } else { > - cnt = ci[cpu].agg_drops - dtp->dt_drops[cpu].agg; > - dtp->dt_drops[cpu].agg = ci[cpu].agg_drops; > + cnt = ci.agg_drops - dtp->dt_drops[cpu].agg; > + dtp->dt_drops[cpu].agg = ci.agg_drops; > } > > rval = dt_handle_cpudrop(dtp, cpu, what, cnt); > > fail: > - dt_free(dtp, ci); > return rval; > } > > diff --git a/test/unittest/sched/tst.enqueue.d b/test/unittest/sched/tst.enqueue.d > index f445ac843..28dcace8c 100644 > --- a/test/unittest/sched/tst.enqueue.d > +++ b/test/unittest/sched/tst.enqueue.d > @@ -4,7 +4,6 @@ > * Licensed under the Universal Permissive License v 1.0 as shown at > * http://oss.oracle.com/licenses/upl. 
> */ > -/* @@xfail: dtv2 */ > > #pragma D option switchrate=100hz > #pragma D option destructive > -- > 2.43.5 > From kris.van.hees at oracle.com Thu Apr 24 15:37:58 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Thu, 24 Apr 2025 11:37:58 -0400 Subject: [DTrace-devel] [PATCH] test: Get cpc expected branches and instructions counts from perf In-Reply-To: <20250406051917.29640-1-eugene.loh@oracle.com> References: <20250406051917.29640-1-eugene.loh@oracle.com> Message-ID: Reviewed-by: Kris Van Hees On Sun, Apr 06, 2025 at 01:19:17AM -0400, eugene.loh at oracle.com wrote: > From: Eugene Loh > > For a number of the cpc tests, we get expected counts from perf. > For branches and instructions, however, we can determine the > expected counts more directly since there is one branch and a > fixed number of instructions per iteration. Thus, we can derive > an expected cpc count simply by knowing the number of iterations. > > For some compilers, however, there is apparently some loop unrolling > even at low, default levels of optimization. > > So, revert to the perf count to estimate the expected cpc count > even for the branches and instructions tests.
> > Signed-off-by: Eugene Loh > --- > test/unittest/cpc/tst.branches.sh | 2 +- > test/unittest/cpc/tst.instructions.sh | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/test/unittest/cpc/tst.branches.sh b/test/unittest/cpc/tst.branches.sh > index 87250c371..442d65332 100755 > --- a/test/unittest/cpc/tst.branches.sh > +++ b/test/unittest/cpc/tst.branches.sh > @@ -53,7 +53,7 @@ fi > actual=$(($period * `cat tmp.txt`)) > > # determine expected count (one branch per interation) > -expect=$niters > +expect=`$utils/perf_count_event.sh branches workload_user $niters` > > # check > $utils/check_result.sh $actual $expect $(($expect / 4)) > diff --git a/test/unittest/cpc/tst.instructions.sh b/test/unittest/cpc/tst.instructions.sh > index a7fad3e78..34112c6e7 100755 > --- a/test/unittest/cpc/tst.instructions.sh > +++ b/test/unittest/cpc/tst.instructions.sh > @@ -61,7 +61,7 @@ fi > actual=$(($period * `cat tmp.txt`)) > > # determine expected count > -expect=$(($niters * $ninstructions_per_iter)) > +expect=`$utils/perf_count_event.sh instructions workload_user $niters` > > # check > $utils/check_result.sh $actual $expect $(($expect / 4)) > -- > 2.43.5 > From kris.van.hees at oracle.com Thu Apr 24 17:27:48 2025 From: kris.van.hees at oracle.com (Kris Van Hees) Date: Thu, 24 Apr 2025 13:27:48 -0400 Subject: [DTrace-devel] [PATCH v3] Allow arbitrary tracefs mount points In-Reply-To: References: <20241112134256.118717-1-nick.alcock@oracle.com> Message-ID: Nick, Can you rework this patch based on current dev? The changes to rework fbt and rawfbt providers affects this patch quite a bit. Kris On Tue, Feb 25, 2025 at 04:08:23PM -0500, Kris Van Hees wrote: > ping? > > On Fri, Nov 15, 2024 at 03:42:49PM -0500, Kris Van Hees wrote: > > On Tue, Nov 12, 2024 at 01:42:56PM +0000, Nick Alcock wrote: > > > So as of kernel 6.3 (upstream commit 2455f0e124d317dd08d337a75), the > > > canonical tracefs location has moved to /sys/kernel/tracing. 
Unfortunately > > > this requires an explicit mount, and a great many installations have not > > > done this and perhaps never will. So if the debugfs is mounted on > > > /sys/kernel/debug, it automatically makes /sys/kernel/debug/tracing appear > > > as it used to, as another tracefs mount. And of course there's nothing > > > special about the "canonical location": it's up to the admin, who might > > > choose to remount the thing anywhere at all. > > > > > > To make this even more fun, it's quite possible to end up with the tracefs > > > on /sys/kernel/debug/tracing, but an empty directory at /sys/kernel/tracing > > > (I got that during testing with no effort at all). > > > > > > All this means that the existing DTrace hardwiring for tracefs/eventsfs > > > locations isn't good enough. Instead, hunt for a suitable tracefs mount with > > > getmntent(), and add a function to open files under that directory, allowing > > > the path to be created using a printf-style format string (mimicking the > > > things we used to do with EVENTSFS defines and the like). This is actually > > > all we need; there is no need to ever return these paths at all, so there > > > is no clogging up the code with free()s -- and actually there's a > > > noticeable simplification in most cases. > > > > > > Tested with both in-practice-observed locations of debugfs, and the > > > obviously crazy and bad-in-a-format-string path of "/%s/%s/%n" to make sure > > > that is properly rejected. > > > > I think it is useful to put the testing (and results) in the commit msg for > > this, especially since you do not have a testcase for this (and I can > > understand why - though if we were to use chroot or something a test could > > be constructed, right). And the case of tracefs *not* being in standard > > location is worth pointing out to clearly show that this does work in the > > most generic case. 
> > > > Or if output of testing is too much to include (I assume it shouldn't be), > > at least make sure you test standard and non-standard locations, and then > > confidently report here that it works for all locations that do not have > > % in the path name. > > > > > Bug: https://github.com/oracle/dtrace-utils/issues/111 > > > Signed-off-by: Nick Alcock > > > --- > > > include/tracefs.h | 14 ---- > > > libdtrace/dt_error.c | 3 +- > > > libdtrace/dt_impl.h | 3 + > > > libdtrace/dt_open.c | 1 + > > > libdtrace/dt_prov_dtrace.c | 27 +++---- > > > libdtrace/dt_prov_fbt.c | 37 ++++----- > > > libdtrace/dt_prov_rawtp.c | 9 ++- > > > libdtrace/dt_prov_sdt.c | 30 ++++--- > > > libdtrace/dt_prov_syscall.c | 27 ++++--- > > > libdtrace/dt_prov_uprobe.c | 41 +++++----- > > > libdtrace/dt_provider.h | 1 - > > > libdtrace/dt_subr.c | 80 +++++++++++++++++++ > > > runtest.sh | 8 ++ > > > test/unittest/funcs/tst.rw_.x | 7 +- > > > test/unittest/providers/tst.dtrace_cleanup.sh | 9 ++- > > > test/utils/clean_probes.sh | 12 ++- > > > 16 files changed, 203 insertions(+), 106 deletions(-) > > > delete mode 100644 include/tracefs.h > > > > > > diff --git a/include/tracefs.h b/include/tracefs.h > > > deleted file mode 100644 > > > index d671f51adefc..000000000000 > > > --- a/include/tracefs.h > > > +++ /dev/null > > > @@ -1,14 +0,0 @@ > > > -/* > > > - * Oracle Linux DTrace; simple uprobe helper functions > > > - * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved. > > > - * Licensed under the Universal Permissive License v 1.0 as shown at > > > - * http://oss.oracle.com/licenses/upl. 
> > > - */ > > > - > > > -#ifndef _TRACEFS_H > > > -#define _TRACEFS_H > > > - > > > -#define TRACEFS "/sys/kernel/debug/tracing/" > > > -#define EVENTSFS TRACEFS "events/" > > > - > > > -#endif /* _TRACEFS_H */ > > > diff --git a/libdtrace/dt_error.c b/libdtrace/dt_error.c > > > index 213f0d9e1385..9c4a2b32888e 100644 > > > --- a/libdtrace/dt_error.c > > > +++ b/libdtrace/dt_error.c > > > @@ -98,7 +98,8 @@ static const struct { > > > { EDT_READMAXSTACK, "Cannot read kernel param perf_event_max_stack" }, > > > { EDT_TRACEMEM, "Missing or corrupt tracemem() record" }, > > > { EDT_PCAP, "Missing or corrupt pcap() record" }, > > > - { EDT_PRINT, "Missing or corrupt print() record" } > > > + { EDT_PRINT, "Missing or corrupt print() record" }, > > > + { EDT_TRACEFS, "Cannot find tracefs" } > > > }; > > > > > > static const int _dt_nerr = sizeof(_dt_errlist) / sizeof(_dt_errlist[0]); > > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h > > > index 68fb8ec53c06..950cb34819aa 100644 > > > --- a/libdtrace/dt_impl.h > > > +++ b/libdtrace/dt_impl.h > > > @@ -354,6 +354,7 @@ struct dtrace_hdl { > > > char *dt_module_path; /* pathname of kernel module root */ > > > dt_version_t dt_kernver;/* kernel version, used in the libpath */ > > > char *dt_dofstash_path; /* Path to the DOF stash. */ > > > + char *dt_tracefs_path; /* Path to tracefs. 
*/ > > > uid_t dt_useruid; /* lowest non-system uid: set via -xuseruid */ > > > char *dt_sysslice; /* the systemd system slice: set via -xsysslice */ > > > uint_t dt_lazyload; /* boolean: set via -xlazyload */ > > > @@ -643,6 +644,7 @@ enum { > > > EDT_TRACEMEM, /* missing or corrupt tracemem() record */ > > > EDT_PCAP, /* missing or corrupt pcap() record */ > > > EDT_PRINT, /* missing or corrupt print() record */ > > > + EDT_TRACEFS, /* cannot find tracefs */ > > > }; > > > > > > /* > > > @@ -713,6 +715,7 @@ extern void dt_conf_init(dtrace_hdl_t *); > > > > > > extern int dt_gmatch(const char *, const char *); > > > extern char *dt_basename(char *); > > > +extern int dt_tracefs_open(dtrace_hdl_t *, const char *fn, int flags, ...); > > > > > > extern ulong_t dt_popc(ulong_t); > > > extern ulong_t dt_popcb(const ulong_t *, ulong_t); > > > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c > > > index e1972aa821e7..775830f64492 100644 > > > --- a/libdtrace/dt_open.c > > > +++ b/libdtrace/dt_open.c > > > @@ -1317,6 +1317,7 @@ dtrace_close(dtrace_hdl_t *dtp) > > > free(dtp->dt_cpp_argv); > > > free(dtp->dt_cpp_path); > > > free(dtp->dt_ld_path); > > > + free(dtp->dt_tracefs_path); > > > free(dtp->dt_sysslice); > > > free(dtp->dt_dofstash_path); > > > > > > diff --git a/libdtrace/dt_prov_dtrace.c b/libdtrace/dt_prov_dtrace.c > > > index 34b5d8e2467f..670954beb4c9 100644 > > > --- a/libdtrace/dt_prov_dtrace.c > > > +++ b/libdtrace/dt_prov_dtrace.c > > > @@ -23,8 +23,6 @@ static const char funname[] = ""; > > > > > > #define PROBE_FUNC_SUFFIX "_probe" > > > > > > -#define UPROBE_EVENTS TRACEFS "uprobe_events" > > > > Since you keep KPROBE_EVENTS etc in other providers, why not here? 
> > > > > - > > > static const dtrace_pattr_t pattr = { > > > { DTRACE_STABILITY_STABLE, DTRACE_STABILITY_STABLE, DTRACE_CLASS_COMMON }, > > > { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN }, > > > @@ -229,11 +227,9 @@ out: > > > static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd) > > > { > > > if (!dt_tp_probe_has_info(prp)) { > > > - char *spec; > > > - char *fn; > > > - FILE *f; > > > - size_t len; > > > - int fd, rc = -1; > > > + char *spec; > > > + FILE *f; > > > + int fd = -1, rc = -1; > > > > > > /* get a uprobe specification for this probe */ > > > spec = uprobe_spec(getpid(), prp->desc->prb); > > > @@ -241,7 +237,8 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd) > > > return -ENOENT; > > > > > > /* add a uprobe */ > > > - fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND); > > > + fd = dt_tracefs_open(dtp, "uprobe_events", O_WRONLY | O_APPEND); > > > + > > > if (fd != -1) { > > > rc = dprintf(fd, "p:" GROUP_FMT "/%s %s\n", > > > GROUP_DATA, prp->desc->prb, spec); > > > @@ -252,16 +249,12 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd) > > > return -ENOENT; > > > > > > /* open format file */ > > > - len = snprintf(NULL, 0, "%s" GROUP_FMT "/%s/format", > > > - EVENTSFS, GROUP_DATA, prp->desc->prb) + 1; > > > - fn = dt_alloc(dtp, len); > > > - if (fn == NULL) > > > + fd = dt_tracefs_open(dtp, "events/" GROUP_FMT "/%s/format", > > > + O_RDONLY, GROUP_DATA, prp->desc->prb); > > > + if (fd < 0) > > > return -ENOENT; > > > > > > - snprintf(fn, len, "%s" GROUP_FMT "/%s/format", > > > - EVENTSFS, GROUP_DATA, prp->desc->prb); > > > - f = fopen(fn, "r"); > > > - dt_free(dtp, fn); > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return -ENOENT; > > > > > > @@ -296,7 +289,7 @@ static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp) > > > > > > dt_tp_probe_detach(dtp, prp); > > > > > > - fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND); > > > + fd = 
dt_tracefs_open(dtp, "uprobe_events", O_WRONLY | O_APPEND); > > > if (fd == -1) > > > return; > > > > > > diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c > > > index 21f63ddffc73..b5c1f5d22a06 100644 > > > --- a/libdtrace/dt_prov_fbt.c > > > +++ b/libdtrace/dt_prov_fbt.c > > > @@ -43,8 +43,8 @@ > > > static const char prvname[] = "fbt"; > > > static const char modname[] = "vmlinux"; > > > > > > -#define KPROBE_EVENTS TRACEFS "kprobe_events" > > > -#define PROBE_LIST TRACEFS "available_filter_functions" > > > +#define KPROBE_EVENTS "kprobe_events" > > > +#define PROBE_LIST "available_filter_functions" > > > > > > #define FBT_GROUP_FMT GROUP_FMT "_%s" > > > #define FBT_GROUP_DATA GROUP_DATA, prp->desc->prb > > > @@ -65,6 +65,7 @@ static int populate(dtrace_hdl_t *dtp) > > > { > > > dt_provider_t *prv; > > > dt_provimpl_t *impl; > > > + int fd; > > > FILE *f; > > > char *buf = NULL; > > > char *p; > > > @@ -79,7 +80,11 @@ static int populate(dtrace_hdl_t *dtp) > > > if (prv == NULL) > > > return -1; /* errno already set */ > > > > > > - f = fopen(PROBE_LIST, "r"); > > > + fd = dt_tracefs_open(dtp, PROBE_LIST, O_RDONLY); > > > + if (fd < 0) > > > + return 0; > > > + > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return 0; > > > > > > @@ -363,16 +368,14 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd) > > > { > > > if (!dt_tp_probe_has_info(prp)) { > > > - char *fn; > > > - FILE *f; > > > - size_t len; > > > - int fd, rc = -1; > > > + FILE *f; > > > + int fd, rc = -1; > > > > > > /* > > > * Register the kprobe with the tracing subsystem. This will > > > * create a tracepoint event. 
> > > */ > > > - fd = open(KPROBE_EVENTS, O_WRONLY | O_APPEND); > > > + fd = dt_tracefs_open(dtp, KPROBE_EVENTS, O_WRONLY | O_APPEND); > > > if (fd == -1) > > > return -ENOENT; > > > > > > @@ -383,19 +386,13 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd) > > > if (rc == -1) > > > return -ENOENT; > > > > > > - /* create format file name */ > > > - len = snprintf(NULL, 0, "%s" FBT_GROUP_FMT "/%s/format", > > > - EVENTSFS, FBT_GROUP_DATA, prp->desc->fun) + 1; > > > - fn = dt_alloc(dtp, len); > > > - if (fn == NULL) > > > + /* open format file */ > > > + fd = dt_tracefs_open(dtp, "events/" FBT_GROUP_FMT "/%s/format", > > > + O_RDONLY, FBT_GROUP_DATA, prp->desc->fun); > > > + if (fd < 0) > > > return -ENOENT; > > > > > > - snprintf(fn, len, "%s" FBT_GROUP_FMT "/%s/format", EVENTSFS, > > > - FBT_GROUP_DATA, prp->desc->fun); > > > - > > > - /* open format file */ > > > - f = fopen(fn, "r"); > > > - dt_free(dtp, fn); > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return -ENOENT; > > > > > > @@ -431,7 +428,7 @@ static void kprobe_detach(dtrace_hdl_t *dtp, const dt_probe_t *prp) > > > > > > dt_tp_probe_detach(dtp, prp); > > > > > > - fd = open(KPROBE_EVENTS, O_WRONLY | O_APPEND); > > > + fd = dt_tracefs_open(dtp, KPROBE_EVENTS, O_WRONLY | O_APPEND); > > > if (fd == -1) > > > return; > > > > > > diff --git a/libdtrace/dt_prov_rawtp.c b/libdtrace/dt_prov_rawtp.c > > > index 778a6f9cde90..6940edce6a6d 100644 > > > --- a/libdtrace/dt_prov_rawtp.c > > > +++ b/libdtrace/dt_prov_rawtp.c > > > @@ -38,7 +38,7 @@ > > > static const char prvname[] = "rawtp"; > > > static const char modname[] = "vmlinux"; > > > > > > -#define PROBE_LIST TRACEFS "available_events" > > > +#define PROBE_LIST "available_events" > > > > > > #define KPROBES "kprobes" > > > #define SYSCALLS "syscalls" > > > @@ -64,6 +64,7 @@ static const dtrace_pattr_t pattr = { > > > static int populate(dtrace_hdl_t *dtp) > > > { > > > dt_provider_t *prv; > > > + int fd; > > > FILE 
*f; > > > char *buf = NULL; > > > char *p; > > > @@ -73,7 +74,11 @@ static int populate(dtrace_hdl_t *dtp) > > > if (prv == NULL) > > > return -1; /* errno already set */ > > > > > > - f = fopen(PROBE_LIST, "r"); > > > + fd = dt_tracefs_open(dtp, PROBE_LIST, O_RDONLY); > > > + if (fd < 0) > > > + return 0; > > > + > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return 0; > > > > > > diff --git a/libdtrace/dt_prov_sdt.c b/libdtrace/dt_prov_sdt.c > > > index 675e0458ca4c..7ebe010efa79 100644 > > > --- a/libdtrace/dt_prov_sdt.c > > > +++ b/libdtrace/dt_prov_sdt.c > > > @@ -36,7 +36,7 @@ > > > static const char prvname[] = "sdt"; > > > static const char modname[] = "vmlinux"; > > > > > > -#define PROBE_LIST TRACEFS "available_events" > > > +#define PROBE_LIST "available_events" > > > > > > #define KPROBES "kprobes" > > > #define SYSCALLS "syscalls" > > > @@ -62,6 +62,7 @@ static const dtrace_pattr_t pattr = { > > > static int populate(dtrace_hdl_t *dtp) > > > { > > > dt_provider_t *prv; > > > + int fd; > > > FILE *f; > > > char *buf = NULL; > > > char *p; > > > @@ -71,7 +72,11 @@ static int populate(dtrace_hdl_t *dtp) > > > if (prv == NULL) > > > return -1; /* errno already set */ > > > > > > - f = fopen(PROBE_LIST, "r"); > > > + fd = dt_tracefs_open(dtp, PROBE_LIST, O_RDONLY); > > > + if (fd < 0) > > > + return 0; > > > + > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return 0; > > > > > > @@ -192,16 +197,16 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > static int probe_info_tracefs(dtrace_hdl_t *dtp, const dt_probe_t *prp, > > > int *argcp, dt_argdesc_t **argvp) > > > { > > > + int fd; > > > FILE *f; > > > - char *fn; > > > int rc; > > > const dtrace_probedesc_t *pdp = prp->desc; > > > > > > - if (asprintf(&fn, EVENTSFS "%s/%s/format", pdp->mod, pdp->prb) == -1) > > > - return dt_set_errno(dtp, EDT_NOMEM); > > > + fd = dt_tracefs_open(dtp, "events/%s/%s/format", O_RDONLY, pdp->mod, pdp->prb); > > > + if (fd < 0) > > > + return 
-ENOENT; > > > > > > - f = fopen(fn, "r"); > > > - free(fn); > > > + f = fdopen(fd, "r"); > > > if (!f) > > > return -ENOENT; > > > > > > @@ -223,15 +228,18 @@ static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp, > > > int argc = 0; > > > dt_argdesc_t *argv = NULL; > > > dtrace_typeinfo_t sym; > > > + int fd; > > > FILE *f; > > > uint32_t id; > > > > > > /* Retrieve the event id. */ > > > - if (asprintf(&str, EVENTSFS "%s/%s/id", prp->desc->mod, prp->desc->prb) == -1) > > > - return dt_set_errno(dtp, EDT_NOMEM); > > > > > > - f = fopen(str, "r"); > > > - free(str); > > > + fd = dt_tracefs_open(dtp, "events/%s/%s/id", O_RDONLY, > > > + prp->desc->mod, prp->desc->prb); > > > + if (fd < 0) > > > + return dt_set_errno(dtp, EDT_ENABLING_ERR); > > > + > > > + f = fdopen(fd, "r"); > > > if (!f) > > > return dt_set_errno(dtp, EDT_ENABLING_ERR); > > > > > > diff --git a/libdtrace/dt_prov_syscall.c b/libdtrace/dt_prov_syscall.c > > > index 20843c6f538e..63ce3bc43ae1 100644 > > > --- a/libdtrace/dt_prov_syscall.c > > > +++ b/libdtrace/dt_prov_syscall.c > > > @@ -38,7 +38,7 @@ > > > static const char prvname[] = "syscall"; > > > static const char modname[] = "vmlinux"; > > > > > > -#define SYSCALLSFS EVENTSFS "syscalls/" > > > +#define SYSCALLSFS "events/syscalls/" > > > > > > /* > > > * We need to skip over an extra field: __syscall_nr. 
> > > @@ -61,7 +61,7 @@ struct syscall_data { > > > > > > #define SCD_ARG(n) offsetof(struct syscall_data, arg[n]) > > > > > > -#define PROBE_LIST TRACEFS "available_events" > > > +#define PROBE_LIST "available_events" > > > > > > #define PROV_PREFIX "syscalls:" > > > #define ENTRY_PREFIX "sys_enter_" > > > @@ -71,6 +71,7 @@ struct syscall_data { > > > static int populate(dtrace_hdl_t *dtp) > > > { > > > dt_provider_t *prv; > > > + int fd; > > > FILE *f; > > > char *buf = NULL; > > > size_t n; > > > @@ -79,7 +80,11 @@ static int populate(dtrace_hdl_t *dtp) > > > if (prv == NULL) > > > return -1; /* errno already set */ > > > > > > - f = fopen(PROBE_LIST, "r"); > > > + fd = dt_tracefs_open(dtp, PROBE_LIST, O_RDONLY); > > > + if (fd < 0) > > > + return 0; > > > + > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return 0; > > > > > > @@ -195,23 +200,21 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl) > > > static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp, > > > int *argcp, dt_argdesc_t **argvp) > > > { > > > + int fd; > > > FILE *f; > > > - char fn[256]; > > > int rc; > > > > > > /* > > > * We know that the probe name is either "entry" or "return", so we can > > > * just check the first character. > > > */ > > > - strcpy(fn, SYSCALLSFS); > > > - if (prp->desc->prb[0] == 'e') > > > - strcat(fn, "sys_enter_"); > > > - else > > > - strcat(fn, "sys_exit_"); > > > - strcat(fn, prp->desc->fun); > > > - strcat(fn, "/format"); > > > + fd = dt_tracefs_open(dtp, SYSCALLSFS "/sys_%s_%s/format", O_RDONLY, > > > + (prp->desc->prb[0] == 'e') ? 
"enter" : "exit", > > > + prp->desc->fun); > > > + if (fd < 0) > > > + return -ENOENT; > > > > > > - f = fopen(fn, "r"); > > > + f = fdopen(fd, "r"); > > > if (!f) > > > return -ENOENT; > > > > > > diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c > > > index 205014617586..6a02243ff572 100644 > > > --- a/libdtrace/dt_prov_uprobe.c > > > +++ b/libdtrace/dt_prov_uprobe.c > > > @@ -1116,13 +1116,13 @@ static char *uprobe_name(dev_t dev, ino_t ino, uint64_t addr, int flags) > > > * uprobe may be a uretprobe. Return the probe's name as > > > * a new dynamically-allocated string, or NULL on error. > > > */ > > > -static char *uprobe_create(dev_t dev, ino_t ino, const char *mapping_fn, > > > - uint64_t addr, int flags) > > > +static char *uprobe_create(dtrace_hdl_t *dtp, dev_t dev, ino_t ino, > > > + const char *mapping_fn, uint64_t addr, int flags) > > > { > > > - int fd = -1; > > > - int rc = -1; > > > - char *name; > > > - char *spec; > > > + int fd = -1; > > > + int rc = -1; > > > + char *name; > > > + char *spec; > > > > > > if (asprintf(&spec, "%s:0x%lx", mapping_fn, addr) < 0) > > > return NULL; > > > @@ -1132,8 +1132,8 @@ static char *uprobe_create(dev_t dev, ino_t ino, const char *mapping_fn, > > > goto out; > > > > > > /* Add the uprobe. */ > > > - fd = open(TRACEFS "uprobe_events", O_WRONLY | O_APPEND); > > > - if (fd == -1) > > > + fd = dt_tracefs_open(dtp, "uprobe_events", O_WRONLY | O_APPEND); > > > + if (fd < 0) > > > goto out; > > > > > > rc = dprintf(fd, "%c:%s %s\n", flags & PP_IS_RETURN ? 
'r' : 'p', name, spec); > > > @@ -1153,8 +1153,8 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *uprp, int bpf_fd) > > > { > > > dt_uprobe_t *upp = uprp->prv_data; > > > tp_probe_t *tpp = upp->tp; > > > + int fd; > > > FILE *f; > > > - char *fn; > > > char *prb = NULL; > > > int rc = -1; > > > > > > @@ -1163,7 +1163,7 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *uprp, int bpf_fd) > > > > > > assert(upp->fn != NULL); > > > > > > - prb = uprobe_create(upp->dev, upp->inum, upp->fn, upp->off, > > > + prb = uprobe_create(dtp, upp->dev, upp->inum, upp->fn, upp->off, > > > upp->flags); > > > > > > /* > > > @@ -1177,12 +1177,12 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *uprp, int bpf_fd) > > > upp->flags); > > > > > > /* open format file */ > > > - rc = asprintf(&fn, "%s%s/format", EVENTSFS, prb); > > > + fd = dt_tracefs_open(dtp, "events/%s/format", O_RDONLY, prb); > > > free(prb); > > > - if (rc < 0) > > > + if (fd < 0) > > > return -ENOENT; > > > - f = fopen(fn, "r"); > > > - free(fn); > > > + > > > + f = fdopen(fd, "r"); > > > if (f == NULL) > > > return -ENOENT; > > > > > > @@ -1251,21 +1251,20 @@ done: > > > * Destroy a uprobe for a given device and address. 
> > > */ > > > static int > > > -uprobe_delete(dev_t dev, ino_t ino, uint64_t addr, int flags) > > > +uprobe_delete(dtrace_hdl_t *dtp, dev_t dev, ino_t ino, uint64_t addr, int flags) > > > { > > > - int fd = -1; > > > - int rc = -1; > > > - char *name; > > > + int fd = -1; > > > + int rc = -1; > > > + char *name; > > > > > > name = uprobe_name(dev, ino, addr, flags); > > > if (!name) > > > goto out; > > > > > > - fd = open(TRACEFS "uprobe_events", O_WRONLY | O_APPEND); > > > + fd = dt_tracefs_open(dtp, "uprobe_events", O_WRONLY | O_APPEND); > > > if (fd == -1) > > > goto out; > > > > > > - > > > rc = dprintf(fd, "-:%s\n", name); > > > > > > out: > > > @@ -1297,7 +1296,7 @@ static void detach(dtrace_hdl_t *dtp, const dt_probe_t *uprp) > > > > > > dt_tp_detach(dtp, tpp); > > > > > > - uprobe_delete(upp->dev, upp->inum, upp->off, upp->flags); > > > + uprobe_delete(dtp, upp->dev, upp->inum, upp->off, upp->flags); > > > } > > > > > > /* > > > diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h > > > index 8f143dceaed7..4598a380b950 100644 > > > --- a/libdtrace/dt_provider.h > > > +++ b/libdtrace/dt_provider.h > > > @@ -12,7 +12,6 @@ > > > #include > > > #include > > > #include > > > -#include > > > > > > #ifdef __cplusplus > > > extern "C" { > > > diff --git a/libdtrace/dt_subr.c b/libdtrace/dt_subr.c > > > index d5dca164861e..f129e5591465 100644 > > > --- a/libdtrace/dt_subr.c > > > +++ b/libdtrace/dt_subr.c > > > @@ -20,6 +20,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > #include > > > > > > #include > > > @@ -998,3 +999,82 @@ uint32_t dt_gen_hval(const char *p, uint32_t hval, size_t len) > > > > > > return hval; > > > } > > > + > > > +/* > > > + * Find the tracefs and store it away in dtp. 
> > > + */ > > > +static int > > > +find_tracefs_path(dtrace_hdl_t *dtp) > > > +{ > > > + FILE *mounts; > > > + struct mntent *mnt; > > > + > > > + if ((mounts = setmntent("/proc/mounts", "r")) == NULL) { > > > + dt_dprintf("Cannot open /proc/mounts: %s\n", strerror(errno)); > > > + return dt_set_errno(dtp, EDT_TRACEFS); > > > + } > > > + > > > + while ((mnt = getmntent(mounts)) != NULL) { > > > + /* > > > + * Only accept tracefs paths that do not contain percent > > > + * characters in their mounted paths, since we use this > > > + * to augment a format string in dt_tracefs_vfn(). > > > + */ > > > + if ((strcmp(mnt->mnt_type, "tracefs") == 0) && > > > + (strchr(mnt->mnt_dir, '%') == NULL)) { > > > + dtp->dt_tracefs_path = strdup(mnt->mnt_dir); > > > + break; > > > + } > > > + } > > > + endmntent(mounts); > > > + > > > + if (!dtp->dt_tracefs_path) { > > > + dt_dprintf("Cannot find a suitable tracefs path.\n"); > > > + return dt_set_errno(dtp, EDT_TRACEFS); > > > + } > > > + > > > + dt_dprintf("Found tracefs at %s\n", dtp->dt_tracefs_path); > > > + > > > + return 0; > > > +} > > > + > > > +static char * > > > +dt_tracefs_vfn(dtrace_hdl_t *dtp, const char *fn, va_list ap) > > > +{ > > > + char *full_fn; > > > + char *str; > > > + > > > + if (!dtp->dt_tracefs_path) > > > + if (find_tracefs_path(dtp) < 0) > > > + return NULL; /* errno is set for us. */ > > > + > > > + if (asprintf(&full_fn, "%s/%s", dtp->dt_tracefs_path, fn) < 0) { > > > + dt_set_errno(dtp, EDT_NOMEM); > > > + return NULL; > > > + } > > > + > > > + if (vasprintf(&str, full_fn, ap) < 0) { > > > + str = NULL; > > > + dt_set_errno(dtp, EDT_NOMEM); > > > + } > > > + free(full_fn); > > > + return str; > > > +} > > > + > > > +int > > > +dt_tracefs_open(dtrace_hdl_t *dtp, const char *fn, int flags, ...) 
> > > +{ > > > + va_list ap; > > > + char *str; > > > + int fd; > > > + > > > + va_start(ap, flags); > > > + if ((str = dt_tracefs_vfn(dtp, fn, ap)) == NULL) { > > > + va_end(ap); > > > + return -1; /* errno is set for us. */ > > > + } > > > + > > > + fd = open(str, flags, 0666); > > > + free(str); > > > + return fd; /* errno is set for us. */ > > > +} > > > diff --git a/runtest.sh b/runtest.sh > > > index 46b532d7e161..fbf4e60c82a9 100755 > > > --- a/runtest.sh > > > +++ b/runtest.sh > > > @@ -607,6 +607,14 @@ elif ! /usr/bin/cpp -x c -fno-show-column - /dev/null < /dev/null 2>&1 | \ > > > export DTRACE_OPT_CPPARGS="-fno-show-column" > > > fi > > > > > > +# Find the tracefs. > > > +tracefs="$(awk '$3 == "tracefs" { print $2; exit; }' /proc/mounts)" > > > +if [[ -z $tracefs ]]; then > > > + echo "Cannot find any tracefs mounts in /proc/mounts. Some tests will fail." >&2 > > > +fi > > > + > > > +export tracefs > > > + > > > # More than one dtrace tree -> run tests for all dtraces, and verify identical > > > # intermediate code is produced by each dtrace. > > > > > > diff --git a/test/unittest/funcs/tst.rw_.x b/test/unittest/funcs/tst.rw_.x > > > index 29c581116154..5737c7575a26 100755 > > > --- a/test/unittest/funcs/tst.rw_.x > > > +++ b/test/unittest/funcs/tst.rw_.x > > > @@ -1,6 +1,11 @@ > > > #!/bin/sh > > > > > > -FUNCS=/sys/kernel/debug/tracing/available_filter_functions > > > +FUNCS=${tracefs}/available_filter_functions > > > + > > > +if [[ ! -e $FUNCS ]]; then > > > + echo no tracefs found > > > + exit 1 > > > +fi > > > > > > if ! 
grep -qw _raw_read_lock $FUNCS; then > > > echo no _raw_read_lock FBT probe due to kernel config > > > diff --git a/test/unittest/providers/tst.dtrace_cleanup.sh b/test/unittest/providers/tst.dtrace_cleanup.sh > > > index 4ac59ccb4315..f3e434ae76fc 100755 > > > --- a/test/unittest/providers/tst.dtrace_cleanup.sh > > > +++ b/test/unittest/providers/tst.dtrace_cleanup.sh > > > @@ -1,7 +1,7 @@ > > > #!/bin/bash > > > # > > > # Oracle Linux DTrace. > > > -# Copyright (c) 2020, 2022, Oracle and/or its affiliates. All rights reserved. > > > +# Copyright (c) 2020, 2024, Oracle and/or its affiliates. All rights reserved. > > > # Licensed under the Universal Permissive License v 1.0 as shown at > > > # http://oss.oracle.com/licenses/upl. > > > > > > @@ -14,7 +14,12 @@ > > > ## > > > > > > dtrace=$1 > > > -UPROBE_EVENTS=/sys/kernel/debug/tracing/uprobe_events > > > +UPROBE_EVENTS=${tracefs}/uprobe_events > > > + > > > +if [[ ! -e $UPROBE_EVENTS ]]; then > > > + echo "no tracefs/uprobe_events" >&2 > > > + exit 67 > > > +fi > > > > > > out=/tmp/output.$$ > > > $dtrace $dt_flags -n BEGIN,END &>> $out & > > > diff --git a/test/utils/clean_probes.sh b/test/utils/clean_probes.sh > > > index 8292b3096424..cfd100088eb7 100755 > > > --- a/test/utils/clean_probes.sh > > > +++ b/test/utils/clean_probes.sh > > > @@ -1,9 +1,13 @@ > > > #!/usr/bin/bash > > > > > > -TRACEFS=/sys/kernel/debug/tracing > > > -EVENTS=${TRACEFS}/available_events > > > -KPROBES=${TRACEFS}/kprobe_events > > > -UPROBES=${TRACEFS}/uprobe_events > > > +EVENTS=${tracefs}/available_events > > > +KPROBES=${tracefs}/kprobe_events > > > +UPROBES=${tracefs}/uprobe_events > > > + > > > +# We can't work without the tracefs: just give up quietly. > > > +if [[ ! -e $EVENTS ]]; then > > > + exit 0 > > > +fi > > > > > > # Check permissions > > > if [[ ! -r ${EVENTS} ]]; then > > > -- > > > 2.46.0.278.g36e3a12567 > > >
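For reference, the lookup-and-open pattern that the dt_subr.c hunk introduces can be sketched as a small standalone program. This is only an illustration under stated assumptions, not the code from the patch: find_mount() and open_under() are hypothetical names, the mounts file is passed in as a parameter so the lookup can be exercised against something other than /proc/mounts, and there is no dtrace_hdl_t caching. Two things differ from the quoted code on purpose. The relative path is expanded first and the mount point is prepended afterwards, so the mount point is never used as a format string and the '%' filter is not needed. Also, va_end() is called before any return; as far as I can tell, the quoted dt_tracefs_open() only reaches va_end() on the failure path.

```c
#define _GNU_SOURCE		/* asprintf(), vasprintf() */
#include <assert.h>
#include <fcntl.h>
#include <mntent.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a malloc'ed copy of the first mount point of
 * type fstype listed in mounts_file (normally "/proc/mounts"), or NULL if
 * the file cannot be read or no such mount exists.
 */
static char *
find_mount(const char *mounts_file, const char *fstype)
{
	FILE		*mounts;
	struct mntent	*mnt;
	char		*dir = NULL;

	if ((mounts = setmntent(mounts_file, "r")) == NULL)
		return NULL;

	while ((mnt = getmntent(mounts)) != NULL) {
		if (strcmp(mnt->mnt_type, fstype) == 0) {
			dir = strdup(mnt->mnt_dir);
			break;
		}
	}
	endmntent(mounts);

	return dir;
}

/*
 * Hypothetical helper: expand fmt with the trailing arguments, prepend
 * root and a '/', and open the result.  Returns a file descriptor, or -1
 * on allocation or open() failure.
 */
static int
open_under(const char *root, int flags, const char *fmt, ...)
{
	va_list	ap;
	char	*rel, *path;
	int	fd = -1, rc;

	assert(root != NULL && fmt != NULL);

	va_start(ap, fmt);
	rc = vasprintf(&rel, fmt, ap);
	va_end(ap);		/* released on every path */
	if (rc < 0)
		return -1;

	/* root is passed as data, never as a format string. */
	if (asprintf(&path, "%s/%s", root, rel) >= 0) {
		fd = open(path, flags, 0666);
		free(path);
	}
	free(rel);

	return fd;
}
```

A caller would do something like find_mount("/proc/mounts", "tracefs") once, keep the result, and then use open_under(root, O_RDONLY, "events/%s/%s/format", mod, prb) in place of the asprintf()/fopen() pairs that the patch removes.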