[DTrace-devel] [PATCH 1/3] Cache per-CPU agg map IDs
Eugene Loh
eugene.loh at oracle.com
Wed Jul 16 20:05:27 UTC 2025
On 7/16/25 15:10, Kris Van Hees wrote:
> On Thu, May 01, 2025 at 02:22:50PM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> The dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu) call that is used to
>> snap or truncate aggregations takes a few milliseconds, which seems all
>> right. For large systems (e.g., 100 CPUs) and many truncations (e.g.,
>> tst.trunc.d, etc.), however, a trunc() might end up costing a minute on
>> the consumer side, which is unreasonable and causes such tests to time
>> out. The run time is due almost exclusively to looking up the per-CPU
>> agg map ID.
> Of course, we only go from 2 bpf() syscalls per call to 1 bpf() syscall per
> call. Have you collected timings for the two kinds of bpf calls we use?
I'm rusty on this, but as I remember I collected timings on the BPF
calls and it was clear who the culprit was.
> I would actually expect that the larger cost lies with the
> bpf_map_get_fd_by_id() one rather than the bpf_map_lookup() since the
> fd_by_id needs to allocate an fd and link it with a map.
>
> In all, how bad would it be if we were to just keep the fds open for the
> per-CPU aggregation data, and not having to do these bpf syscalls at all.
>
> But we could end up with 100s of fds open of course...
Yeah, I think I decided against that approach for that reason (iirc).
I'm happy with Nick's R-b, but let me know if you want more
investigation on this patch.
>> Cache the per-CPU agg map IDs.
>>
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>> libdtrace/dt_aggregate.c | 5 ++---
>> libdtrace/dt_bpf.c | 6 ++++++
>> libdtrace/dt_impl.h | 1 +
>> libdtrace/dt_open.c | 1 +
>> 4 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
>> index 9e47fcab7..86f9d4d5b 100644
>> --- a/libdtrace/dt_aggregate.c
>> +++ b/libdtrace/dt_aggregate.c
>> @@ -800,7 +800,7 @@ dtrace_aggregate_snap(dtrace_hdl_t *dtp)
>>
>> for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
>> int cpu = dtp->dt_conf.cpus[i].cpu_id;
>> - int fd = dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu);
>> + int fd = dt_bpf_map_get_fd_by_id(dtp->dt_aggmap_ids[i]);
>>
>> if (fd < 0)
>> return DTRACE_WORKSTATUS_ERROR;
>> @@ -1232,8 +1232,7 @@ dt_aggwalk_remove(dtrace_hdl_t *dtp, dt_ahashent_t *h)
>> memcpy(key, agd->dtada_key, agd->dtada_desc->dtagd_ksize);
>>
>> for (i = 0; i < ncpus; i++) {
>> - int cpu = dtp->dt_conf.cpus[i].cpu_id;
>> - int fd = dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu);
>> + int fd = dt_bpf_map_get_fd_by_id(dtp->dt_aggmap_ids[i]);
>>
>> if (fd < 0)
>> return DTRACE_WORKSTATUS_ERROR;
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> index d6722cbd1..635780738 100644
>> --- a/libdtrace/dt_bpf.c
>> +++ b/libdtrace/dt_bpf.c
>> @@ -689,6 +689,10 @@ gmap_create_aggs(dtrace_hdl_t *dtp)
>> if (dtp->dt_aggmap_fd == -1)
>> return -1;
>>
>> + dtp->dt_aggmap_ids = dt_calloc(dtp, dtp->dt_conf.num_online_cpus, sizeof(int));
>> + if (dtp->dt_aggmap_ids == NULL)
>> + return dt_set_errno(dtp, EDT_NOMEM);
>> +
>> for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
>> int cpu = dtp->dt_conf.cpus[i].cpu_id;
>> char name[16];
>> @@ -702,6 +706,8 @@ gmap_create_aggs(dtrace_hdl_t *dtp)
>> return map_create_error(dtp, name, errno);
>>
>> dt_bpf_map_update(dtp->dt_aggmap_fd, &cpu, &fd);
>> + if (dt_bpf_map_lookup(dtp->dt_aggmap_fd, &cpu, &dtp->dt_aggmap_ids[i]) < 0)
>> + return -1;
>> }
>>
>> /* Create the agg generation value array. */
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index 68fb8ec53..1033154d9 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -388,6 +388,7 @@ struct dtrace_hdl {
>> int dt_proc_fd; /* file descriptor for proc eventfd */
>> int dt_stmap_fd; /* file descriptor for the 'state' BPF map */
>> int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
>> + int *dt_aggmap_ids; /* ids for the 'aggN' BPF maps */
>> int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
>> int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
>> int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
>> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
>> index 71ee21467..7da4c82cc 100644
>> --- a/libdtrace/dt_open.c
>> +++ b/libdtrace/dt_open.c
>> @@ -1233,6 +1233,7 @@ dtrace_close(dtrace_hdl_t *dtp)
>> dt_probe_detach_all(dtp);
>>
>> dt_free(dtp, dtp->dt_conf.cpus);
>> + dt_free(dtp, dtp->dt_aggmap_ids);
>>
>> if (dtp->dt_procs != NULL)
>> dt_proc_hash_destroy(dtp);
>> --
>> 2.43.5
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list