[DTrace-devel] [PATCH] Review perf_event_open() calls
Eugene Loh
eugene.loh at oracle.com
Thu Oct 22 16:45:24 PDT 2020
----- kris.van.hees at oracle.com wrote:
> On Wed, Oct 14, 2020 at 05:35:45PM -0400, eugene.loh at oracle.com
> wrote:
> > From: Eugene Loh <eugene.loh at oracle.com>
> >
> > Our perf_event_open() calls should be cleaned up a little. The issues
> > are minor, with no user-observable effects, but cleaner code is better.
> >
> > Set size in "struct perf_event_attr".
>
> Sure, though see my suggestions below.
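As an aside, for anyone reading along: the kernel uses attr.size for ABI
versioning, so it can tell which revision of struct perf_event_attr
userspace was built against. Setting it from the variable rather than the
type name is arguably the more robust idiom (it stays correct even if the
variable's type changes), e.g. a minimal sketch:

    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);   /* declare our struct revision to the kernel */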
>
> > Use flag PERF_FLAG_FD_CLOEXEC.
>
> This is actually not necessary. I am not opposed to it, but the code as it
> was didn't pose an issue because we're not using execve().
Okay. I'd like to understand this better.
There are cases where we call execvp(), but they happen before
perf_event_open() is called, e.g., to preprocess D scripts during
compilation or to launch -c processes. Because they precede
perf_event_open(), they don't matter.
There are also cases where we call execlp() while the consumer is running,
such as (someday) for pcap processing.
But the main thing I was wondering about was consistency: either it makes
sense for us to use CLOEXEC or it doesn't, and we might as well adopt a
consistent policy.
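To make the point concrete, a minimal sketch (not tied to any particular
call site): the flag makes close-on-exec atomic at fd creation, whereas the
alternative is a separate fcntl() step that leaves a window in a
multithreaded process that forks and execs:

    int fd;

    /* atomic: the fd can never be inherited across an exec */
    fd = perf_event_open(&attr, -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);

    /* versus the racy two-step equivalent */
    fd = perf_event_open(&attr, -1, cpu, -1, 0);
    if (fd >= 0)
        fcntl(fd, F_SETFD, FD_CLOEXEC);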
> > Use PERF_SAMPLE_RAW only for the perf_event buffer to which BPF programs
> > will write.
>
> That is actually harmless, but yes, we can drop it.
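Right. For context: PERF_SAMPLE_RAW only matters for the output buffers
because a raw sample is what bpf_perf_event_output() emits; the tracepoint
and profile events never carry a raw payload themselves. A hypothetical
sketch of the producing side (the map and record names are made up):

    /* BPF program side: emit a raw sample into the per-CPU buffer map */
    bpf_perf_event_output(ctx, &output_map, BPF_F_CURRENT_CPU,
                          &rec, sizeof(rec));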
>
> > Distribute perf_events evenly among CPUs. (Note that perf_event buffers
> > and profile-n events are one-per-CPU. Further, tick-n events are
> > distributed randomly for load balancing. The only change needed is to
> > distribute tp_attach() randomly over CPUs rather than mapping them all
> > to CPU 0.)
>
> Can you explain your rationale here, i.e. point out where there is an issue?
> As you mention, perf_event ring buffers (our output buffers) are per-CPU, and
> profile-n events are (as required) per-CPU. The tick-n events being spread
> across all CPUs makes sense also because they are events that fire as timer
> events (and are out of necessity tied to a specific CPU). I am not quite sure
> how spreading SDT events over the various CPUs accomplishes load balancing,
> because those events are triggered by executing code and therefore execute on
> the same CPU where that code was running. So I am unclear on what exactly
> you are load balancing here.
Mostly, I just did not understand, and I guess I still don't know how
tick-n and SDT events are different. The perf_event_open() man page says
cpu >= 0 means monitoring only on the specified CPU. That does not seem
to be what happens, and it would not make sense if we were running all
SDT probes on cpu=0.
Maybe I'll just rephrase this as a question: what does perf_event_open()'s
cpu argument mean, at least in the case of our SDT usage?
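In the meantime, here is my (possibly wrong) reading of the man page, as a
sketch:

    /*
     * perf_event_open(&attr, pid, cpu, group_fd, flags):
     *   pid == -1, cpu >= 0   all processes, but only on that CPU
     *   pid >= 0,  cpu == -1  that process, on any CPU
     *   pid == -1, cpu == -1  invalid (EINVAL)
     */
    fd = perf_event_open(&attr, -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);

If that is right, the cpu argument gates the perf side (counting and
sampling), while a program attached with PERF_EVENT_IOC_SET_BPF may still
run on whichever CPU the tracepoint fires, which would explain the behavior
we see even with everything on cpu=0.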
> > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > ---
> > libdtrace/dt_peb.c | 1 +
> > libdtrace/dt_prov_profile.c | 3 +--
> > libdtrace/dt_prov_sdt.c | 8 ++++++--
> > 3 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/libdtrace/dt_peb.c b/libdtrace/dt_peb.c
> > index 0ef1c419..b8c49e2f 100644
> > --- a/libdtrace/dt_peb.c
> > +++ b/libdtrace/dt_peb.c
> > @@ -75,6 +75,7 @@ dt_peb_open(dt_peb_t *peb)
> > memset(&attr, 0, sizeof(attr));
> > attr.config = PERF_COUNT_SW_BPF_OUTPUT;
> > attr.type = PERF_TYPE_SOFTWARE;
> > + attr.size = sizeof(struct perf_event_attr);
>
> attr.size = sizeof(attr);
>
> > attr.sample_type = PERF_SAMPLE_RAW;
> > attr.sample_period = 1;
> > attr.wakeup_events = 1;
> > diff --git a/libdtrace/dt_prov_profile.c b/libdtrace/dt_prov_profile.c
> > index 8541bee6..22cd968a 100644
> > --- a/libdtrace/dt_prov_profile.c
> > +++ b/libdtrace/dt_prov_profile.c
> > @@ -293,7 +293,6 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> > memset(&attr, 0, sizeof(attr));
> > attr.type = PERF_TYPE_SOFTWARE;
> > attr.config = PERF_COUNT_SW_CPU_CLOCK;
> > - attr.sample_type = PERF_SAMPLE_RAW;
> > attr.size = sizeof(struct perf_event_attr);
> > attr.wakeup_events = 1;
> > attr.freq = 0;
> > @@ -307,7 +306,7 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> > j = rand() % dtp->dt_conf.num_online_cpus;
> >
> > fd = perf_event_open(&attr, -1, dtp->dt_conf.cpus[j].cpu_id,
> > - -1, 0);
> > + -1, PERF_FLAG_FD_CLOEXEC);
> > if (fd < 0)
> > continue;
> > if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) < 0) {
> > diff --git a/libdtrace/dt_prov_sdt.c b/libdtrace/dt_prov_sdt.c
> > index 4b2ac31b..8775103f 100644
> > --- a/libdtrace/dt_prov_sdt.c
> > +++ b/libdtrace/dt_prov_sdt.c
> > @@ -77,14 +77,18 @@ int tp_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> > if (datap->event_fd == -1) {
> > int fd;
> > struct perf_event_attr attr = { 0, };
> > + int cpu_index, cpu;
> > +
> > + cpu_index = rand() % dtp->dt_conf.num_online_cpus;
> > + cpu = dtp->dt_conf.cpus[cpu_index].cpu_id;
> >
> > attr.type = PERF_TYPE_TRACEPOINT;
> > - attr.sample_type = PERF_SAMPLE_RAW;
> > + attr.size = sizeof(struct perf_event_attr);
>
> attr.size = sizeof(attr);
>
> > attr.sample_period = 1;
> > attr.wakeup_events = 1;
> > attr.config = datap->event_id;
> >
> > - fd = perf_event_open(&attr, -1, 0, -1, 0);
> > + fd = perf_event_open(&attr, -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);
> > if (fd < 0)
> > return dt_set_errno(dtp, errno);
> >
> > --
> > 2.18.4
> >
> >
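For completeness, here is the overall shape of the output-buffer path that
dt_peb.c implements, condensed into one place (a sketch only; error
handling and the dt_peb_t bookkeeping are elided, and n and page_size stand
in for the real buffer-size configuration):

    struct perf_event_attr attr;
    void *base;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_BPF_OUTPUT;
    attr.size = sizeof(attr);
    attr.sample_type = PERF_SAMPLE_RAW;
    attr.sample_period = 1;
    attr.wakeup_events = 1;

    fd = perf_event_open(&attr, -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);

    /* perf ring buffers are one header page plus 2^n data pages */
    base = mmap(NULL, (1 + (1 << n)) * page_size,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);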