[DTrace-devel] [PATCH] Review perf_event_open() calls

Eugene Loh eugene.loh at oracle.com
Thu Oct 22 16:45:24 PDT 2020


----- kris.van.hees at oracle.com wrote:

> On Wed, Oct 14, 2020 at 05:35:45PM -0400, eugene.loh at oracle.com
> wrote:
> > From: Eugene Loh <eugene.loh at oracle.com>
> > 
> > Our perf_event_open() calls should be cleaned up a little.  The issues
> > are minor, with no user-observable effects, but cleaner code is better.
> > 
> > Set size in "struct perf_event_attr".
> 
> Sure, though see my suggestions below.
> 
> > Use flag PERF_FLAG_FD_CLOEXEC.
> 
> This is actually not necessary.  I am not opposed to it but the code as it
> was didn't pose an issue because we're not using execve().

Okay.  I'd like to understand this better.

There are cases where we call execvp(), but they occur before
perf_event_open() is called, e.g., to preprocess D scripts during
compilation or to launch -c processes.  Because they precede
perf_event_open(), they do not matter here.

There are also cases where we call execlp() while the consumer is
running, e.g., someday for pcap processing.

But the main thing I was wondering about was consistency.  Either it
makes sense for us to use CLOEXEC or it doesn't, and we might as well
adopt one consistent policy.
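
For what it's worth, here is a minimal, untested sketch of what the flag
buys us.  (perf_event_open() has no glibc wrapper, so the syscall() shim
below is the usual idiom; none of this is code from our tree.)

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int
    perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                    int group_fd, unsigned long flags)
    {
            return syscall(SYS_perf_event_open, attr, pid, cpu,
                           group_fd, flags);
    }

    int
    main(void)
    {
            struct perf_event_attr attr;
            int fd;

            memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_SOFTWARE;
            attr.config = PERF_COUNT_SW_CPU_CLOCK;
            attr.size = sizeof(attr);

            /*
             * With PERF_FLAG_FD_CLOEXEC, the fd is closed across the
             * exec below; without it, the fd leaks into the new image.
             */
            fd = perf_event_open(&attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC);
            if (fd >= 0)
                    execlp("true", "true", (char *)NULL);
            return 1;
    }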

> > Use PERF_SAMPLE_RAW only for the perf_event buffer to which BPF programs
> > will write.
> 
> That is actually harmless, but yes, we can drop it.
> 
> > Distribute perf_events evenly among CPUs.  (Note that perf_event buffers
> > and profile-n events are one-per-CPU.  Further, tick-n events are
> > distributed randomly for load balancing.  The only change needed is to
> > distribute tp_attach() randomly over CPUs rather than mapping them all
> > to CPU 0.)
> 
> Can you explain your rationale here, i.e. point out where there is an issue?
> As you mention, perf_event ring buffers (our output buffers) are per-CPU, and
> profile-n events are (as required) per-CPU.  The tick-n events being spread
> across all CPUs makes sense also because they are events that fire as timer
> events (and are out of necessity tied to a specific CPU).  I am not quite sure
> where spreading SDT events over the various CPUs accomplishes load balancing
> because those events are triggered by executing code and therefore execute on
> the same CPU where the executable code was running.  So I am unclear:
> what exactly are you load balancing here?

Mostly, I just did not understand, and I guess I still don't know how
tick-n and SDT are different.  The perf_event_open() man page says that
cpu >= 0 means monitoring only on the specified CPU.  That does not seem
to be the case here, and it would not make sense if all SDT events were
attached on cpu=0.

Maybe I'll just rephrase as a question:  what does perf_event_open()'s cpu
argument mean, at least in the case of our SDT usage?
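
For reference, here is my (possibly wrong) reading of the pid/cpu
combinations in perf_event_open(2).  The wrapper is just the syscall()
shim from my snippet above, and this function exists only to illustrate:

    /* My reading of the cpu argument, per perf_event_open(2). */
    static void
    cpu_arg_examples(struct perf_event_attr *attr, int cpu)
    {
            /* pid == 0, cpu == -1: the calling task, on any CPU. */
            int fd_self = perf_event_open(attr, 0, -1, -1,
                                          PERF_FLAG_FD_CLOEXEC);

            /* pid == -1, cpu >= 0: all tasks, but only on that CPU. */
            int fd_cpu = perf_event_open(attr, -1, cpu, -1,
                                         PERF_FLAG_FD_CLOEXEC);

            /* pid == -1 && cpu == -1: not allowed; fails with EINVAL. */
    }

So with pid == -1 and cpu == 0, I would expect the tracepoint to be
monitored only when it fires on CPU 0, which is why attaching all SDT
events there confused me.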

> > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > ---
> >  libdtrace/dt_peb.c          | 1 +
> >  libdtrace/dt_prov_profile.c | 3 +--
> >  libdtrace/dt_prov_sdt.c     | 8 ++++++--
> >  3 files changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libdtrace/dt_peb.c b/libdtrace/dt_peb.c
> > index 0ef1c419..b8c49e2f 100644
> > --- a/libdtrace/dt_peb.c
> > +++ b/libdtrace/dt_peb.c
> > @@ -75,6 +75,7 @@ dt_peb_open(dt_peb_t *peb)
> >  	memset(&attr, 0, sizeof(attr));
> >  	attr.config = PERF_COUNT_SW_BPF_OUTPUT;
> >  	attr.type = PERF_TYPE_SOFTWARE;
> > +	attr.size = sizeof(struct perf_event_attr);
> 
> attr.size = sizeof(attr);
> 
> >  	attr.sample_type = PERF_SAMPLE_RAW;
> >  	attr.sample_period = 1;
> >  	attr.wakeup_events = 1;
> > diff --git a/libdtrace/dt_prov_profile.c b/libdtrace/dt_prov_profile.c
> > index 8541bee6..22cd968a 100644
> > --- a/libdtrace/dt_prov_profile.c
> > +++ b/libdtrace/dt_prov_profile.c
> > @@ -293,7 +293,6 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> >  	memset(&attr, 0, sizeof(attr));
> >  	attr.type = PERF_TYPE_SOFTWARE;
> >  	attr.config = PERF_COUNT_SW_CPU_CLOCK;
> > -	attr.sample_type = PERF_SAMPLE_RAW;
> >  	attr.size = sizeof(struct perf_event_attr);
> >  	attr.wakeup_events = 1;
> >  	attr.freq = 0;
> > @@ -307,7 +306,7 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> >  			j = rand() % dtp->dt_conf.num_online_cpus;
> >  
> >  		fd = perf_event_open(&attr, -1, dtp->dt_conf.cpus[j].cpu_id,
> > -				     -1, 0);
> > +				     -1, PERF_FLAG_FD_CLOEXEC);
> >  		if (fd < 0)
> >  			continue;
> >  		if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) < 0) {
> > diff --git a/libdtrace/dt_prov_sdt.c b/libdtrace/dt_prov_sdt.c
> > index 4b2ac31b..8775103f 100644
> > --- a/libdtrace/dt_prov_sdt.c
> > +++ b/libdtrace/dt_prov_sdt.c
> > @@ -77,14 +77,18 @@ int tp_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> >  	if (datap->event_fd == -1) {
> >  		int			fd;
> >  		struct perf_event_attr	attr = { 0, };
> > +		int			cpu_index, cpu;
> > +
> > +		cpu_index = rand() % dtp->dt_conf.num_online_cpus;
> > +		cpu = dtp->dt_conf.cpus[cpu_index].cpu_id;
> >  
> >  		attr.type = PERF_TYPE_TRACEPOINT;
> > -		attr.sample_type = PERF_SAMPLE_RAW;
> > +		attr.size = sizeof(struct perf_event_attr);
> 
> attr.size = sizeof(attr);
> 
> >  		attr.sample_period = 1;
> >  		attr.wakeup_events = 1;
> >  		attr.config = datap->event_id;
> >  
> > -		fd = perf_event_open(&attr, -1, 0, -1, 0);
> > +		fd = perf_event_open(&attr, -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);
> >  		if (fd < 0)
> >  			return dt_set_errno(dtp, errno);
> >  
> > -- 
> > 2.18.4
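
One more thought: if we do settle on a consistent CLOEXEC policy, a small
wrapper would keep it in one place.  dt_perf_event_open() is just a
hypothetical name, not something in the tree:

    /*
     * Hypothetical helper: set attr->size (per your sizeof(attr)
     * suggestion) and apply the CLOEXEC policy unconditionally.
     */
    static int
    dt_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                       int group_fd, unsigned long flags)
    {
            attr->size = sizeof(*attr);
            return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd,
                           flags | PERF_FLAG_FD_CLOEXEC);
    }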