[DTrace-devel] [PATCH 5/5 v2] Add a profile provider

Tue Jun 16 21:54:43 PDT 2020

On Tue, Jun 16, 2020 at 12:58:59PM -0700, Eugene Loh wrote:
> On 06/15/2020 09:48 PM, Kris Van Hees wrote:
> 
> > On Mon, Jun 15, 2020 at 02:18:05PM -0400, eugene.loh at oracle.com wrote:
> >> From: Eugene Loh <eugene.loh at oracle.com>
> >>
> >> The probes use perf_events with PERF_TYPE_SOFTWARE and
> >> PERF_COUNT_SW_CPU_CLOCK.
> >>
> >> There are many FIXMEs.  Among them, the trampoline does not
> >> yet write the entire DTrace context... at least not the DTrace
> >> probe arguments.  Also, profile-n is not yet implemented.
> >>
> >> For other providers, the work of setting a probe's event_fd is
> >> done in bpf_attach(), but the profile provider is sufficiently
> >> different that it makes more sense to do so in the provider's
> >> probe_info() function.  That is, we would like to set the
> >> probe's event_id and event_fd in probe_info().  On the other
> > I don't believe this makes sense.  The point is that a probe is not created
> > (or enabled) until we are going to use it.
> 
> I wanted to get clarification on what you mean.  Currently, we "create" 
> probes -- like dtrace's uprobes and fbt's kprobes -- during the 
> providers' probe_info() functions.  This occurs during, e.g., 
> cmd/dtrace.c's "fourth phase" of processing command-line options -- that 
> is, when D is being compiled and before we do any "list" or "exec" stuff 
> (i.e., whether or not we're going to use the probes). I assume such 
> behavior is inconsistent with the point you're making?

Ok, I expressed myself in a somewhat confusing way due to the overloaded
concept of what a probe is.  I am refering to the creation of the actual
underlying perf event as probe creation.  I would e.g. argue that although
the kprobe does not exist yet for an FBT probe until probe_info() is called,
the probe actually does exist since the symbol is listed in the list of
functions that can be traced using kprobes.  So for the case of FBT probes we
have:

  1. A function is listed in available_filter_functions and we can therefore
     say that it exists as a probe (we actually create it as a probe when we
     call populate() in the FBT provider implementation.
  2. A kprobe is created when probe_info() is called because we need to be
     able to retrieve probe specific information.  This is actually aimed at
     determining argument information.  The event_id is retrieved at this point
     because we are parsing the format file for the kprobe anyway.  We still
     do not use it for anything really.
  3. A perf event is created for the kprobe so that we can associate a BPF
     program with it.  This is where the underlying event is actually created
     as something we can attach to, and this is therefore the point at which
     we "enable" the probe.

Note that for kprobes and uprobes we could actually use a variant of the
perf_event_open() system call where the kprobe/uprobe is instantiated at the
time of creating the perf event.  It gains us auto-cleanup but it means that
we do not have access to the information probe_info() needs until point 3
above which is actually too late.

So, the event_fd wouldn't be something that we can fill in at the time of
probe_info() because the perf event isn't going to exist yet.  That should
still be perfectly fine for profile-n and tick-n probes because you do not
need the perf event yet at this point.  As long as we have the dt_probe_t for
the probe we ought to be OK.

> > During the populate and provide
> > processing all that really ought to be done is create the probes with their
> > relevant (provider specific) data.  Then when we are ready to actually start
> > enabling probes and loading programs we ought to make the perf_event_open
> > calls to instantiate the kernel level events/probes.
> >
> > Since the profile provider is quite different from the other existing providers
> > (and pid provider will be similarly different) it makes sense to introduce
> > another callback function for the provider API:
> >
> > 	int enable(dtrace_hdl_t *dtp, const dt_probe_t *prp)
> >
> > This function is called for a specifc probe in the provider, and is expected
> > to return a file descriptor (typically return value of perf_event_open()) or
> > -1 upon error.  The caller should assign this fd (return value) to the event_fd
> > field in the dt_probe_t struct.
> 
> I can look at such a function, but I have some questions:
> 
> *)  Would this be to replace the event_fd initialization in dt_bpf_attach()?

> *)  In fact, wouldn't it make sense to eliminate dt_bpf_attach() 
> completely, and just have dt_bpf_stmt() call some 
> prp->prov->impl->enable(dtp, prp, fd) to attach the BPF program (fd) to 
> the D "probe"?  I bring this up due to the side discussion we were 
> having about how support for "profile-n" probes (which are implemented 
> with many, per-CPU perf_events) need multiple event_fd's per D probe.  
> Specifically, the profile-provider-specific enable (or attach) function 
> would not return an event_fd.

Indeed, we need to replace dt_bpf_attach() altogether - so instead of an
enable() function we probably really should simply add an attach() function
with prototype:

	int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)

returning 0 on success and -1 if it fails (and set errno correctly etc).  Funny
thing is that I used to have a provider-specific attach function in the version
I prepared for inclusion in the kernel tree (a long time ago).

> > This was the work of actually creating and enabling the underlying kernel perf
> > event can be handled by each provider based on its unique requirements.
> >
> >> hand, the function was declared to take a "const" probe.
> >> This didn't even make sense, however, since the function's
> >> only caller, dt_probe_args_info(), retrieved information
> >> specifically so that it could set fields like event_id.  So,
> >> this commit removes the "const" from the probe in probe_info()'s
> >> declaration.
> > Why didn't it make sense that probe_info() specifies the 'const' modifier on
> > the prp argument?  The probe_info() function does not modify any fields in
> > prp (dt_probe_t), nor is it expected to.  So 'const' is sensible here.
> 
> The only call chain that calls the providers' probe_info() functions is
> dt_probe_info()
> -> dt_probe_args_info()
> -> prp->prov->impl->probe_info()
> Not only does this call chain modify the "dt_probe_t *prp", it actually 
> creates it from scratch.  So while it was the case that probe_info() 
> wasn't itself modifying prp, it was only being used to construct prp.  

No, no.  You mentioned that 'const' didn't even make sense here but you are
refering to how you changed what probe_info() is supposed to do as a reason.
That doesn't hold.  If you look at the code prior to your patch, it should be
quite clear that 'const' is warranted here because prp is not modified at all.

> It did not seem important to retain that const.  In particular, if the 
> profile-provider probe_info() function were used to create the probe -- 
> which is what the dtrace and fbt providers were already doing anyhow -- 
> then not much other code had to be touched.
> 
> But I think I can do your "enable()" suggestion, in which case this 
> const business does not need to be addressed.

Indeed, which is why I flagged that argument because the 'const' modifier for
prp as argument to probe_info() is valid.

In summary, what the profile provider ought to do is:

1. Populate the probe list with the standard profile-n and tick-n probes.
2. Create a new probe (at the DTrace level) using provide() to allow dynamic
   creation of probes.  No actual perf event is created yet - this should just
   create a probe using dt_probe_insert() and set the needed data in the
   provider specific struct for the probe.
3. Provide sensible data when probe_info() is called.
4. Create the actual perf event(s) when attach() is called.  Attach the given
   BPF program (ref'd by fd) to the newly created perf event(s).