[DTrace-devel] [PATCH 5/5 v2] Add a profile provider

Kris Van Hees kris.van.hees at oracle.com
Tue Jun 23 12:22:09 PDT 2020


On Mon, Jun 22, 2020 at 11:08:19PM -0700, Eugene Loh wrote:
> On 06/20/2020 07:36 PM, Kris Van Hees wrote:
> 
> > On Sat, Jun 20, 2020 at 09:26:16AM -0700, Eugene Loh wrote:
> >> Perhaps I could have written that last email more succinctly as follows.
> >>
> >> I think the provider-specific probe_info() and attach() functions should
> >> not have "const" on their "prp" arguments.  Here is why.
> >>
> >> We want DTrace to support a wide range of providers.  Hence, we supply a
> >> few provider-specific functions.
> >>
> >> To date, our providers have been "tracepoint-like".  Specifically,
> >> probe_info() has read /sys/kernel/debug/tracing/$group/$event/format to
> >> get an event_id and read arguments;  it has also created a kprobe or
> >> uprobe if necessary.  We have discussed an attach() function, which
> >> would take a prp->event_id, run perf_event_open(), and store a
> >> prp->event_fd.
> >>
> >> To support more general providers, we have discussed hiding
> >> provider-specific arguments such as event_id and event_fd behind a
> >> provider-specific, opaque pointer within prp.  So, I am suggesting that
> >> the prp argument to probe_info() and attach() should not be "const".
> >>
> >> 1)  If probe_info() for tracepoint-like providers will determine the
> >> event_id (possibly even creating a kprobe or a uprobe) and store it, but
> >> other providers do not even have an event_id at all, then the
> >> provider-specific probe_info() must access and update the prp opaque
> >> pointer.
> >>
> >> 2)  If attach() will "create probes" -- perhaps running
> >> perf_event_open() and attaching BPF programs to one or more file
> >> descriptors that we would like to associate with prp -- and the details
> >> are provider-specific, then the file descriptor(s) should also be stored
> >> using the provider-specific opaque prp pointer within the attach() function.
> > When provider-specific data (like event_id and event_fd) are stored in a
> > provider specific struct that is referenced through an opaque pointer in the
> > dt_probe_t structure, then const makes perfect sense for the functions in
> > the provider because those functions do not (and should not) modify the
> > data in the dt_probe_t structure.  All modifications are limited to the
> > provider specific struct that can validly be modified even when prp itself
> > is const.
> 
> I suppose so.  I guess I agree.  Somewhat.
> 
> The opaque pointer needs to be initialized.  It needs to point to some 
> block of memory whose size is provider-specific (and even probe 
> specific... tick-n needs one perf_event_open fd while profile-n needs 
> many).  So the provider needs to parse the probe name and allocate the 
> opaque pointer.  So, would you tolerate a provider-specific function 
> like probe_info() or attach() omitting "const" on dt_probe_t in order 
> that the provider can parse the probe name and allocate the opaque 
> pointer of appropriate size?  Or, would you prefer that 
> dt_probe_insert() take a new argument, the size of the memory to which 
> the opaque pointer points?  Or, would you like the opaque pointer to 
> point to another pointer so that provider-specific functions can 
> effectively change parts of the dt_probe_t because they are hidden 
> behind so many dereferencings? Actually, I guess I'm left wondering why 
> it's so important that the provider-specific functions declare a "const 
> dt_probe_t" in the first place.  You point out that it can be done, but 
> we would still be modifying probe info.... just hidden behind 
> dereferencings.  I do not understand why the const is so important.

Look at how identifier specific data is provided for in the dt_ident_t
structure.  That one demonstrates this design principle although that code
(in existance since the Solaris version) could do a better job in some
places. The probe-registering functions (populate(), provide()) would pass
an extra argument in dt_probe_insert() which is a pointer to a struct that
is only known within the provider implementation.  So, populate() and provide()
would allocate the provider-specific data structure, populate it, and pass
it along when dt_probe_insert() is called.

The main point behind the design is simply that dt_probe_t structures are
managed by the probe mgmt code in dt_probe.c.  Other code should not be
messing with the members of that structure.

A specific provider implementation can do whatever it wants with data that is
specific to the provider (and not used by any other code).

> On an older subject...  Going back to the issue of not creating probes 
> until "attach" time:  it actually seems to be possible for the providers 
> we currently have.  We do not need to create sdt or syscall kprobes at 
> all since they already exist.  For dtrace and fbt probes, we need to 
> create uprobes and kprobes, respectively, but not until we are going to 
> attach BPF programs to them.  (We do not need the probes to exist when 
> we call arg_info since the arg info does not depend on what we read in 
> the format file.)  For the profile provider, again the arg info does not 
> depend on creating any probe. So, if you really do not want to create 
> probes until attach time, then we can:
> 
> *)  sdt/syscall:  read the 
> /sys/kernel/debug/tracing/$group/$event/format file at arg_info time 
> without having to create any kprobes
> 
> *)  dtrace/fbt:  hardwire the arg info at arg_info time;  later, at 
> attach time, create the uprobe/kprobe, read the format file to get the 
> event_id, call perf_event_open() to get the event_fd, etc.
> 
> *)  profile:  hardwire the arg info at arg_info time;  later, at attach 
> time, call perf_event_open() to create the probe
> 
> That would adhere more strictly to the model you proposed (not creating 
> probes -- not even uprobes or kprobes -- until we're ready to attach BPF 
> programs to them).  I just don't know if it will work for all possible 
> future providers.

I am not saying we cannot create kprobes, uprobes, etc until attach time
although that would be the preferred option where possible.  Reason is that
we don't really want to be consuming system resources when e.g. we're simply
listing probes as opposed to actually performing tracing.  The main thing
we want to defer until attach() time is the perf_event_open() call.  Whether
we create kprobes, uprobes, etc at an earlier time isn't as big of a deal.
Given that the majority of existing providers depend on pulling information
from the format file in the tracefs, it makes sense to have fbt create kprobes
(and dtrace create uprobes) to ensure that file exists so we can process things
from that point on as regualr tracepoint-based probes.

If it is easier for you, I can put this framework in place based on the
existing providers and then you can work with that for the profile provider.



More information about the DTrace-devel mailing list