[DTrace-devel] [PATCH 11/13] Add a profile provider

Kris Van Hees kris.van.hees at oracle.com
Wed Jul 8 20:15:13 PDT 2020


On Wed, Jul 08, 2020 at 02:33:38PM -0700, Eugene Loh wrote:
> On 07/07/2020 12:07 AM, Kris Van Hees wrote:
> 
> > On Wed, Jul 01, 2020 at 10:41:16PM -0400, eugene.loh at oracle.com wrote:
> >> From: Eugene Loh <eugene.loh at oracle.com>
> >>
> >> The probes use perf_events with PERF_TYPE_SOFTWARE and
> >> PERF_COUNT_SW_CPU_CLOCK.
> >>
> >> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> >> ---
> >>   BPF-DESIGN                  |   1 +
> >>   libdtrace/Build             |   2 +-
> >>   libdtrace/dt_open.c         |   1 +
> >>   libdtrace/dt_prov_profile.c | 422 ++++++++++++++++++++++++++++++++++++
> >>   libdtrace/dt_provider.h     |   1 +
> >>   5 files changed, 426 insertions(+), 1 deletion(-)
> >>   create mode 100644 libdtrace/dt_prov_profile.c
> >>
> >> diff --git a/BPF-DESIGN b/BPF-DESIGN
> >> index 919a6d9d..1d1c3a2a 100644
> >> --- a/BPF-DESIGN
> >> +++ b/BPF-DESIGN
> >> @@ -221,6 +221,7 @@ DTRACE BPF PROGRAM CONVENTIONS
> >>   
> >>       BPF trampoline (probe specific)
> >>       -------------------------------
> >> +	FIXME: add something for dt_dtrace and dt_profile.
> > No need for this - this document needs updating in various places anyway.
> > We'll tackle that later (or more likely, the file will go away in favour of
> > actual documentation).
> 
> All of that is fine, but how about leaving that line there until that 
> magical day "in the future."

No, really, we shouldn't be bothering with that file.  The content is no
longer accurate and will be even less accurate (and almost obsolete) by the
time I am done with my code for the current errata.

> >>   	Function Boundary Tracing (based on kprobe)
> >>   	-------------------------------------------
> >>   	The C equivalent implementation of the FBT trampoline program is:
> >> diff --git a/libdtrace/dt_prov_profile.c b/libdtrace/dt_prov_profile.c
> >> @@ -0,0 +1,422 @@
> >> +typedef struct prv_data {
> >> +	int	ncpus;		/* number of CPUs on which a probe fires */
> >> +	int	*fds;		/* fds for the CPUs on which a probe fires */
> >> +} prv_data_t;
> > Better name would be struct profile_probe / profile_probe_t.
> >
> > I also think it would be more clear if the members are fds and fds_cnt or
> > something similar.  The fds is an array of file descriptors, and fds_cnt would
> > tell you how many there are.  The fact that this corresponds to the number of
> > CPUs that are online is (after the data has been initialized) not relevant
> > anymore.
> >
> > Of course, since there are two types of profile provider probes, perhaps it
> > would suffice to just store a kind (profile or tick) and the fds array.  If it
> > is a kind == TICK probe, fds will contain just a singleton element, and if
> > it is kind == PROFILE, fds will contain as many elements as there are CPUs on
> > the system (online).  Since that number does not change during the execution
> > of DTrace for a specific session, there is no need to store it here.
> 
> I cannot tell if you are leaving the choice to me or are indicating 
> which choice you want.  For now, I'll assume you're leaving the choice 
> to me;  so I'll go with fds_cnt and fds.

I am sorry I was confusing on how I wrote my comments.  I was thinking as I
was writing and probably should have revised the text here to reflect my final
thought.  I believe we do need a datap with the following members:

kind (tick or profile)
fds[] (file descriptor array - 1 element when kind == tick, and ncpu elements
       when kind == profile)
period (time between firings)

Note that period should be sufficient because you should always be able to
convert a freq into a period.

I think the 'kind' member will be needed to conveniently know whether this is
a tick or profile probe.  The distinction is important because they provide
different arguments to the probe.  (It would be possible to use fds_cnt as an
indiation of what kind probe it is but that is kind of doing things backwards
because it is the kind of probe that defines whether we run on 1 CPU or on all
CPUs).

> >> +static const char		prvname[] = "profile";
> >> +static const char		modname[] = "";
> >> +static const char		funname[] = "";
> >> +
> >> +#define PREFIX_PROFILE	"profile-"
> >> +#define PREFIX_TICK	"tick-"
> >> +
> >> +
> >> +static int provide(dtrace_hdl_t *dtp, const dtrace_probedesc_t *pdp)
> >> +{
> >> +	struct perf_event_attr attr;
> >> +
> >> +	/* make sure we have IDNONE and a legal name */
> >> +	if (pdp->id != DTRACE_IDNONE ||
> >> +	    strcmp(pdp->prv, prvname) ||
> >> +	    strcmp(pdp->mod, modname) ||
> >> +	    strcmp(pdp->fun, funname))
> >> +		return 0;
> >> +	if (strncmp(pdp->prb, PREFIX_TICK, strlen(PREFIX_TICK)) &&
> >> +	    strncmp(pdp->prb, PREFIX_PROFILE, strlen(PREFIX_PROFILE)))
> >> +		return 0;
> >> +
> >> +	/* return if we already have this probe */
> >> +	if (dt_probe_lookup(dtp, pdp))
> >> +		return 0;
> >> +
> >> +	/* enforce the 200-usec limit */
> >> +	if (set_period_freq(pdp->prb, &attr) < 0 ||
> >> +	    (attr.freq == 1 && attr.sample_freq > 5000) ||
> >> +	    (attr.freq == 0 && attr.sample_period < 200000))
> >> +		return 0;
> > Since you did all the work here already to determine the sample_freq and
> > sample_period, why not store that in profile_probe_t so that you don't have
> > to repeat the parsing when the time comes to attach the probe?  That is the
> > kind of data that would make sense to store in the provider-specific data
> > struct.
> 
> I do not know if that's a question or a directive.  I'll assume it's a 
> question.
> 
> For "populate()" probes, there is no need to parse the probe name until 
> and unless the probe is being used;  so, the probe name is parsed during 
> attach().
> 
> I thought something similar for "provide()" probes, but then realized we 
> had a documented limitation on oversampling.  It was easy enough to 
> stick those few lines you see above to implement that limitation.  It 
> seems to me that it's no big deal "to repeat the parsing" -- the code is 
> small, the run-time cost inconsequential, and the overall complexity 
> less than the proposed alternative.
> 
> What are you concerned about here?  A run-time cost?  What experiment 
> could one perform to demonstrate that a proposed solution achieves the 
> desired benefit?

My point is simply that using set_period_freq() in two places for the same
probe is unnecessary.  Determining the period (if there is a freq you can
calculate the period that represents that freq) can be done at the provide()
stage, and the value be stored in datap.  Then when the probe is actually
created in the attach, the period value from the datap can be used to set
the sample period.  No need to use set_period_freq() twice.

The point is really prinarily to not do twice what can be done once.

> >> +	/* insert this probe */
> >> +	profile_probe_insert(dtp, pdp->prb);
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> >> +{
> >> +	prv_data_t *datap = prp->prv_data;
> >> +	struct perf_event_attr attr;
> >> +	int i, nattach = 0;;
> >> +
> >> +	memset(&attr, 0, sizeof(attr));
> >> +	attr.type = PERF_TYPE_SOFTWARE;
> >> +	attr.config = PERF_COUNT_SW_CPU_CLOCK;
> >> +	attr.sample_type = PERF_SAMPLE_RAW;
> >> +	attr.size = sizeof(struct perf_event_attr);   // FIXME: why have we not been doing this in other providers?
> >> +	attr.wakeup_events = 1;
> >> +
> >> +	if (set_period_freq(prp->desc->prb, &attr) < 0)
> >> +		return -1;
> > See comment above about sample_freq and sample_period belonging in the
> > profile_probe_t struct.
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list