[DTrace-devel] [PATCH 11/13] Add a profile provider

Eugene Loh eugene.loh at oracle.com
Wed Jul 8 14:33:38 PDT 2020


On 07/07/2020 12:07 AM, Kris Van Hees wrote:

> On Wed, Jul 01, 2020 at 10:41:16PM -0400, eugene.loh at oracle.com wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> The probes use perf_events with PERF_TYPE_SOFTWARE and
>> PERF_COUNT_SW_CPU_CLOCK.
>>
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>>   BPF-DESIGN                  |   1 +
>>   libdtrace/Build             |   2 +-
>>   libdtrace/dt_open.c         |   1 +
>>   libdtrace/dt_prov_profile.c | 422 ++++++++++++++++++++++++++++++++++++
>>   libdtrace/dt_provider.h     |   1 +
>>   5 files changed, 426 insertions(+), 1 deletion(-)
>>   create mode 100644 libdtrace/dt_prov_profile.c
>>
>> diff --git a/BPF-DESIGN b/BPF-DESIGN
>> index 919a6d9d..1d1c3a2a 100644
>> --- a/BPF-DESIGN
>> +++ b/BPF-DESIGN
>> @@ -221,6 +221,7 @@ DTRACE BPF PROGRAM CONVENTIONS
>>   
>>       BPF trampoline (probe specific)
>>       -------------------------------
>> +	FIXME: add something for dt_dtrace and dt_profile.
> No need for this - this document needs updating in various places anyway.
> We'll tackle that later (or more likely, the file will go away in favour of
> actual documentation).

All of that is fine, but how about leaving that line there until that 
magical day "in the future."

>>   	Function Boundary Tracing (based on kprobe)
>>   	-------------------------------------------
>>   	The C equivalent implementation of the FBT trampoline program is:
>> diff --git a/libdtrace/dt_prov_profile.c b/libdtrace/dt_prov_profile.c
>> @@ -0,0 +1,422 @@
>> +typedef struct prv_data {
>> +	int	ncpus;		/* number of CPUs on which a probe fires */
>> +	int	*fds;		/* fds for the CPUs on which a probe fires */
>> +} prv_data_t;
> Better name would be struct profile_probe / profile_probe_t.
>
> I also think it would be more clear if the members are fds and fds_cnt or
> something similar.  The fds is an array of file descriptors, and fds_cnt would
> tell you how many there are.  The fact that this corresponds to the number of
> CPUs that are online is (after the data has been initialized) not relevant
> anymore.
>
> Of course, since there are two types of profile provider probes, perhaps it
> would suffice to just store a kind (profile or tick) and the fds array.  If it
> is a kind == TICK probe, fds will contain just a singleton element, and if
> it is kind == PROFILE, fds will contain as many elements as there are CPUs on
> the system (online).  Since that number does not change during the execution
> of DTrace for a specific session, there is no need to store it here.

I cannot tell if you are leaving the choice to me or are indicating 
which choice you want.  For now, I'll assume you're leaving the choice 
to me;  so I'll go with fds_cnt and fds.

>> +static const char		prvname[] = "profile";
>> +static const char		modname[] = "";
>> +static const char		funname[] = "";
>> +
>> +#define PREFIX_PROFILE	"profile-"
>> +#define PREFIX_TICK	"tick-"
>> +
>> +
>> +static int provide(dtrace_hdl_t *dtp, const dtrace_probedesc_t *pdp)
>> +{
>> +	struct perf_event_attr attr;
>> +
>> +	/* make sure we have IDNONE and a legal name */
>> +	if (pdp->id != DTRACE_IDNONE ||
>> +	    strcmp(pdp->prv, prvname) ||
>> +	    strcmp(pdp->mod, modname) ||
>> +	    strcmp(pdp->fun, funname))
>> +		return 0;
>> +	if (strncmp(pdp->prb, PREFIX_TICK, strlen(PREFIX_TICK)) &&
>> +	    strncmp(pdp->prb, PREFIX_PROFILE, strlen(PREFIX_PROFILE)))
>> +		return 0;
>> +
>> +	/* return if we already have this probe */
>> +	if (dt_probe_lookup(dtp, pdp))
>> +		return 0;
>> +
>> +	/* enforce the 200-usec limit */
>> +	if (set_period_freq(pdp->prb, &attr) < 0 ||
>> +	    (attr.freq == 1 && attr.sample_freq > 5000) ||
>> +	    (attr.freq == 0 && attr.sample_period < 200000))
>> +		return 0;
> Since you did all the work here already to determine the sample_freq and
> sample_period, why not store that in profile_probe_t so that you don't have
> to repeat the parsing when the time comes to attach the probe?  That is the
> kind of data that would make sense to store in the provider-specific data
> struct.

I do not know if that's a question or a directive.  I'll assume it's a 
question.

For "populate()" probes, there is no need to parse the probe name until 
and unless the probe is being used;  so, the probe name is parsed during 
attach().

I thought something similar for "provide()" probes, but then realized we 
had a documented limitation on oversampling.  It was easy enough to 
stick those few lines you see above to implement that limitation.  It 
seems to me that it's no big deal "to repeat the parsing" -- the code is 
small, the run-time cost inconsequential, and the overall complexity 
less than the proposed alternative.

What are you concerned about here?  A run-time cost?  What experiment 
could one perform to demonstrate that a proposed solution achieves the 
desired benefit?

>> +	/* insert this probe */
>> +	profile_probe_insert(dtp, pdp->prb);
>> +
>> +	return 0;
>> +}
>> +
>> +static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>> +{
>> +	prv_data_t *datap = prp->prv_data;
>> +	struct perf_event_attr attr;
>> +	int i, nattach = 0;;
>> +
>> +	memset(&attr, 0, sizeof(attr));
>> +	attr.type = PERF_TYPE_SOFTWARE;
>> +	attr.config = PERF_COUNT_SW_CPU_CLOCK;
>> +	attr.sample_type = PERF_SAMPLE_RAW;
>> +	attr.size = sizeof(struct perf_event_attr);   // FIXME: why have we not been doing this in other providers?
>> +	attr.wakeup_events = 1;
>> +
>> +	if (set_period_freq(prp->desc->prb, &attr) < 0)
>> +		return -1;
> See comment above about sample_freq and sample_period belonging in the
> profile_probe_t struct.



More information about the DTrace-devel mailing list