[DTrace-devel] [PATCH 3/3] add BEGIN and END probes

Thu Apr 2 23:54:44 PDT 2020

This discussion should have been on the dtrace-devel list but it got started
as a side-question in a direct email, and therefore ended up in private
communication instead.

This reply is on-list, so further replies will go there as well.

On Thu, Apr 02, 2020 at 10:29:03PM -0700, Eugene Loh wrote:
> More below.
> 
> On 04/02/2020 08:04 PM, Kris Van Hees wrote:
> 
> >On Thu, Apr 02, 2020 at 07:29:12PM -0700, Eugene Loh wrote:
> >>On 04/02/2020 11:52 AM, Kris Van Hees wrote:
> >>>I do think we might want to rework the cleanup a tiny bit where we make
> >>>use of the fact that we already know what probes belong to which provider so
> >>>that you do not have to do a lookup by provider name and a fixed list (which
> >>>will not work for other kprobe and uprobe based probes) or some other way to
> >>>perform lookups to find all the probes.
> >>Sorry, I'm really scratching my head on this one.  Why would we want to loop
> >>over every probe?  The situation today is that there are only two probes
> >>that need clean up:  BEGIN and END.  Someday, some other probes might also.
> >>I guess it might depend on provider.  But well fewer than 1% of the probes
> >>need cleanup, and so looping over all probes makes no sense to me.  Not even
> >>for all the probes within the dtrace provider... e.g., at this point, we do
> >>not need to clean up ERROR, and if we will we do not yet know in any case
> >>what that will mean.  The fixed list works for BEGIN and END.  No other
> >>provider is forced to do anything similar;  in fact, the whole point is that
> >>each provider will do what makes sense for it.  Indeed, at the moment, most
> >>providers define their cleanup functions to be NULL.
> >We probably want to implement cleanup for FBT probes (kprobes) as well, since
> >we could potentially be creating a large amount of them and never clean them
> >up.
> 
> Yes, though that seems a departure from what I thought you were doing (leave
> those kprobes around because someone else might want to use them) and
> outside the scope of this particular patch (BEGIN/END).  If you want, just
> give the word and I can prepare an fbt cleanup patch:  naming kprobes with a
> group name that mentions both "dtrace" and the pid and adding cleanup code
> that loops over fbt's pv_probes to clean up.  Again, though, that seems to
> be separate from this particular commit.  And this commit should not loop
> over dtrace's pv_probes when we don't know how we're going to do ERROR.

I don't think I'm ready yet to do cleanup of FBT probes pending further digging
into whether we ought, and whether the kernel actually handles sharing of a
perf event nicely for multiple consumers.  But that is not relevant to this
discussion because other providers will need to do cleanup anyway.  That is in
part a generic functionality that therefore can be handled in non-provider
specific code as much as possible.  Hence my point to do the looping over the
probes for a given provider in generic code, and isolating the provider-specific
aspects in each provider.

The fact that we do not know yet what will happen with the ERROR probe is not
an issue because the cleanup function can ensure that it won't do any work for
the ERROR probe.  In fact, since the probe never gets an event_fd associated
with it, doing:

	 if (event_fd == -1)
		return;

would be enough to handle the ERROR probe.  Hardly an issue.  And the code
does not require any knowledge of probe names as constants.

> >Now, that may not be that big of a deal, but there is a fair chance that
> >subsequent tracing sessions will not use the same probes (especially if more
> >than one user might be using dtrace on the system).  In the long run, we could
> >potentially end up with 10s of thousands probes left hanging (especially on a
> >system where someone executes the testsuite).  That seems pretty sloppy.  I am
> >still thinking about how to best handle this because it is a situation that is
> >somewhat different from uprobes, but then again, as far as underlying code in
> >the kernel (at the tracing leel) it is largely the same.
> >
> >>I'm guessing here, but maybe what you're saying is that if we're iterating
> >>over all the probes in the dtrace provider, then we can just iterate over
> >>dtrace's pv_probes.  But we're not iterating over every probe -- certainly
> >>not for all providers but not even within the dtrace provider.  We're
> >>cleaning up the BEGIN probe.  We're cleaning up the END probe.  That's it.
> >>Two probes.
> >We would never iterate over all the probes, since that would be quite
> >ridiculous indeed.  However, since we have hash tables to manage the probe
> >lists already, we can handle this by performing a hashtable lookup for the
> >provider that needs cleaning up (by name), and then loop through the list of
> >probes attached to that provider.  We already have code for that.  That list
> >will be the probes offered by that specific provider.  You can then call a
> >probe destroy function in the provider to do the provider specific cleanup for
> >that provider.
> >
> >And indeed, you are not cleaning up the ERROR probe (right now), but that is a
> >decision that can still be handled at the provider level inside the probe
> >destroy function.  No issue there.
> >
> >But surely making use of the data structures we already have is better than
> >having multiple places in code where a static list of probe names is encoded?
> 
> Well, what we have is a BEGIN probe and an END probe.  The provider also has
> an ERROR probe, but we do not yet know how we're going to handle that.  So,
> these static lists are very short (2-3 items, they are hardly lists at all)
> and the lists are different in different places:  BEGIN/END/ERROR in
> populate, BEGIN/END in a cleanup function, and possibly (TBD) ERROR in
> another cleanup function.

Populate has a list because that is where the probes are created.  From that
point on we can do everything based on the data structures present in dtrace
itself, and we shouldn't need any further knowledge of what static list of
probe names may exist or not (e.g. the profile provider will have some static
ones and some dynamic ones).  By making the cleanup function a function that is
called for a specific probe we implement cleanup of probes in a much more
generic way which will benefit other providers as they need cleanup added.

Implementing this at a higher level (outside of a specific provider) and only
delegating the actual cleanup work is also the way this was originally designed
in DTrace.  And for good reasons - I truly think we should stick with it
because it is a good design in function of the differing needs of various
providers combined with the common need to have cleanup called for all probes
belonging to a specific provider.

> >And again, there will be more cases where we need to do probe cleanup, e.g.
> >when we have USDT and pid provider probes.
> 
> Each provider can and should do what makes sense for it.

Yes, and that can still be done at the level of the actual probe cleanup code
implementation.  Knowing what probes are handled by a specific provider is
available in the probe management code already and so there is no need to add
code to have a provider figure that out for itself.  While the dtrace provider
only has 3 probes to worry about, other providers have a lot more - if we can
avoid hardcoding things and instead use the information already available, we
are better off in the long run.

> >>Sorry if I'm missing the point of what you're trying to say, but at least I
> >>hope what I'm saying helps explain what I hear you to be saying and why I'm
> >>confused what you're proposing.
> >>
> >>>We can do cleanup from code in the
> >>>dt_provider.c source file and have it call a cleanup operation function for
> >>>each probe to allow provider-specific cleanup.  In other words, go through the
> >>>list of probes attached to a specific provider and call cleanup for each one
> >>>of them - that puts the loop in the generic provider handling code, and only
> >>>the cleanup is specific.
> >>>
> >>>On Fri, Mar 27, 2020 at 01:21:03AM -0400, eugene.loh at oracle.com wrote:
> >>>>From: Eugene Loh <eugene.loh at oracle.com>
> >>>>
> >>>>Add BEGIN and END probes as uprobes on libdtrace functions
> >>>>BEGIN_probe() and END_probe(), which were added as part of
> >>>>an earlier patch.
> >>>>
> >>>>By default, uprobes belong to group "uprobes", but we introduce
> >>>>a new group, "dtrace_$pid", with probes that are specific to our
> >>>>consumer process.  Specifically, this means that multiple consumers,
> >>>>with distinct or changing libdtrace, can coexist.  We try to clean
> >>>>up our BEGIN and END probes at close, and the sdt provider must
> >>>>recognize that dtrace_$pid is a group to ignore.
> >>>>
> >>>>diff --git a/libdtrace/dt_prov_dtrace.c b/libdtrace/dt_prov_dtrace.c
> >>>>+static void dtrace_prov_cleanup(dtrace_hdl_t *dtp)
> >>>>+{
> >>>>+	char *probes[] = { "BEGIN", "END" };
> >>>>+	dtrace_probedesc_t pdesc;
> >>>>+	int i, fd;
> >>>>+
> >>>>+	/* start constructing a probe description */
> >>>>+	pdesc.id = DTRACE_IDNONE;
> >>>>+	pdesc.prv = "dtrace";
> >>>>+	pdesc.mod = "";
> >>>>+	pdesc.fun = "";
> >>>>+
> >>>>+	/* clear the BEGIN and END uprobes */
> >>>>+	fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> >>>>+	if (fd == -1)
> >>>>+		return;
> >>>>+	for (i = 0; i < sizeof(probes) / sizeof(*probes); i++) {
> >>>>+		dt_probe_t	*prb;
> >>>>+
> >>>>+		/* close the fd if possible */
> >>>>+		pdesc.prb = probes[i];
> >>>>+		prb = dt_probe_lookup(dtp, &pdesc);
> >>>>+		if (prb && prb->event_fd != -1) {
> >>>>+			close(prb->event_fd);
> >>>>+			prb->event_fd = -1;
> >>>>+		}
> >>>>+
> >>>>+		/* clear the UPROBE_EVENTS entry */
> >>>>+		dprintf(fd, "-:dtrace_%d/%s\n", getpid(), probes[i]);
> >>>>+	}
> >>>>+	close(fd);
> >>>>+}