[DTrace-devel] [PATCH v3 06/21] dtprobed: add the DOF stash

Kris Van Hees kris.van.hees at oracle.com
Wed Feb 14 14:52:43 UTC 2024


On Wed, Feb 14, 2024 at 01:53:42PM +0000, Nick Alcock wrote:
> On 14 Feb 2024, Kris Van Hees told this:
> 
> > On Mon, Feb 12, 2024 at 10:00:37PM +0000, Nick Alcock wrote:
> >> On 12 Feb 2024, Kris Van Hees spake thusly:
> >> 
> >> > Initial comments (mostly on the comments, that are pretty crucial for people
> >> > to understand what is going on here).
> >> 
> >> ... here's the new top-of-file comment block, which hopefully make a bit
> >> more sense now:
> >> 
> >> /*
> >>  * Oracle Linux DTrace; DOF state storage management.
> >>  * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved.
> >>  * Licensed under the Universal Permissive License v 1.0 as shown at
> >>  * http://oss.oracle.com/licenses/upl.
> >>  *
> >>  * The DOF stash is the principal state preservation mechanism used by dtprobed.
> >>  * Keeping DOF on disk keeps memory management complexity low, makes it
> >>  * possible for dtprobed to track DOF across daemon restarts and upgrades,
> >>  * and gives the rest of dtrace a place to acquire the DOF contributed by
> >>  * processes on the system for USDT probe operations.
> >>  *
> >>  * The state storage structure is as follows:
> >>  *
> >>  * /run/dtrace/stash: Things private to dtprobed.
> >
> > Perhaps use a trailing / for all entries here and below that are directories,
> > so it is immediately obvious what are files and what are directories?
> 
> Ack! good idea, particularly given the mistakes I'd already made there :)
> 
> >>  *
> >>  *    .../dof/$dev-$ino: DOF contributed by particular mappings, in raw form
> >>  *    (as received from some probe-containing program).
> >
> > I would think that this is a directory that contains:
> >
> > 	   ../raw				- raw DOF
> > 	   ../parsed				- parsed DOF
> > 	   ../parsed/$prv:$mod:$fun:$prb	- per-probe parsed DOF
> >
> > because this data is not task (or PID) specific.  Granted, dtrace will use
> > a modified provider name ($prv$pid) when it creates probes for a specific task,
> > but at this level there is nothing task specific in this data.
> 
> I did that at first, but it persisted in not working.
> 
> Alas, it turns out the output of the parser *is* PID-specific despite
> the DOF being the same for every mapping of a given, uh, mapping,
> because the parsing process depends on the dof_helper_t :( in
> particular, the dofhp_addr is address-space-specific, so more or less
> process-specific. For now, I think this is the best we can do. (In
> future, maybe we can pass the dofhp_addr on along with the relocs it is
> used with, and relocate the raw DOF later on in dtrace proper, but for
> now parsing isn't so expensive that a parse per ioctl is likely to be a
> performance problem. We can usually parse many DOFs per parser process,
> in a stream...)

I am not following this...  The parser can surely be adjusted to not resolve
the relocations (because we are also storing, per pid, the data needed to do
that relocation when we actually need it - in dtrace itself).  And that makes
it that we can resue the parsed form without any problem, right?

> >>  *
> >>  *    .../dof-pid/$pid/$dev-$ino: contains everything relating to DOF
> >>  *    contributed by a particular USDT-containing ELF object within a given
> >>  *    process (henceforth "DOF source").  Each DTRACEHIOC_ADDDOF ioctl call
> >>  *    creates one of these.
> >
> > I would make this a hardlink to .../dof/$dev-$ino.
> 
> ... you just said you wanted this to be a directory :) can't hardlink
> those, which breaks the "use the link count" refcounting method I'm
> using: so I'd need something else, probably another per-mapping file
> under dof-pid hardlinked to something (the raw DOF?) under the $dev-$ino
> directory. However...

That would work.

> > Then I would add a file:
> >
> > 	.../dof-pid/$pid/$dev-$ino.dh
> >
> > which contains the dof_helper_t data for that mapping in this particular task
> > (by pid).  Together with the data in .../dof-pid/$pid/$dev-$ino (which is
> > .../dof/$dev-$ino), you have enough to identify everything we need to know
> > about the USDT probes in a particular task.
> 
> ... in the world you thought we were in, in which parsing is not
> affected by per-process state, that is practical. In this one, alas no.
> I had something similar to this design originally, but it doesn't work
> :(
> 
> [snipped bits which are only relevant if the redesign above went
>  through, which alas... I wish.]

Again, I do not see where the problem lies.

> >>  *
> >>  * /run/dtrace/probes: Per-probe info, written by dtprobed, read by DTrace.
> >>  *
> >>  *    .../$pid/$prv$pid/$mod/$fun/$prb: Hardlink from $prv$pid:$mod:$fun:$prb
> >>  *    above; parsed representation of one probe in a given process. Removed by
> >>  *    dtprobed when the process dies, or if all mappings containing the probe
> >>  *    are unmmapped.  Used by DTrace for tracing by PID.
> >
> > Why not use .../$pid/$prv/$mod/$fun/$prb?  Since you are using globbing to
> > search for probe name matches, it doesn't matter where the pid is placed, and
> > there is no need to have it listed twice.  DTrace can still register the probe
> > internally as $prv$pid after its definition has been found.
> 
> Because the provider name DTrace is dealing with already has the PID as
> part of the name, and slicing them in half again would prevent us from
> ever putting USDT probes in programs with numbers at the end of their
> names (something which is very common! I have 300+ of these in /usr/bin
> here.)

Why does this matter?  You are not slicing anything in half that has to do
with the exectable name because provider name != executable name.  The only
issue is where the provider name ends in numeric digits so this would already
have been an issue with DTrace overall?

If you look at the code in dt_pid.c, there is clearly a built-in assumption
that provider names (at least in the pid / usdt case) cannot end in a numeric
digit, because dt_pid_get_pid) determines the pid to be the whole of all
digits at the right end of the provider name.

> (And we need the pid to be separate because there are things we need to
> do when processes go away which require us to remove the entire
> directory, and it's far easier to do that if they're all grouped
> together properly under one dir.)
> 
> -- 
> NULL && (void)



More information about the DTrace-devel mailing list