[DTrace-devel] the whole EPID thing

Fri Aug 30 03:14:10 UTC 2024

On Thu, Aug 29, 2024 at 08:37:38PM -0400, Eugene Loh wrote:
> If I understand correctly, an epid is supposed to be an integer that
> uniquely identifies both probe ID and statement/clause.  The consumer
> reports a problematic data record using this integer, whose value matches
> that of the built-in epid variable.

It depends on the perspective you take: user vs implementation.

From the user perspective it is (per the documentation):
  "The enabled probe ID (EPID) for the current probe. This integer uniquely
   identifies a particular probe that is enabled with a specific predicate and
   set of actions."

The user cannot know how that is implemented aside from the fact that there is
some relation to the probe id.  While it also identifies the particular clause
as it relates to a particular probe, the user does not really have any way to
identify clauses other than perhaps by the order they appear in the tracing
script.  Nothing defines how that order relates to the EPID though.

The user can obtain the value of the EPID using the epid built-in variable.

The only other place where it is user visible (except for custom consumer code)
is in the ERROR probe, where it is displayed (which is essentially useless) and
used to derive the probe ID.

From the implementation point of view, the EPID has been used as an index into
an array that relates the EPID to a probe id and a data description.  It is
also used in the flowindent implementation, but that is based on a simple
comparison of values, so any value that preserves a 1-to-1 relation to what it
represents would suffice.

> Is the proposal to abandon such an integer?

Well, we cannot abandon it since it is a documented feature (albeit rather
obscure and of doubtful use).  But the implementation changes related to USDT
clearly move us to a state where the implementation perspective is no longer
relevant.  By passing the probe ID explicitly in the output buffer, there is
no need for the implementation side of EPID.  We do however need to identify
the datadesc for the trace record and we can do that with an integer id that
refers to the statement (clause) the data record was created by.  So, from the
implementation side we really are moving to probe ID and statement ID (for
lack of a better name right now).

But again, we cannot just get rid of EPID since there is a user perspective.
But nothing defines how that value relates to anything, as long as it is
a unique identifier for any (probe ID, statement ID) pair.

> If so, I guess the implications are:
> 
> *)  Change error reporting (say, to list probe ID and statement/clause
> independently, but no epid).

There are two options here...  We can keep the error reporting and report the
calculated epid value (see below), but that seems rather useless and results
in excessively large values.  Or we can change the error reporting to be more
useful, and report probe ID and statement ID.  If done well, that could even
be extended by reporting the identifier for the statement so that it can more
easily be related to disassembler output.

> *)  Do away with the built-in epid variable.

As explained above, we cannot do that.  But we can easily generate it as:
((probe-ID << 32) | statement-ID).
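
A minimal sketch of that composition in C, for illustration only.  It assumes
that the probe ID and the statement ID each fit in 32 bits and that the
resulting epid is a 64-bit value; the function names are hypothetical and are
not taken from the DTrace sources.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (names are assumptions, not existing DTrace code).
 * Assumes 32-bit probe and statement IDs and a 64-bit epid.
 */

/* Compose an epid: probe ID in the upper 32 bits, statement ID in the lower. */
static inline uint64_t make_epid(uint32_t probe_id, uint32_t stmt_id)
{
	return ((uint64_t)probe_id << 32) | stmt_id;
}

/* Recover the probe ID from a composed epid. */
static inline uint32_t epid_probe_id(uint64_t epid)
{
	return (uint32_t)(epid >> 32);
}

/* Recover the statement ID from a composed epid. */
static inline uint32_t epid_stmt_id(uint64_t epid)
{
	return (uint32_t)(epid & 0xffffffffu);
}
```

Under these assumptions every distinct (probe ID, statement ID) pair maps to a
distinct value, which is all the documentation requires.  It also shows why
the reported values get large: any nonzero probe ID yields an epid of at least
2^32.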