[DTrace-devel] [PATCH] Improve dt_pid_create_usdt_probes() error handling
Kris Van Hees
kris.van.hees at oracle.com
Mon Jan 6 19:13:43 UTC 2025
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
... although I think that the problem is (in part) on the design side of
USDT handling, where we do a bit too much work at this point of probe
discovery, when we do not even know whether we will be using those probes.
But that is subject matter for a more extensive reworking of the code.
This patch seems sufficient to deal with the existing problem of DTrace
being affected by unruly processes.
On Sun, Dec 01, 2024 at 11:27:41PM -0500, eugene.loh at oracle.com wrote:
> From: Eugene Loh <eugene.loh at oracle.com>
>
> In one run of the test suite, roughly 90 consecutive tests failed.
> The exact circumstances are difficult to reconstruct, but basically
> they complained with messages like:
>
> dtrace: failed to compile script ...: failed to grab process 23132
>
> The same pid was always indicated. Then, the errors stopped.
>
> Apparently, the problem was trying "dtrace -s foo.d" where the script
> has a BEGIN (or END) probe. Then we get to
>
> cmd/dtrace.c compile_file
> -> libdtrace/dt_cc.c dt_program_compile
> -> libdtrace/dt_cc.c dt_compile
> -> libdtrace/dt_cc.c dt_compile_one_clause
> -> libdtrace/dt_cc.c dt_setcontext
> -> libdtrace/dt_pid.c dt_pid_create_usdt_probes
>
> We look for processes that might have USDT probes. Since the provider
> description is blank, we check every process in .../run/dtrace. It
> turned out, that pid 23132 could not be locked. This sent an error
> back up the call stack.
>
> We do not know why process 23132 could not be locked, but having dtrace
> fail under these circumstances (using BEGIN or END probe) seems severe.
>
> Change dt_pid_create_usdt_probes() error handling. Even if there is a
> problem with some USDT processes, report success anyhow if there was a
> glob pid specification. If, on the other hand, a pid was specifically
> requested, then a problem with that pid results in an error.
>
> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> ---
> libdtrace/dt_pid.c | 35 +++++++++++++++++++++++++++++++++--
> 1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
> index e0a26d5aa..fd94a0706 100644
> --- a/libdtrace/dt_pid.c
> +++ b/libdtrace/dt_pid.c
> @@ -1252,7 +1252,8 @@ dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *
> DTRACE_PROC_SHORTLIVED) < 0) {
> dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
> "failed to grab process %d", (int)pid);
> - return -1;
> + err = 1; // FIXME but do we want to set the error if we end up return 0?
> + continue;
> }
> dpr = dt_proc_lookup(dtp, pid);
> assert(dpr != NULL);
> @@ -1272,7 +1273,37 @@ dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *
> free(globpat);
> globfree(&globbuf);
>
> - return err ? -1 : 0;
> + /* If no errors, report success. */
> + if (err == 0)
> + return 0;
> +
> + /* If provider description was blank, report success. */
> + if (pdp->prv[0] == '\0')
> + return 0;
> +
> + /* Look to see if the provider description had a pid glob. */
> + for (i = strlen(pdp->prv) - 1; i >= 0; i--) {
> + /*
> + * If we hit a '*' before a nondigit, we have a pid glob.
> + * So, even though err==0, we declare success.
> + */
> + if (pdp->prv[i] == '*')
> + return 0;
> +
> + /*
> + * If we hit a nondigit before a '*', we do not have a pid glob.
> + * Since a pid was specified explicitly, err==1 means an error.
> + */
> + if (!isdigit(pdp->prv[i]))
> + return -1;
> + }
> +
> + /*
> + * If the provider description was exclusively digits,
> + * it was not a legitimate USDT provider description.
> + * So it makes perfect sense not to return any probes.
> + */
> + return 0;
> }
>
> int
> --
> 2.43.5
>
More information about the DTrace-devel
mailing list