[DTrace-devel] [PATCH] Improve dt_pid_create_usdt_probes() error handling

Kris Van Hees kris.van.hees at oracle.com
Mon Jan 6 19:13:43 UTC 2025


Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>

... although I think that the problem is (in part) on the design side of
USDT handling, where we do a bit too much work at this point of probe
discovery, when we do not even know whether we will be using those probes.

But that is subject matter for a more extensive reworking of the code.

This patch seems sufficient to deal with the existing problem of DTrace
being affected by unruly processes.

On Sun, Dec 01, 2024 at 11:27:41PM -0500, eugene.loh at oracle.com wrote:
> From: Eugene Loh <eugene.loh at oracle.com>
> 
> In one run of the test suite, roughly 90 consecutive tests failed.
> The exact circumstances are difficult to reconstruct, but basically
> they complained with messages like:
> 
>     dtrace: failed to compile script ...: failed to grab process 23132
> 
> The same pid was always indicated.  Then, the errors stopped.
> 
> Apparently, the problem was trying "dtrace -s foo.d" where the script
> has a BEGIN (or END) probe.  Then we get to
> 
>     cmd/dtrace.c compile_file
>     ->  libdtrace/dt_cc.c dt_program_compile
>     ->  libdtrace/dt_cc.c dt_compile
>     ->  libdtrace/dt_cc.c dt_compile_one_clause
>     ->  libdtrace/dt_cc.c dt_setcontext
>     ->  libdtrace/dt_pid.c dt_pid_create_usdt_probes
> 
> We look for processes that might have USDT probes.  Since the provider
> description is blank, we check every process in .../run/dtrace.  It
> turned out, that pid 23132 could not be locked.  This sent an error
> back up the call stack.
> 
> We do not know why process 23132 could not be locked, but having dtrace
> fail under these circumstances (using BEGIN or END probe) seems severe.
> 
> Change dt_pid_create_usdt_probes() error handling.  Even if there is a
> problem with some USDT processes, report success anyhow if there was a
> glob pid specification.  If, on the other hand, a pid was specifically
> requested, then a problem with that pid results in an error.
> 
> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> ---
>  libdtrace/dt_pid.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
> index e0a26d5aa..fd94a0706 100644
> --- a/libdtrace/dt_pid.c
> +++ b/libdtrace/dt_pid.c
> @@ -1252,7 +1252,8 @@ dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *
>  				      DTRACE_PROC_SHORTLIVED) < 0) {
>  			dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
>  			    "failed to grab process %d", (int)pid);
> -			return -1;
> +			err = 1;  // FIXME but do we want to set the error if we end up return 0?
> +			continue;
>  		}
>  		dpr = dt_proc_lookup(dtp, pid);
>  		assert(dpr != NULL);
> @@ -1272,7 +1273,37 @@ dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *
>  	free(globpat);
>  	globfree(&globbuf);
>  
> -	return err ? -1 : 0;
> +	/* If no errors, report success. */
> +	if (err == 0)
> +		return 0;
> +
> +	/* If provider description was blank, report success. */
> +	if (pdp->prv[0] == '\0')
> +		return 0;
> +
> +	/* Look to see if the provider description had a pid glob. */
> +	for (i = strlen(pdp->prv) - 1; i >= 0; i--) {
> +		/*
> +		 * If we hit a '*' before a nondigit, we have a pid glob.
> +		 * So, even though err==0, we declare success.
> +		 */
> +		if (pdp->prv[i] == '*')
> +			return 0;
> +
> +		/*
> +		 * If we hit a nondigit before a '*', we do not have a pid glob.
> +		 * Since a pid was specified explicitly, err==1 means an error.
> +		 */
> +		if (!isdigit(pdp->prv[i]))
> +			return -1;
> +	}
> +
> +	/*
> +	 * If the provider description was exclusively digits,
> +	 * it was not a legitimate USDT provider description.
> +	 * So it makes perfect sense not to return any probes.
> +	 */
> +	return 0;
>  }
>  
>  int
> -- 
> 2.43.5
> 



More information about the DTrace-devel mailing list