[DTrace-devel] [PATCH 05/12] Add support for the stack() action

Eugene Loh eugene.loh at oracle.com
Tue Jun 8 22:21:37 PDT 2021


On 6/9/21 12:43 AM, Kris Van Hees wrote:

>   Fri, May 28, 2021 at 02:35:09PM -0400,eugene.loh at oracle.com  wrote:
>> From: Eugene Loh<eugene.loh at oracle.com>
>>
>> Implement the stack() action using the bpf_get_stack() helper
>> function.  This implementation most closely resembles the legacy
>> DTrace implementation.  Someday it may make sense to switch over
>> to the bpf_get_stackid() instead, which would allow consolidation
>> of like stacks.
>>
>> The max stack size can be controlled by the kernel parameter
>> perf_event_max_stack, and we would like to allow users to increase
>> that limit.  We will eventually need to know this limit in various
>> places.  So, add a dtp->dt_maxframes value.
> It seems that the max frames setting should perhaps be an option, just
> like various other tunables?

I can imagine that, but that would be a new option and I would see it 
being a separate patch.  If you really want it in this patch, just let 
me know.

> I could see a potential use for setting the
> maxframes value to be less than the system max (e.g. to avoid wasting space
> in buffers because I know I won't need the max anyway).  So, the default
> would be the perf_event_max_stack value, but it would be something the user
> can override with a pragma or -x option.  Then you an access the value just
> like any other option, and you do not need to add another member to the
> dtrace_hdl_t struct.
>
> Since there is a stackframes option, perhaps use maxstackframes for this?
>
>> Change the stack() testing.  Specifically, replace the default
>> test since BEGIN no longer has a kernel stack.
>>
>> Signed-off-by: Eugene Loh<eugene.loh at oracle.com>
>> ---
>>   libdtrace/dt_cg.c                            | 69 +++++++++++++++++---
>>   libdtrace/dt_consume.c                       | 21 ++++--
>>   libdtrace/dt_impl.h                          |  1 +
>>   libdtrace/dt_open.c                          |  6 ++
>>   test/unittest/stack/tst.default.d            | 21 ------
>>   test/unittest/stack/tst.default.r            |  1 -
>>   test/unittest/stack/tst.default.r.p          |  5 --
>>   test/unittest/stack/tst.stack3_fbt.aarch64.r | 11 ++++
>>   test/unittest/stack/tst.stack3_fbt.d         | 25 +++++++
>>   test/unittest/stack/tst.stack3_fbt.x86_64.r  | 11 ++++
>>   test/unittest/stack/tst.stack_fbt.aarch64.r  | 14 ++++
>>   test/unittest/stack/tst.stack_fbt.d          | 25 +++++++
>>   test/unittest/stack/tst.stack_fbt.x86_64.r   | 13 ++++
>>   13 files changed, 182 insertions(+), 41 deletions(-)
>>   delete mode 100644 test/unittest/stack/tst.default.d
>>   delete mode 100644 test/unittest/stack/tst.default.r
>>   delete mode 100755 test/unittest/stack/tst.default.r.p
>>   create mode 100644 test/unittest/stack/tst.stack3_fbt.aarch64.r
>>   create mode 100644 test/unittest/stack/tst.stack3_fbt.d
>>   create mode 100644 test/unittest/stack/tst.stack3_fbt.x86_64.r
>>   create mode 100644 test/unittest/stack/tst.stack_fbt.aarch64.r
>>   create mode 100644 test/unittest/stack/tst.stack_fbt.d
>>   create mode 100644 test/unittest/stack/tst.stack_fbt.x86_64.r
>>
>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>> index f7168ffe..e7715aec 100644
>> --- a/libdtrace/dt_cg.c
>> +++ b/libdtrace/dt_cg.c
>> @@ -1333,21 +1333,72 @@ dt_cg_act_speculate(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind)
>>   static void
>>   dt_cg_act_stack(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind)
>>   {
>> -	dt_node_t *arg = dnp->dn_args;
>> -#ifdef FIXME
>> -	uint32_t nframes = 0;
>> -#endif
>> +	dtrace_hdl_t	*dtp = pcb->pcb_hdl;
>> +	dt_irlist_t	*dlp = &pcb->pcb_ir;
>> +	dt_regset_t	*drp = pcb->pcb_regs;
>> +	dt_node_t	*arg = dnp->dn_args;
>> +	int		nframes = dtp->dt_options[DTRACEOPT_STACKFRAMES];
>> +	int		skip = 0;
>> +	uint_t		off;
>> +	uint_t		lbl_valid = dt_irlist_label(dlp);
>> +
>> +	/*
>> +	 * Legacy default was dtrace_stackframes_default,
>> +	 * in kernel file dtrace/dtrace_state.c.
>> +	 */
>> +	if (nframes == DTRACEOPT_UNSET)
>> +		nframes = 20;
> Yuck - an arbitrary constant inline in code.  This should be a define
> somewhere so you can use a symbolic name here..

Okay.  Just seemed to make no sense to define a symbol that is used 
exactly once.   The rationale would be that it makes it clear what the 
number is, but the code above does just that:  it sets something to a 
specific numerical value explaining that value.  Splitting this over 
different files would seem to obfuscate things, but let me know if you 
want it that way.

>
>>   	if (arg != NULL) {
>> -		if (!dt_node_is_posconst(arg)) {
>> +		if (!dt_node_is_posconst(arg))
>>   			dnerror(arg, D_STACK_SIZE, "stack( ) argument #1 must "
>>   				"be a non-zero positive integer constant\n");
>> -		}
>>   
>> -#ifdef FIXME
>> -		nframes = (uint32_t)arg->dn_value;
>> -#endif
>> +		nframes = arg->dn_value;
>>   	}
>> +
>> +	/*
>> +	 * FIXME:
>> +	 * What should happen if more frames are requested than allowed?
>> +	 *   1.  Silently reduce nframes?
>> +	 *   2.  Report message and reduce nframes?
>> +	 *   3.  Report message and abort?
>> +	 * Option 1 is the easiest and what we do here.
>> +	 *
>> +	 * A message could explain that the limit can be raised:
>> +	 *         # sysctl kernel.perf_event_max_stack=<new value>
>> +	 * See bpf_get_stack() info in /usr/include/linux/bpf.h.
>> +	 *
>> +	 * Note that DTv1 allows much larger stacks but does not handle
>> +	 * the "too large" case very well:
>> +	 *         # dtrace -qn 'BEGIN {stack(1234567); exit(0)}'
>> +	 *         dtrace: 1 drop on CPU 2
>> +	 *         [...hang...]
>> +	 */
> No need for this comment block.  Just describe that if the requested number
> of frames is higher than the max, the limit will be reduced to the max (or
> something like that).  That is the behaviour bpf_get_stack() provides as well.
>
>> +	if (nframes > dtp->dt_maxframes)
>> +		nframes = dtp->dt_maxframes;
>> +
>> +	/* Figure out where we want to be in the output buffer. */
> I think it would be more accurate to mention in the comment that this call
> reserves space in the output buffer since that is what it does.
>
>> +	off = dt_rec_add(dtp, dt_cg_fill_gap, DTRACEACT_STACK,
>> +			 8 * nframes, 8, NULL, nframes);
> I really prefer to see sizeof(uint64_t) instead of 8.  Yes, we know it is 8,
> but using the sizeof() at least tells us what that value means.
>
>> +
>> +	/* Now call bpf_get_stack(ctx, buf, size, flags). */
>> +	if (dt_regset_xalloc_args(drp) == -1)
>> +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_CTX));
>> +	emit(dlp,  BPF_MOV_REG(BPF_REG_2, BPF_REG_9));
>> +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, off));
>> +	emit(dlp,  BPF_MOV_IMM(BPF_REG_3, 8 * nframes));
> See comment above about '8'.
>
>> +	emit(dlp,  BPF_MOV_IMM(BPF_REG_4, skip & BPF_F_SKIP_FIELD_MASK));
>> +	dt_regset_xalloc(drp, BPF_REG_0);
>> +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_stack));
>> +	dt_regset_free_args(drp);
>> +	emit(dlp,  BPF_BRANCH_IMM(BPF_JSGE, BPF_REG_0, 0, lbl_valid));
>> +	dt_cg_probe_error(pcb, -1, DTRACEFLT_BADSTACK, 0 /* FIXME */);
> Perhaps clarify the FIXME?
>
>> +	emitl(dlp, lbl_valid,
>> +		   BPF_NOP());
>> +	dt_regset_free(drp, BPF_REG_0);
>>   }
>>   
>>   static void
>> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
>> index f39156df..483f9f28 100644
>> --- a/libdtrace/dt_consume.c
>> +++ b/libdtrace/dt_consume.c
>> @@ -2018,15 +2018,18 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
>>   		for (i = 0; i < pdat->dtpda_ddesc->dtdd_nrecs; i++) {
>>   			int			n;
>>   			dtrace_recdesc_t	*rec;
>> +			dtrace_actkind_t	act;
>>   			int (*func)(dtrace_hdl_t *, FILE *, void *,
>>   			    const dtrace_probedata_t *,
>>   			    const dtrace_recdesc_t *, uint_t,
>>   			    const void *buf, size_t);
>> +			caddr_t			addr;
> Perhaps a less generic name for this variables would be useful?  Something as
> bsic as rdata (record data) is better than just 'addr'.  Or anything else that
> is not just 'an address'.
>
>>   			rec = &pdat->dtpda_ddesc->dtdd_recs[i];
>> -			pdat->dtpda_data = data + rec->dtrd_offset;
>> +			act = rec->dtrd_action;
>> +			pdat->dtpda_data = addr = data + rec->dtrd_offset;
>>   
>> -			if (rec->dtrd_action == DTRACEACT_LIBACT) {
>> +			if (act == DTRACEACT_LIBACT) {
>>   				switch (rec->dtrd_arg) {
>>   				case DT_ACT_DENORMALIZE:
>>   					if (dt_normalize(dtp, data, rec) != 0)
>> @@ -2057,7 +2060,16 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
>>   			if (rval != DTRACE_CONSUME_THIS)
>>   				return dt_set_errno(dtp, EDT_BADRVAL);
>>   
>> -			switch (rec->dtrd_action) {
>> +			if (act == DTRACEACT_STACK) {
>> +				int depth = rec->dtrd_arg;
>> +
>> +				if (dt_print_stack(dtp, fp, NULL, addr, depth,
>> +				    rec->dtrd_size / depth) < 0)
>> +					return -1;
>> +				continue;
>> +			}
>> +
>> +			switch (act) {
> I truly do not understand why you put the DTRACEACT_STACK case outside the
> switch, when it could just as easily be part of the switch.

You brought this up with respect to another patch as well, so I figured 
I'd explain.  The DTv1 code had this sequence of "if (act==foo)" code 
blocks.  I was simply trying to preserve some of the structure of that 
code.  Is that a good idea?  Some people will think so some of the time, 
and others... well.  Anyhow, I posted a v2 of that other patch.  The 
switch statement ends up impacting other patches and I propagated the 
change through.

Now, the DTv1 code still had the PRINTF-LIKE actions separately. Sounds 
like you want everything in one big switch.  Fine by me. I'll do that.  
I just figured I'd explain why the code looked the way it did.

> Also note comment
> above about 'addr'.
>
>>   			case DTRACEACT_PRINTF:
>>   				func = dtrace_fprintf;
>>   				break;
>> @@ -2092,8 +2104,7 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
>>   				continue;
>>   			}
>>   
>> -			n = dt_print_trace(dtp, fp, rec, pdat->dtpda_data,
>> -					   quiet);
>> +			n = dt_print_trace(dtp, fp, rec, addr, quiet);
> See comment above on 'addr'.
>
>>   			if (n < 0)
>>   				return DTRACE_WORKSTATUS_ERROR;
>>   		}
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index c2d32863..240e29cc 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -380,6 +380,7 @@ struct dtrace_hdl {
>>   	dt_list_t dt_lib_dep_sorted;	/* dependency sorted library list */
>>   	dt_global_pcap_t dt_pcap; /* global tshark/pcap state */
>>   	char *dt_freopen_filename; /* filename for freopen() action */
>> +	uint_t dt_maxframes;	/* maximum stack depth */
> See comment above about putting this in the options.
>
>>   };
>>   
>>   /*
>> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
>> index f2b86f7c..0323e68a 100644
>> --- a/libdtrace/dt_open.c
>> +++ b/libdtrace/dt_open.c
>> @@ -645,6 +645,7 @@ dt_vopen(int version, int flags, int *errp,
>>   	int i, err;
>>   	char modpath[PATH_MAX];
>>   	struct rlimit rl;
>> +	FILE *fd;
>>   
>>   	const dt_intrinsic_t *dinp;
>>   	const dt_typedef_t *dtyp;
>> @@ -756,6 +757,11 @@ dt_vopen(int version, int flags, int *errp,
>>   	dt_dof_init(dtp);
>>   	uname(&dtp->dt_uts);
>>   
>> +	fd = fopen("/proc/sys/kernel/perf_event_max_stack", "r");
>> +	assert(fd);
>> +	assert(fscanf(fd, "%u", &dtp->dt_maxframes) == 1);
>> +	fclose(fd);
>> +
> See comment about putting maxstackframes in options.
>
>>   	/*
>>   	 * The default module path is derived in part from the utsname release
>>   	 * string.  So too is the path component which is added to .d file
>> diff --git a/test/unittest/stack/tst.default.d b/test/unittest/stack/tst.default.d
>> deleted file mode 100644
>> index 2e50ec39..00000000
>> --- a/test/unittest/stack/tst.default.d
>> +++ /dev/null
>> @@ -1,21 +0,0 @@
>> -/*
>> - * Oracle Linux DTrace.
>> - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
>> - * Licensed under the Universal Permissive License v 1.0 as shown at
>> - *http://oss.oracle.com/licenses/upl.
>> - */
>> -/* @@xfail: dtv2 */
>> -
>> -/*
>> - * ASSERTION:
>> - *   Test the stack action with the default stack depth.
>> - *
>> - * SECTION: Output Formatting/printf()
>> - *
>> - */
>> -
>> -BEGIN
>> -{
>> -	stack();
>> -	exit(0);
>> -}
>> diff --git a/test/unittest/stack/tst.default.r b/test/unittest/stack/tst.default.r
>> deleted file mode 100644
>> index e3a4e4c3..00000000
>> --- a/test/unittest/stack/tst.default.r
>> +++ /dev/null
>> @@ -1 +0,0 @@
>> -              dtrace`dtrace_ioctl+{ptr}
>> diff --git a/test/unittest/stack/tst.default.r.p b/test/unittest/stack/tst.default.r.p
>> deleted file mode 100755
>> index 281c025f..00000000
>> --- a/test/unittest/stack/tst.default.r.p
>> +++ /dev/null
>> @@ -1,5 +0,0 @@
>> -#!/bin/sed -nf
>> -
>> -# Eliminate all lines other than dtrace`ioctl.
>> -
>> -/dtrace`dtrace_ioctl/p
>> diff --git a/test/unittest/stack/tst.stack3_fbt.aarch64.r b/test/unittest/stack/tst.stack3_fbt.aarch64.r
>> new file mode 100644
>> index 00000000..5c8bfaed
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack3_fbt.aarch64.r
>> @@ -0,0 +1,11 @@
>> +                   FUNCTION:NAME
>> +                          :BEGIN
>> +               __vfs_write:entry
>> +              vmlinux`__vfs_write
>> +              vmlinux`ksys_write+{ptr}
>> +              vmlinux`__arm64_sys_write+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack3_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> diff --git a/test/unittest/stack/tst.stack3_fbt.d b/test/unittest/stack/tst.stack3_fbt.d
>> new file mode 100644
>> index 00000000..1a2eaf58
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack3_fbt.d
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Oracle Linux DTrace.
>> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
>> + * Licensed under the Universal Permissive License v 1.0 as shown at
>> + *http://oss.oracle.com/licenses/upl.
>> + */
>> +
>> +/*
>> + * ASSERTION: Test the stack action with depth 3.
>> + *
>> + * SECTION: Output Formatting/printf()
>> + */
>> +
>> +#pragma D option destructive
>> +
>> +BEGIN
>> +{
>> +	system("echo write something > /dev/null");
>> +}
>> +
>> +fbt::__vfs_write:entry
>> +{
>> +	stack(3);
>> +	exit(0);
>> +}
>> diff --git a/test/unittest/stack/tst.stack3_fbt.x86_64.r b/test/unittest/stack/tst.stack3_fbt.x86_64.r
>> new file mode 100644
>> index 00000000..f24c8cba
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack3_fbt.x86_64.r
>> @@ -0,0 +1,11 @@
>> +                   FUNCTION:NAME
>> +                          :BEGIN
>> +               __vfs_write:entry
>> +              vmlinux`__vfs_write+{ptr}
>> +              vmlinux`ksys_write+{ptr}
>> +              vmlinux`__x64_sys_write+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack3_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> diff --git a/test/unittest/stack/tst.stack_fbt.aarch64.r b/test/unittest/stack/tst.stack_fbt.aarch64.r
>> new file mode 100644
>> index 00000000..3a2896c4
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack_fbt.aarch64.r
>> @@ -0,0 +1,14 @@
>> +                   FUNCTION:NAME
>> +                          :BEGIN
>> +               __vfs_write:entry
>> +              vmlinux`__vfs_write
>> +              vmlinux`ksys_write+{ptr}
>> +              vmlinux`__arm64_sys_write+{ptr}
>> +              vmlinux`el0_svc_common+{ptr}
>> +              vmlinux`el0_svc_handler+{ptr}
>> +              vmlinux`el0_svc+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> diff --git a/test/unittest/stack/tst.stack_fbt.d b/test/unittest/stack/tst.stack_fbt.d
>> new file mode 100644
>> index 00000000..27db2116
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack_fbt.d
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Oracle Linux DTrace.
>> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
>> + * Licensed under the Universal Permissive License v 1.0 as shown at
>> + *http://oss.oracle.com/licenses/upl.
>> + */
>> +
>> +/*
>> + * ASSERTION: Test the stack action with the default stack depth.
>> + *
>> + * SECTION: Output Formatting/printf()
>> + */
>> +
>> +#pragma D option destructive
>> +
>> +BEGIN
>> +{
>> +	system("echo write something > /dev/null");
>> +}
>> +
>> +fbt::__vfs_write:entry
>> +{
>> +	stack();
>> +	exit(0);
>> +}
>> diff --git a/test/unittest/stack/tst.stack_fbt.x86_64.r b/test/unittest/stack/tst.stack_fbt.x86_64.r
>> new file mode 100644
>> index 00000000..792ce80a
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack_fbt.x86_64.r
>> @@ -0,0 +1,13 @@
>> +                   FUNCTION:NAME
>> +                          :BEGIN
>> +               __vfs_write:entry
>> +              vmlinux`__vfs_write+{ptr}
>> +              vmlinux`ksys_write+{ptr}
>> +              vmlinux`__x64_sys_write+{ptr}
>> +              vmlinux`do_syscall_64+{ptr}
>> +              vmlinux`entry_SYSCALL_64+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> -- 
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/dtrace-devel/attachments/20210609/999272f9/attachment-0001.html 


More information about the DTrace-devel mailing list