[DTrace-devel] [PATCH 05/12] Add support for the stack() action
Eugene Loh
eugene.loh at oracle.com
Tue Jun 8 22:21:37 PDT 2021
On 6/9/21 12:43 AM, Kris Van Hees wrote:
> Fri, May 28, 2021 at 02:35:09PM -0400,eugene.loh at oracle.com wrote:
>> From: Eugene Loh<eugene.loh at oracle.com>
>>
>> Implement the stack() action using the bpf_get_stack() helper
>> function. This implementation most closely resembles the legacy
>> DTrace implementation. Someday it may make sense to switch over
>> to the bpf_get_stackid() instead, which would allow consolidation
>> of like stacks.
>>
>> The max stack size can be controlled by the kernel parameter
>> perf_event_max_stack, and we would like to allow users to increase
>> that limit. We will eventually need to know this limit in various
>> places. So, add a dtp->dt_maxframes value.
> It seems that the max frames setting should perhaps be an option, just
> like various other tunables?
I can imagine that, but that would be a new option and I would see it
being a separate patch. If you really want it in this patch, just let
me know.
> I could see a potential use for setting the
> maxframes value to be less than the system max (e.g. to avoid wasting space
> in buffers because I know I won't need the max anyway). So, the default
> would be the perf_event_max_stack value, but it would be something the user
> can override with a pragma or -x option. Then you an access the value just
> like any other option, and you do not need to add another member to the
> dtrace_hdl_t struct.
>
> Since there is a stackframes option, perhaps use maxstackframes for this?
>
>> Change the stack() testing. Specifically, replace the default
>> test since BEGIN no longer has a kernel stack.
>>
>> Signed-off-by: Eugene Loh<eugene.loh at oracle.com>
>> ---
>> libdtrace/dt_cg.c | 69 +++++++++++++++++---
>> libdtrace/dt_consume.c | 21 ++++--
>> libdtrace/dt_impl.h | 1 +
>> libdtrace/dt_open.c | 6 ++
>> test/unittest/stack/tst.default.d | 21 ------
>> test/unittest/stack/tst.default.r | 1 -
>> test/unittest/stack/tst.default.r.p | 5 --
>> test/unittest/stack/tst.stack3_fbt.aarch64.r | 11 ++++
>> test/unittest/stack/tst.stack3_fbt.d | 25 +++++++
>> test/unittest/stack/tst.stack3_fbt.x86_64.r | 11 ++++
>> test/unittest/stack/tst.stack_fbt.aarch64.r | 14 ++++
>> test/unittest/stack/tst.stack_fbt.d | 25 +++++++
>> test/unittest/stack/tst.stack_fbt.x86_64.r | 13 ++++
>> 13 files changed, 182 insertions(+), 41 deletions(-)
>> delete mode 100644 test/unittest/stack/tst.default.d
>> delete mode 100644 test/unittest/stack/tst.default.r
>> delete mode 100755 test/unittest/stack/tst.default.r.p
>> create mode 100644 test/unittest/stack/tst.stack3_fbt.aarch64.r
>> create mode 100644 test/unittest/stack/tst.stack3_fbt.d
>> create mode 100644 test/unittest/stack/tst.stack3_fbt.x86_64.r
>> create mode 100644 test/unittest/stack/tst.stack_fbt.aarch64.r
>> create mode 100644 test/unittest/stack/tst.stack_fbt.d
>> create mode 100644 test/unittest/stack/tst.stack_fbt.x86_64.r
>>
>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>> index f7168ffe..e7715aec 100644
>> --- a/libdtrace/dt_cg.c
>> +++ b/libdtrace/dt_cg.c
>> @@ -1333,21 +1333,72 @@ dt_cg_act_speculate(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind)
>> static void
>> dt_cg_act_stack(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind)
>> {
>> - dt_node_t *arg = dnp->dn_args;
>> -#ifdef FIXME
>> - uint32_t nframes = 0;
>> -#endif
>> + dtrace_hdl_t *dtp = pcb->pcb_hdl;
>> + dt_irlist_t *dlp = &pcb->pcb_ir;
>> + dt_regset_t *drp = pcb->pcb_regs;
>> + dt_node_t *arg = dnp->dn_args;
>> + int nframes = dtp->dt_options[DTRACEOPT_STACKFRAMES];
>> + int skip = 0;
>> + uint_t off;
>> + uint_t lbl_valid = dt_irlist_label(dlp);
>> +
>> + /*
>> + * Legacy default was dtrace_stackframes_default,
>> + * in kernel file dtrace/dtrace_state.c.
>> + */
>> + if (nframes == DTRACEOPT_UNSET)
>> + nframes = 20;
> Yuck - an arbitrary constant inline in code. This should be a define
> somewhere so you can use a symbolic name here..
Okay. Just seemed to make no sense to define a symbol that is used
exactly once. The rationale would be that it makes it clear what the
number is, but the code above does just that: it sets something to a
specific numerical value explaining that value. Splitting this over
different files would seem to obfuscate things, but let me know if you
want it that way.
>
>> if (arg != NULL) {
>> - if (!dt_node_is_posconst(arg)) {
>> + if (!dt_node_is_posconst(arg))
>> dnerror(arg, D_STACK_SIZE, "stack( ) argument #1 must "
>> "be a non-zero positive integer constant\n");
>> - }
>>
>> -#ifdef FIXME
>> - nframes = (uint32_t)arg->dn_value;
>> -#endif
>> + nframes = arg->dn_value;
>> }
>> +
>> + /*
>> + * FIXME:
>> + * What should happen if more frames are requested than allowed?
>> + * 1. Silently reduce nframes?
>> + * 2. Report message and reduce nframes?
>> + * 3. Report message and abort?
>> + * Option 1 is the easiest and what we do here.
>> + *
>> + * A message could explain that the limit can be raised:
>> + * # sysctl kernel.perf_event_max_stack=<new value>
>> + * See bpf_get_stack() info in /usr/include/linux/bpf.h.
>> + *
>> + * Note that DTv1 allows much larger stacks but does not handle
>> + * the "too large" case very well:
>> + * # dtrace -qn 'BEGIN {stack(1234567); exit(0)}'
>> + * dtrace: 1 drop on CPU 2
>> + * [...hang...]
>> + */
> No need for this comment block. Just describe that if the requested number
> of frames is higher than the max, the limit will be reduced to the max (or
> something like that). That is the behaviour bpf_get_stack() provides as well.
>
>> + if (nframes > dtp->dt_maxframes)
>> + nframes = dtp->dt_maxframes;
>> +
>> + /* Figure out where we want to be in the output buffer. */
> I think it would be more accurate to mention in the comment that this call
> reserves space in the output buffer since that is what it does.
>
>> + off = dt_rec_add(dtp, dt_cg_fill_gap, DTRACEACT_STACK,
>> + 8 * nframes, 8, NULL, nframes);
> I really prefer to see sizeof(uint64_t) instead of 8. Yes, we know it is 8,
> but using the sizeof() at least tells us what that value means.
>
>> +
>> + /* Now call bpf_get_stack(ctx, buf, size, flags). */
>> + if (dt_regset_xalloc_args(drp) == -1)
>> + longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
>> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_CTX));
>> + emit(dlp, BPF_MOV_REG(BPF_REG_2, BPF_REG_9));
>> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, off));
>> + emit(dlp, BPF_MOV_IMM(BPF_REG_3, 8 * nframes));
> See comment above about '8'.
>
>> + emit(dlp, BPF_MOV_IMM(BPF_REG_4, skip & BPF_F_SKIP_FIELD_MASK));
>> + dt_regset_xalloc(drp, BPF_REG_0);
>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_stack));
>> + dt_regset_free_args(drp);
>> + emit(dlp, BPF_BRANCH_IMM(BPF_JSGE, BPF_REG_0, 0, lbl_valid));
>> + dt_cg_probe_error(pcb, -1, DTRACEFLT_BADSTACK, 0 /* FIXME */);
> Perhaps clarify the FIXME?
>
>> + emitl(dlp, lbl_valid,
>> + BPF_NOP());
>> + dt_regset_free(drp, BPF_REG_0);
>> }
>>
>> static void
>> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
>> index f39156df..483f9f28 100644
>> --- a/libdtrace/dt_consume.c
>> +++ b/libdtrace/dt_consume.c
>> @@ -2018,15 +2018,18 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
>> for (i = 0; i < pdat->dtpda_ddesc->dtdd_nrecs; i++) {
>> int n;
>> dtrace_recdesc_t *rec;
>> + dtrace_actkind_t act;
>> int (*func)(dtrace_hdl_t *, FILE *, void *,
>> const dtrace_probedata_t *,
>> const dtrace_recdesc_t *, uint_t,
>> const void *buf, size_t);
>> + caddr_t addr;
> Perhaps a less generic name for this variables would be useful? Something as
> bsic as rdata (record data) is better than just 'addr'. Or anything else that
> is not just 'an address'.
>
>> rec = &pdat->dtpda_ddesc->dtdd_recs[i];
>> - pdat->dtpda_data = data + rec->dtrd_offset;
>> + act = rec->dtrd_action;
>> + pdat->dtpda_data = addr = data + rec->dtrd_offset;
>>
>> - if (rec->dtrd_action == DTRACEACT_LIBACT) {
>> + if (act == DTRACEACT_LIBACT) {
>> switch (rec->dtrd_arg) {
>> case DT_ACT_DENORMALIZE:
>> if (dt_normalize(dtp, data, rec) != 0)
>> @@ -2057,7 +2060,16 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
>> if (rval != DTRACE_CONSUME_THIS)
>> return dt_set_errno(dtp, EDT_BADRVAL);
>>
>> - switch (rec->dtrd_action) {
>> + if (act == DTRACEACT_STACK) {
>> + int depth = rec->dtrd_arg;
>> +
>> + if (dt_print_stack(dtp, fp, NULL, addr, depth,
>> + rec->dtrd_size / depth) < 0)
>> + return -1;
>> + continue;
>> + }
>> +
>> + switch (act) {
> I truly do not understand why you put the DTRACEACT_STACK case outside the
> switch, when it could just as easily be part of the switch.
You brought this up with respect to another patch as well, so I figured
I'd explain. The DTv1 code had this sequence of "if (act==foo)" code
blocks. I was simply trying to preserve some of the structure of that
code. Is that a good idea? Some people will think so some of the time,
and others... well. Anyhow, I posted a v2 of that other patch. The
switch statement ends up impacting other patches and I propagated the
change through.
Now, the DTv1 code still had the PRINTF-LIKE actions separately. Sounds
like you want everything in one big switch. Fine by me. I'll do that.
I just figured I'd explain why the code looked the way it did.
> Also note comment
> above about 'addr'.
>
>> case DTRACEACT_PRINTF:
>> func = dtrace_fprintf;
>> break;
>> @@ -2092,8 +2104,7 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
>> continue;
>> }
>>
>> - n = dt_print_trace(dtp, fp, rec, pdat->dtpda_data,
>> - quiet);
>> + n = dt_print_trace(dtp, fp, rec, addr, quiet);
> See comment above on 'addr'.
>
>> if (n < 0)
>> return DTRACE_WORKSTATUS_ERROR;
>> }
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index c2d32863..240e29cc 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -380,6 +380,7 @@ struct dtrace_hdl {
>> dt_list_t dt_lib_dep_sorted; /* dependency sorted library list */
>> dt_global_pcap_t dt_pcap; /* global tshark/pcap state */
>> char *dt_freopen_filename; /* filename for freopen() action */
>> + uint_t dt_maxframes; /* maximum stack depth */
> See comment above about putting this in the options.
>
>> };
>>
>> /*
>> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
>> index f2b86f7c..0323e68a 100644
>> --- a/libdtrace/dt_open.c
>> +++ b/libdtrace/dt_open.c
>> @@ -645,6 +645,7 @@ dt_vopen(int version, int flags, int *errp,
>> int i, err;
>> char modpath[PATH_MAX];
>> struct rlimit rl;
>> + FILE *fd;
>>
>> const dt_intrinsic_t *dinp;
>> const dt_typedef_t *dtyp;
>> @@ -756,6 +757,11 @@ dt_vopen(int version, int flags, int *errp,
>> dt_dof_init(dtp);
>> uname(&dtp->dt_uts);
>>
>> + fd = fopen("/proc/sys/kernel/perf_event_max_stack", "r");
>> + assert(fd);
>> + assert(fscanf(fd, "%u", &dtp->dt_maxframes) == 1);
>> + fclose(fd);
>> +
> See comment about putting maxstackframes in options.
>
>> /*
>> * The default module path is derived in part from the utsname release
>> * string. So too is the path component which is added to .d file
>> diff --git a/test/unittest/stack/tst.default.d b/test/unittest/stack/tst.default.d
>> deleted file mode 100644
>> index 2e50ec39..00000000
>> --- a/test/unittest/stack/tst.default.d
>> +++ /dev/null
>> @@ -1,21 +0,0 @@
>> -/*
>> - * Oracle Linux DTrace.
>> - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
>> - * Licensed under the Universal Permissive License v 1.0 as shown at
>> - *http://oss.oracle.com/licenses/upl.
>> - */
>> -/* @@xfail: dtv2 */
>> -
>> -/*
>> - * ASSERTION:
>> - * Test the stack action with the default stack depth.
>> - *
>> - * SECTION: Output Formatting/printf()
>> - *
>> - */
>> -
>> -BEGIN
>> -{
>> - stack();
>> - exit(0);
>> -}
>> diff --git a/test/unittest/stack/tst.default.r b/test/unittest/stack/tst.default.r
>> deleted file mode 100644
>> index e3a4e4c3..00000000
>> --- a/test/unittest/stack/tst.default.r
>> +++ /dev/null
>> @@ -1 +0,0 @@
>> - dtrace`dtrace_ioctl+{ptr}
>> diff --git a/test/unittest/stack/tst.default.r.p b/test/unittest/stack/tst.default.r.p
>> deleted file mode 100755
>> index 281c025f..00000000
>> --- a/test/unittest/stack/tst.default.r.p
>> +++ /dev/null
>> @@ -1,5 +0,0 @@
>> -#!/bin/sed -nf
>> -
>> -# Eliminate all lines other than dtrace`ioctl.
>> -
>> -/dtrace`dtrace_ioctl/p
>> diff --git a/test/unittest/stack/tst.stack3_fbt.aarch64.r b/test/unittest/stack/tst.stack3_fbt.aarch64.r
>> new file mode 100644
>> index 00000000..5c8bfaed
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack3_fbt.aarch64.r
>> @@ -0,0 +1,11 @@
>> + FUNCTION:NAME
>> + :BEGIN
>> + __vfs_write:entry
>> + vmlinux`__vfs_write
>> + vmlinux`ksys_write+{ptr}
>> + vmlinux`__arm64_sys_write+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack3_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> diff --git a/test/unittest/stack/tst.stack3_fbt.d b/test/unittest/stack/tst.stack3_fbt.d
>> new file mode 100644
>> index 00000000..1a2eaf58
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack3_fbt.d
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Oracle Linux DTrace.
>> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
>> + * Licensed under the Universal Permissive License v 1.0 as shown at
>> + *http://oss.oracle.com/licenses/upl.
>> + */
>> +
>> +/*
>> + * ASSERTION: Test the stack action with depth 3.
>> + *
>> + * SECTION: Output Formatting/printf()
>> + */
>> +
>> +#pragma D option destructive
>> +
>> +BEGIN
>> +{
>> + system("echo write something > /dev/null");
>> +}
>> +
>> +fbt::__vfs_write:entry
>> +{
>> + stack(3);
>> + exit(0);
>> +}
>> diff --git a/test/unittest/stack/tst.stack3_fbt.x86_64.r b/test/unittest/stack/tst.stack3_fbt.x86_64.r
>> new file mode 100644
>> index 00000000..f24c8cba
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack3_fbt.x86_64.r
>> @@ -0,0 +1,11 @@
>> + FUNCTION:NAME
>> + :BEGIN
>> + __vfs_write:entry
>> + vmlinux`__vfs_write+{ptr}
>> + vmlinux`ksys_write+{ptr}
>> + vmlinux`__x64_sys_write+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack3_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> diff --git a/test/unittest/stack/tst.stack_fbt.aarch64.r b/test/unittest/stack/tst.stack_fbt.aarch64.r
>> new file mode 100644
>> index 00000000..3a2896c4
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack_fbt.aarch64.r
>> @@ -0,0 +1,14 @@
>> + FUNCTION:NAME
>> + :BEGIN
>> + __vfs_write:entry
>> + vmlinux`__vfs_write
>> + vmlinux`ksys_write+{ptr}
>> + vmlinux`__arm64_sys_write+{ptr}
>> + vmlinux`el0_svc_common+{ptr}
>> + vmlinux`el0_svc_handler+{ptr}
>> + vmlinux`el0_svc+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> diff --git a/test/unittest/stack/tst.stack_fbt.d b/test/unittest/stack/tst.stack_fbt.d
>> new file mode 100644
>> index 00000000..27db2116
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack_fbt.d
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Oracle Linux DTrace.
>> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
>> + * Licensed under the Universal Permissive License v 1.0 as shown at
>> + *http://oss.oracle.com/licenses/upl.
>> + */
>> +
>> +/*
>> + * ASSERTION: Test the stack action with the default stack depth.
>> + *
>> + * SECTION: Output Formatting/printf()
>> + */
>> +
>> +#pragma D option destructive
>> +
>> +BEGIN
>> +{
>> + system("echo write something > /dev/null");
>> +}
>> +
>> +fbt::__vfs_write:entry
>> +{
>> + stack();
>> + exit(0);
>> +}
>> diff --git a/test/unittest/stack/tst.stack_fbt.x86_64.r b/test/unittest/stack/tst.stack_fbt.x86_64.r
>> new file mode 100644
>> index 00000000..792ce80a
>> --- /dev/null
>> +++ b/test/unittest/stack/tst.stack_fbt.x86_64.r
>> @@ -0,0 +1,13 @@
>> + FUNCTION:NAME
>> + :BEGIN
>> + __vfs_write:entry
>> + vmlinux`__vfs_write+{ptr}
>> + vmlinux`ksys_write+{ptr}
>> + vmlinux`__x64_sys_write+{ptr}
>> + vmlinux`do_syscall_64+{ptr}
>> + vmlinux`entry_SYSCALL_64+{ptr}
>> +
>> +
>> +-- @@stderr --
>> +dtrace: script 'test/unittest/stack/tst.stack_fbt.d' matched 2 probes
>> +dtrace: allowing destructive actions
>> --
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/dtrace-devel/attachments/20210609/999272f9/attachment-0001.html
More information about the DTrace-devel
mailing list