[DTrace-devel] [PATCH 3/4] Drop counters support
Kris Van Hees
kris.van.hees at oracle.com
Wed May 10 03:32:48 UTC 2023
On Tue, May 09, 2023 at 06:33:36PM -0400, Eugene Loh via DTrace-devel wrote:
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> with questions/comments.
>
> First, the subject line sounds like some support is being dropped. Like,
> "Drop support for counters." So how about changing to "Add support for drop
> counters." Or, "Add drop-counter support."
Ah, good point.
> On 5/9/23 03:19, Kris Van Hees via DTrace-devel wrote:
> > DTrace provides a mechanism to report dropped events. A drop occurs
> > when trace data cannot be recorded for a particular reason. This
> > patch contains the full implementation of 4 categories of drops,
> > primarily because they are all closely related:
> >
> > - principal buffer drops: reported when the producer failed to add
> > trace data to the principal buffer
> > - aggregation buffer drops: reported when the producer failed to
> >     allocate an aggregation and therefore failed to record data
> > - speculation drops: reported when something goes wrong with the
> > recording of speculative tracing data
> > + regular drops: reported when speculative data could not be
> > written to a speculation buffer
> > + busy drops: reported when a speculation could not be created
> > because all buffers are busy being committed or discarded
> > + unavailable drops: reported when no available speculation
> > buffers were found
> > - dynamic variable drops: reported when a dynamic variable (or
> > associative array element) could not be allocated
> >
> > Two mechanism for reporting drops are needed:
>
> s/mechanism/mechanisms/
Yup.
> > (1) Per-CPU reporting: used for principal and aggregation buffer
> > drops (stored in the cpuinfo structures)
> > (2) Global reporting: used for speculation and dynamic variable
> > drops (stored in the state BPF map)
> >
> > Detection of drops (and subsequent reporting to the user) is done
> > through frequent retrieval of status data. The handling of status
> > data (and the use of statusrate) is being re-introduced with this
> > patch.
> >
> > The drop count for speculations is a bit more complex than the other
> > ones because drops can occur both in the producer (when data cannot
> > be written to the trace output buffer) *and* in the consumer (when
> > data cannot be recorded in a speculation buffer. These separate
>
> Second parenthetical remark needs a closing parenthesis.
Yup.
> > counts are combined whenever status processing takes place to ensure
> > the correct drop count is presented to the user.
> >
> > Various tests have updated expected results because drops are now
> > being reported correctly.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> >
> > diff --git a/bpf/get_dvar.c b/bpf/get_dvar.c
> > @@ -1,19 +1,38 @@
> > +/*
> > + * Register an aggregation drop.
> > + */
>
> "Aggregation drop" is copy/paste error?
Correct. Fixed.
> > +noinline void *dt_no_dvar(void)
> > +{
> > + uint32_t kind = DT_STATE_DYNVAR_DROPS;
> > + uint32_t *valp;
> > +
> > + valp = bpf_map_lookup_elem(&state, &kind);
> > + if (valp == 0)
> > + return 0;
> > +
> > + atomic_add32(valp, 1);
> > + return 0;
> > +}
> > +
> > /*
> > * Dynamic variables are identified using a unique 64-bit key. Three different
> > * categories of dynamic variables are supported in DTrace:
> > @@ -80,7 +99,7 @@ noinline void *dt_get_dvar(uint64_t key, uint64_t store, uint64_t nval,
> > * with the default value.
> > */
> > if (bpf_map_update_elem(&dvars, &key, dflt, BPF_ANY) < 0)
> > - return 0;
> > + return dt_no_dvar();
> > val = bpf_map_lookup_elem(&dvars, &key);
> > if (val != 0)
> > @@ -126,13 +145,13 @@ noinline void *dt_get_assoc(uint32_t id, const char *tuple, uint64_t store,
> > * actual value.
> > */
> > if (bpf_map_update_elem(&tuples, tuple, &dflt_val, BPF_ANY) < 0)
> > - return 0;
> > + return dt_no_dvar();
> > valp = bpf_map_lookup_elem(&tuples, tuple);
> > if (valp == 0)
> > return 0;
>
> Should this one also return dt_no_dvar()?
Hm, yes... good point.
> > *valp = (uint64_t)valp;
> > if (bpf_map_update_elem(&tuples, tuple, valp, BPF_ANY) < 0)
> > - return 0;
> > + return dt_no_dvar();
> > val = *valp;
> > } else {
> > diff --git a/bpf/speculation.c b/bpf/speculation.c
> > @@ -19,18 +19,36 @@
> > +/*
> > + * Register an aggregation drop.
> > + */
>
> "Aggregation drop" is a copy/paste typo.
>
> > +noinline uint32_t dt_no_spec(uint32_t kind)
> > +{
> > + uint32_t *valp;
> > +
> > + valp = bpf_map_lookup_elem(&state, &kind);
> > + if (valp == 0)
> > + return 0;
> > +
> > + atomic_add32(valp, 1);
> > + return 0;
> > +}
> > +
> > /*
> > * Assign a speculation ID.
> > */
> > noinline uint32_t dt_speculation(void)
> > {
> > - uint32_t id;
> > - dt_bpf_specs_t zero;
> > + uint32_t id, busy;
> > + dt_bpf_specs_t zero;
> > + dt_bpf_specs_t *spec;
> > __builtin_memset(&zero, 0, sizeof (dt_bpf_specs_t));
> > + busy = 0;
>
> Or, do this initialization in the declaration.
>
> > #if 1 /* Loops are broken in BPF right now */
> > #define SEARCH(n) \
> > do { \
> > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > @@ -604,26 +604,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
> > static int
> > gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> > {
> > - int i, fd, rc;
> > + int i, rc;
> > uint32_t key = 0;
> > dtrace_conf_t *conf = &dtp->dt_conf;
> > size_t ncpus = conf->max_cpuid + 1;
> > dt_bpf_cpuinfo_t *data;
> > cpuinfo_t *ci;
> > - data = dt_zalloc(dtp, ncpus * sizeof(dt_bpf_cpuinfo_t));
> > + data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus,
> > + sizeof(dt_bpf_cpuinfo_t));
> > if (data == NULL)
> > return dt_set_errno(dtp, EDT_NOMEM);
> > for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++)
> > memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t));
> > - fd = create_gmap(dtp, "cpuinfo", BPF_MAP_TYPE_PERCPU_ARRAY,
> > - sizeof(uint32_t), sizeof(dt_bpf_cpuinfo_t), 1);
> > - if (fd == -1)
> > + dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo",
> > + BPF_MAP_TYPE_PERCPU_ARRAY,
> > + sizeof(uint32_t),
> > + sizeof(dt_bpf_cpuinfo_t), 1);
> > + if (dtp->dt_cpumap_fd == -1)
> > return -1;
> > - rc = dt_bpf_map_update(fd, &key, data);
> > + rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> > dt_free(dtp, data);
> > if (rc == -1)
> > return dt_bpf_error(dtp,
>
> Should these changes to gmap_create_cpuinfo(), along with the introduction
> to dt_cpumap_fd, be put in the preceding patch?
They are not needed in the preceding patch.  They are only needed with this
patch.  I guess it could be split out into its own patch but I figured I
might as well include it here along with a few other changes that are very
specific to drop counter support.
> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > @@ -867,6 +870,36 @@ dt_cg_epilogue(dt_pcb_t *pcb)
> > emit(dlp, BPF_MOV_IMM(BPF_REG_5, pcb->pcb_bufoff));
> > emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, 4));
> > emit(dlp, BPF_CALL_HELPER(BPF_FUNC_perf_event_output));
> > +
> > + /*
> > + * If writing the trace data to the output buffer failed,
> > + * increment the drop count for the principal buffer (no
> > + * speculate() in the clause) or for aggregations (speculate()
> > + * in the clause).
> > + */
>
> The comment is out-of-order with respect to the code, refers incorrectly to
> aggregations, and is longer than necessary. How about just stopping the
> comment at "drop count."
I don't think the order of the cases in the comment vs. the conditional in
the code is that big of a deal (it is rather obvious), but yes, it should
mention speculations rather than aggregations.
> > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
> > + if (cflags & DT_CLSFLAG_SPECULATE) {
> > + idp = dt_dlib_get_map(dtp, "state");
> > + assert(idp != NULL);
> > + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
> > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP));
> > + emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, DT_STATE_SPEC_DROPS));
> > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
> > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
> > + emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1));
> > + emit(dlp, BPF_XADD_REG(BPF_W, BPF_REG_0, 0, BPF_REG_1));
> > + } else {
> > + idp = dt_dlib_get_map(dtp, "cpuinfo");
> > + assert(idp != NULL);
> > + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
> > + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP));
> > + emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0));
> > + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
> > + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
> > + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, offsetof(dt_bpf_cpuinfo_t, buf_drops)));
> > + emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1));
> > + emit(dlp, BPF_XADD_REG(BPF_DW, BPF_REG_0, 0, BPF_REG_1));
>
> Instead of BPF_ADD the offset to %r0, you can just specify the offset in the
> BPF_XADD instruction.
Yes, good point.
> > + }
> > }
> > /*
> > @@ -3157,7 +3191,9 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
> > emit(dlp, BPF_MOV_REG(reg, BPF_REG_0));
> > dt_regset_free(drp, BPF_REG_0);
> > +#if 0
> > dt_cg_check_notnull(dlp, drp, reg);
> > +#endif
> > if (dnp->dn_flags & DT_NF_REF) {
> > size_t srcsz;
>
> Why is this #if 0 code still there?
Forgot to remove it.
> > diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> > @@ -2613,8 +2617,8 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
> > rfunc, flow, quiet, peekflags,
> > last, 0, arg);
> > } else if (hdr->type == PERF_RECORD_LOST) {
> > -#ifdef FIXME
> > - uint64_t lost;
> > +#if 0
> > + uint64_t drops;
> > /*
> > * struct {
>
> Why #if 0? What is this code waiting for?
It is to be removed. We cannot use the PERF_RECORD_LOST entry to report
drops because that does not differentiate between kinds.
> > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> > @@ -189,5 +266,25 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> > DTRACE_WORKSTATUS_ERROR)
> > return DTRACE_WORKSTATUS_ERROR;
> > + /*
> > + * If we are stopped, we use dt_get_status() to get any potential
> > + * pending speculation drops because we want to ensure that old and new
> > + * counts for other drops are identical (lest they be reported more
> > + * than once).
> > + *
> > + * If we are not stopped, we use dt_add_local_status() to adjust the
> > + * current drop counters without retrieving producer counts (since they
> > + * might have changed and we do not want to report them yet).
> > + */
>
> Put comments in the same order as the code.
Sure.
> > + if (!dtp->dt_stopped) {
> > + dt_add_local_status(dtp);
> > + gen = dtp->dt_statusgen ^ 1;
> > + } else {
> > + dt_get_status(dtp);
> > + gen = dtp->dt_statusgen;
> > + }
> > +
> > + dt_handle_status(dtp, &dtp->dt_status[gen], &dtp->dt_status[gen ^ 1]);
> > +
> > return rval;
> > }
> > diff --git a/test/unittest/assocs/tst.store_zero_deletes.d b/test/unittest/assocs/tst.store_zero_deletes.d
> > @@ -12,8 +12,13 @@
> > * SECTION: Variables/Thread-Local Variables
> > */
> > +/*
> > + * We use a dynvarsizde that guarantees that we can only store 4 values without
> > + * causing a drop. Since we use an associative array indexed by an int and
> > + * storing an int value, each element will use 20 bytes.
> > + */
>
> typo: dynvarsizde
>
> Also, the int+int=20bytes math is not intuitive.
That is OK - it is not meant to be intuitive; it is meant as a reminder.
> Finally, this test is changing without change in xfail annotation? So is
> this change irrelevant to pass/fail status? Presumably what happened was
> this test started failing a few patches ago and now is being fixed. If so,
> this fix should be put in the earlier patch so that no patch in the series
> introduces a test regression.
>
> > +#pragma D option dynvarsize=80
> > #pragma D option quiet
> > -#pragma D option dynvarsize=15
> > BEGIN
> > {
Yes, it belongs in the dynvarsize patch. Thanks for pointing this out.