[DTrace-devel] [PATCH 3/4] Drop counters support
Eugene Loh
eugene.loh at oracle.com
Tue May 9 22:33:36 UTC 2023
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
with questions/comments.
First, the subject line sounds like some support is being dropped.
Like, "Drop support for counters." So how about changing to "Add
support for drop counters." Or, "Add drop-counter support."
On 5/9/23 03:19, Kris Van Hees via DTrace-devel wrote:
> DTrace provides a mechanism to report dropped events. A drop occurs
> when trace data cannot be recorded for a particular reason. This
> patch contains the full implementation of 4 categories of drops,
> primarily because they are all closely related:
>
> - principal buffer drops: reported when the producer failed to add
> trace data to the principal buffer
> - aggregation buffer drops: reported when the producer failed to
> allocate an aggreggation and therefore failed to record data
> - speculation drops: reported when something goes wrong with the
> recording of speculative tracing data
> + regular drops: reported when speculative data could not be
> written to a speculation buffer
> + busy drops: reported when a speculation could not be created
> because all buffers are busy being committed or discarded
> + unavailable drops: reported when no available speculation
> buffers were found
> - dynamic variable drops: reported when a dynamic variable (or
> associative array element) could not be allocated
>
> Two mechanism for reporting drops are needed:
s/mechanism/mechanisms/
> (1) Per-CPU reporting: used for principal and aggregation buffer
> drops (stored in the cpuinfo structures)
> (2) Global reporting: used for speculation and dynamic variable
> drops (stored in the state BPF map)
>
> Detection of drops (and subsequent reporting to the user) is done
> through frequent retrieval of status data. The handling of status
> data (and the use of statusrate) is being re-introduced with this
> patch.
>
> The drop count for speculations is a bit more complex than the other
> ones because drops can occur both in the producer (when data cannot
> be written to the trace output buffer) *and* in the consumer (when
> data cannot be recorded in a speculation buffer. These separate
Second parenthetical remark needs a closing parenthesis.
> counts are combined whenever status processing takes place to ensure
> the correct drop count is presented to the user.
>
> Various tests have updated expected results because drops are now
> being reported correctly.
>
> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>
> diff --git a/bpf/get_dvar.c b/bpf/get_dvar.c
> @@ -1,19 +1,38 @@
> +/*
> + * Register an aggregation drop.
> + */
"Aggregation drop" is copy/paste error?
> +noinline void *dt_no_dvar(void)
> +{
> + uint32_t kind = DT_STATE_DYNVAR_DROPS;
> + uint32_t *valp;
> +
> + valp = bpf_map_lookup_elem(&state, &kind);
> + if (valp == 0)
> + return 0;
> +
> + atomic_add32(valp, 1);
> + return 0;
> +}
> +
> /*
> * Dynamic variables are identified using a unique 64-bit key. Three different
> * categories of dynamic variables are supported in DTrace:
> @@ -80,7 +99,7 @@ noinline void *dt_get_dvar(uint64_t key, uint64_t store, uint64_t nval,
> * with the default value.
> */
> if (bpf_map_update_elem(&dvars, &key, dflt, BPF_ANY) < 0)
> - return 0;
> + return dt_no_dvar();
>
> val = bpf_map_lookup_elem(&dvars, &key);
> if (val != 0)
> @@ -126,13 +145,13 @@ noinline void *dt_get_assoc(uint32_t id, const char *tuple, uint64_t store,
> * actual value.
> */
> if (bpf_map_update_elem(&tuples, tuple, &dflt_val, BPF_ANY) < 0)
> - return 0;
> + return dt_no_dvar();
> valp = bpf_map_lookup_elem(&tuples, tuple);
> if (valp == 0)
> return 0;
Should this one also return dt_no_dvar()?
> *valp = (uint64_t)valp;
> if (bpf_map_update_elem(&tuples, tuple, valp, BPF_ANY) < 0)
> - return 0;
> + return dt_no_dvar();
>
> val = *valp;
> } else {
> diff --git a/bpf/speculation.c b/bpf/speculation.c
> @@ -19,18 +19,36 @@
> +/*
> + * Register an aggregation drop.
> + */
"Aggregation drop" is a copy/paste typo.
> +noinline uint32_t dt_no_spec(uint32_t kind)
> +{
> + uint32_t *valp;
> +
> + valp = bpf_map_lookup_elem(&state, &kind);
> + if (valp == 0)
> + return 0;
> +
> + atomic_add32(valp, 1);
> + return 0;
> +}
> +
> /*
> * Assign a speculation ID.
> */
> noinline uint32_t dt_speculation(void)
> {
> - uint32_t id;
> - dt_bpf_specs_t zero;
> + uint32_t id, busy;
> + dt_bpf_specs_t zero;
> + dt_bpf_specs_t *spec;
>
> __builtin_memset(&zero, 0, sizeof (dt_bpf_specs_t));
>
> + busy = 0;
Or, do this initialization in the declaration.
> #if 1 /* Loops are broken in BPF right now */
> #define SEARCH(n) \
> do { \
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> @@ -604,26 +604,29 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
> static int
> gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> {
> - int i, fd, rc;
> + int i, rc;
> uint32_t key = 0;
> dtrace_conf_t *conf = &dtp->dt_conf;
> size_t ncpus = conf->max_cpuid + 1;
> dt_bpf_cpuinfo_t *data;
> cpuinfo_t *ci;
>
> - data = dt_zalloc(dtp, ncpus * sizeof(dt_bpf_cpuinfo_t));
> + data = dt_calloc(dtp, dtp->dt_conf.num_possible_cpus,
> + sizeof(dt_bpf_cpuinfo_t));
> if (data == NULL)
> return dt_set_errno(dtp, EDT_NOMEM);
>
> for (i = 0, ci = &conf->cpus[0]; i < ncpus; i++, ci++)
> memcpy(&data[ci->cpu_id].ci, ci, sizeof(cpuinfo_t));
>
> - fd = create_gmap(dtp, "cpuinfo", BPF_MAP_TYPE_PERCPU_ARRAY,
> - sizeof(uint32_t), sizeof(dt_bpf_cpuinfo_t), 1);
> - if (fd == -1)
> + dtp->dt_cpumap_fd = create_gmap(dtp, "cpuinfo",
> + BPF_MAP_TYPE_PERCPU_ARRAY,
> + sizeof(uint32_t),
> + sizeof(dt_bpf_cpuinfo_t), 1);
> + if (dtp->dt_cpumap_fd == -1)
> return -1;
>
> - rc = dt_bpf_map_update(fd, &key, data);
> + rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> dt_free(dtp, data);
> if (rc == -1)
> return dt_bpf_error(dtp,
Should these changes to gmap_create_cpuinfo(), along with the
introduction to dt_cpumap_fd, be put in the preceding patch?
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> @@ -867,6 +870,36 @@ dt_cg_epilogue(dt_pcb_t *pcb)
> emit(dlp, BPF_MOV_IMM(BPF_REG_5, pcb->pcb_bufoff));
> emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, 4));
> emit(dlp, BPF_CALL_HELPER(BPF_FUNC_perf_event_output));
> +
> + /*
> + * If writing the trace data to the output buffer failed,
> + * increment the drop count for the principal buffer (no
> + * speculate() in the clause) or for aggregations (speculate()
> + * in the clause).
> + */
The comment is out-of-order with respect to the code, refers incorrectly
to aggregations, and is longer than necessary. How about just stopping
the comment at "drop count."
> + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
> + if (cflags & DT_CLSFLAG_SPECULATE) {
> + idp = dt_dlib_get_map(dtp, "state");
> + assert(idp != NULL);
> + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP));
> + emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, DT_STATE_SPEC_DROPS));
> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
> + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
> + emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1));
> + emit(dlp, BPF_XADD_REG(BPF_W, BPF_REG_0, 0, BPF_REG_1));
> + } else {
> + idp = dt_dlib_get_map(dtp, "cpuinfo");
> + assert(idp != NULL);
> + dt_cg_xsetx(dlp, idp, DT_LBL_NONE, BPF_REG_1, idp->di_id);
> + emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_2, BPF_REG_FP, DT_STK_SP));
> + emit(dlp, BPF_STORE_IMM(BPF_DW, BPF_REG_2, 0, 0));
> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_map_lookup_elem));
> + emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, pcb->pcb_exitlbl));
> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, offsetof(dt_bpf_cpuinfo_t, buf_drops)));
> + emit(dlp, BPF_MOV_IMM(BPF_REG_1, 1));
> + emit(dlp, BPF_XADD_REG(BPF_DW, BPF_REG_0, 0, BPF_REG_1));
Instead of BPF_ADD the offset to %r0, you can just specify the offset in
the BPF_XADD instruction.
> + }
> }
>
> /*
> @@ -3157,7 +3191,9 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
> emit(dlp, BPF_MOV_REG(reg, BPF_REG_0));
> dt_regset_free(drp, BPF_REG_0);
>
> +#if 0
> dt_cg_check_notnull(dlp, drp, reg);
> +#endif
>
> if (dnp->dn_flags & DT_NF_REF) {
> size_t srcsz;
Why is this #if 0 code still there?
> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> @@ -2613,8 +2617,8 @@ dt_consume_one(dtrace_hdl_t *dtp, FILE *fp, char *buf,
> rfunc, flow, quiet, peekflags,
> last, 0, arg);
> } else if (hdr->type == PERF_RECORD_LOST) {
> -#ifdef FIXME
> - uint64_t lost;
> +#if 0
> + uint64_t drops;
>
> /*
> * struct {
Why #if 0? What is this code waiting for?
> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> @@ -189,5 +266,25 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> DTRACE_WORKSTATUS_ERROR)
> return DTRACE_WORKSTATUS_ERROR;
>
> + /*
> + * If we are stopped, we use dt_get_status() to get any potential
> + * pending speculation drops because we want to ensure that old and new
> + * counts for other drops are identical (lest they be reported more
> + * than once).
> + *
> + * If we are not stopped, we use dt_add_local_status() to adjust the
> + * current drop counters without retrieving producer counts (since they
> + * might have changed and we do not want to report them yet).
> + */
Put comments in the same order as the code.
> + if (!dtp->dt_stopped) {
> + dt_add_local_status(dtp);
> + gen = dtp->dt_statusgen ^ 1;
> + } else {
> + dt_get_status(dtp);
> + gen = dtp->dt_statusgen;
> + }
> +
> + dt_handle_status(dtp, &dtp->dt_status[gen], &dtp->dt_status[gen ^ 1]);
> +
> return rval;
> }
> diff --git a/test/unittest/assocs/tst.store_zero_deletes.d b/test/unittest/assocs/tst.store_zero_deletes.d
> @@ -12,8 +12,13 @@
> * SECTION: Variables/Thread-Local Variables
> */
>
> +/*
> + * We use a dynvarsizde that guarantees that we can only store 4 values without
> + * causing a drop. Since we use an associative array indexed by an int and
> + * storing an int value, each element will use 20 bytes.
> + */
typo: dynvarsizde
Also, the int+int=20bytes math is not intuitive.
Finally, this test is changing without change in xfail annotation? So is
this change irrelevant to pass/fail status? Presumably what happened
was this test started failing a few patches ago and now is being fixed.
If so, this fix should be put in the earlier patch so that no patch in
the series introduces a test regression.
> +#pragma D option dynvarsize=80
> #pragma D option quiet
> -#pragma D option dynvarsize=15
>
> BEGIN
> {
More information about the DTrace-devel
mailing list