[DTrace-devel] [PATCH 31/38] Fix dt_pebs_init() call
Eugene Loh
eugene.loh at oracle.com
Wed Aug 28 20:57:43 UTC 2024
On 8/26/24 12:20, Kris Van Hees wrote:
> On Mon, Aug 26, 2024 at 11:42:25AM -0400, Eugene Loh wrote:
>> On 8/26/24 10:30, Kris Van Hees wrote:
>>
>>> This patch goes against the intent of the code, as far as I can see. The
>>> point of performing the size check prior to increasing the buffer size was
>>> to enforce the limit on the size that was explicit set by the user. Then,
>>> if that size was sufficient, then DTrace is at liberty to increase that
>>> size to a whole multiple of the page size.
>>>
>>> But if I were to insist that a buffer can only be 24 bytes, and 24 bytes are
>>> not enough to store a single trace record, DTrace should report an error
>>> rather than increasing my bufsize setting on its own.
>>
>> The point is that DTrace *does* increase my bufsize setting on its own. Why
>> should DTrace complain about my setting when it's going to ignore my setting
>> anyhow? What value does the pre-existing behavior offer to DTrace users?
>> Are you saying that DTrace should not resize? That it should honor the
>> user's bufsize? No more resizing (to page size or whatever it wants)?
> The main point of this code was to retain the bevhaviour that DTrace provided,
As far as adhering to legacy behavior... quite simply, we do not. We do
not set the "principal buffer size" to bufsize. Further, any legacy
resizing behavior is, if I understand correctly, to go smaller than the
user requested; we do the opposite. Looking to the legacy behavior for
precedent is pointless. The situation is that we are unlike legacy
behavior in fundamental respects, opposite in another respect (automatic
resizing), but consistent in some impractical and useless respect. That
consistency strikes me as small consolation.
There is no documentation to change; the documentation is already
wrong. This patch was intended as a step to making the behavior
something that was describable. Clearly, trying to hold on to a legacy
buffering model has produced a byzantine situation in the current port,
which treats buffering very differently.
Perhaps we simply disagree on those points, and this patch is DOA. But
the patch points out other problems as well. E.g., the min computation
wasn't even right and error checking was wrong.
> which is that if a user explicitly requests a bufsize that is less than the
> minimum required for the given trace programs, an error is reported. That is
> a mater of being consistent (even if it has no practical application that I
> can see). Tests from the existing version failed because this was not being
> done right.
>
> As far as functionality, as long as the bufsize is large enough to hold the
> largest trace record, DTrace can (and needs to) ensure that the buffer size
> is a whole multiple of the page size because that is required by the perf
> ring buffer imlementation.
>
> So, DTrace only increases the buffer size to the required multiple of page size
> if the user supplied bufsize is at least large enough to hold the largest trace
> record (or if no bufsize was explicitly specified).
>
> We could opt to change the behaviour of the bufsize opton to always round up to
> a whole multiple of page size, but that is a deviation from the original
> implementation and thus is something that would need to be documented, etc...
>
> Aside from the computation of the minimum required size needing updating when
> patches (e.g. the USDT changes) modify the layout and size of the trace records,
> I do not think that the existing code is wrong.
>
> I am certainly open to a feature change to make bufsize always be rounded up as
> needed, but that is not a bug fix and I think that ought to be its own patch
> (along with documentation changes etc). And it might need some more discussion
> because it brings up the question why we wouldn't just simply always increase
> bufsize to be the smallest multiple of the page size to accomodate the largest
> trace record.
>
>>> On Thu, Jun 27, 2024 at 01:38:57AM -0400, eugene.loh at oracle.com wrote:
>>>> From: Eugene Loh <eugene.loh at oracle.com>
>>>>
>>>> The function had a few issues, mainly with its comments,
>>>> ranging from typos to incorrect descriptions of behavior.
>>>>
>>>> The function has only one caller. The enforcement of a
>>>> size minimum was problematic in a number of respects:
>>>> - it was missing 4 bytes
>>>> - it was enforcing the minimum before the size was
>>>> increased by the caller
>>>> - it was checking dt_pebs_init()==-1,
>>>> missing errors with other negative return values
>>>>
>>>> So change the function to accept a minimum and change its
>>>> return values.
>>>>
>>>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>>>> ---
>>>> libdtrace/dt_peb.c | 27 ++++++++++-----------
>>>> libdtrace/dt_peb.h | 2 +-
>>>> libdtrace/dt_work.c | 15 +++++++-----
>>>> test/unittest/options/err.b-too-low.d | 22 +++++++++--------
>>>> test/unittest/options/err.bufsize-too-low.d | 22 +++++++++--------
>>>> test/unittest/options/tst.b.d | 22 +++++++++--------
>>>> test/unittest/options/tst.bufsize.d | 22 +++++++++--------
>>>> 7 files changed, 71 insertions(+), 61 deletions(-)
>>>>
>>>> diff --git a/libdtrace/dt_peb.c b/libdtrace/dt_peb.c
>>>> index 5268f089..4983ba91 100644
>>>> --- a/libdtrace/dt_peb.c
>>>> +++ b/libdtrace/dt_peb.c
>>>> @@ -136,18 +136,14 @@ dt_pebs_exit(dtrace_hdl_t *dtp)
>>>> }
>>>> /*
>>>> - * Initialize the perf event buffers (one per online CPU). Each buffer will
>>>> - * the given number of pages (i.e. the total size of each buffer will be
>>>> - * num_pages * getpagesize()). The allocated memory for each buffer is mmap'd
>>>> - * so the kernel can write to it, and its representative file descriptor is
>>>> - * recorded in the 'buffers' BPF map so that BPF code knows where to write
>>>> - * trace data for a specific CPU.
>>>> + * Initialize the perf event buffers, one per online CPU. Each buffer will
>>>> + * have num_pages * getpagesize()). The dt_peb_open() call mmaps the allocated
>>>> + * memory so the kernel can write to it. The file descriptor is recorded in
>>>> + * the 'buffers' BPF map and added to the event polling file descriptor.
>>>> *
>>>> - * An event polling file descriptor is created as well, and it is configured to
>>>> - * monitor all perf event buffers at once. This file descriptor is returned
>>>> - * upon success.. Failure is indicated with a -1 return value.
>>>> + * The return value indicates success (0) or failure (-1).
>>>> */
>>>> -int dt_pebs_init(dtrace_hdl_t *dtp, size_t bufsize)
>>>> +int dt_pebs_init(dtrace_hdl_t *dtp, size_t bufsize, size_t bufsizemin)
>>>> {
>>>> int i;
>>>> int mapfd;
>>>> @@ -166,12 +162,15 @@ int dt_pebs_init(dtrace_hdl_t *dtp, size_t bufsize)
>>>> fprintf(stderr, "bufsize increased to %lu\n",
>>>> num_pages * getpagesize());
>>>> + if (num_pages * getpagesize() < bufsizemin)
>>>> + return dt_set_errno(dtp, EDT_BUFTOOSMALL);
>>>> +
>>>> /*
>>>> * Determine the fd for the 'buffers' BPF map.
>>>> */
>>>> idp = dt_dlib_get_map(dtp, "buffers");
>>>> if (idp == NULL || idp->di_id == DT_IDENT_UNDEF)
>>>> - return -ENOENT;
>>>> + return dt_set_errno(dtp, EDT_NOMEM); // FIXME we used to do something more akin to ENOENT
>>>> mapfd = idp->di_id;
>>>> @@ -180,7 +179,7 @@ int dt_pebs_init(dtrace_hdl_t *dtp, size_t bufsize)
>>>> */
>>>> dtp->dt_pebset = dt_zalloc(dtp, sizeof(dt_pebset_t));
>>>> if (dtp->dt_pebset == NULL)
>>>> - return -ENOMEM;
>>>> + return dt_set_errno(dtp, EDT_NOMEM);
>>>> /*
>>>> * Allocate the per-CPU perf event buffers.
>>>> @@ -189,7 +188,7 @@ int dt_pebs_init(dtrace_hdl_t *dtp, size_t bufsize)
>>>> sizeof(struct dt_peb));
>>>> if (pebs == NULL) {
>>>> dt_free(dtp, dtp->dt_pebset);
>>>> - return -ENOMEM;
>>>> + return dt_set_errno(dtp, EDT_NOMEM);
>>>> }
>>>> dtp->dt_pebset->pebs = pebs;
>>>> @@ -241,5 +240,5 @@ int dt_pebs_init(dtrace_hdl_t *dtp, size_t bufsize)
>>>> fail:
>>>> dt_pebs_exit(dtp);
>>>> - return -1;
>>>> + return dt_set_errno(dtp, EDT_NOMEM); // FIXME need something better
>>>> }
>>>> diff --git a/libdtrace/dt_peb.h b/libdtrace/dt_peb.h
>>>> index e0f408f2..9ad23252 100644
>>>> --- a/libdtrace/dt_peb.h
>>>> +++ b/libdtrace/dt_peb.h
>>>> @@ -43,7 +43,7 @@ typedef struct dt_pebset {
>>>> } dt_pebset_t;
>>>> extern void dt_pebs_exit(dtrace_hdl_t *);
>>>> -extern int dt_pebs_init(dtrace_hdl_t *, size_t);
>>>> +extern int dt_pebs_init(dtrace_hdl_t *, size_t, size_t);
>>>> #ifdef __cplusplus
>>>> }
>>>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>>>> index 69a86358..7a0eb1da 100644
>>>> --- a/libdtrace/dt_work.c
>>>> +++ b/libdtrace/dt_work.c
>>>> @@ -267,18 +267,21 @@ dtrace_go(dtrace_hdl_t *dtp, uint_t cflags)
>>>> return dt_set_errno(dtp, errno);
>>>> /*
>>>> - * We need enough space for the pref_event_header, a 32-bit size, a
>>>> + * We need enough space for the perf_event_header, a 32-bit size, a
>>>> * 4-byte gap, and the largest trace data record we may be writing to
>>>> * the buffer. In other words, the buffer needs to be large enough to
>>>> * hold at least one perf-encapsulated trace data record.
>>>> + *
>>>> + * While dt_pebs_init() rounds the requested size up, size==0 is a
>>>> + * special case.
>>>> */
>>>> dtrace_getopt(dtp, "bufsize", &size);
>>>> - if (size == 0 ||
>>>> - size < sizeof(struct perf_event_header) + sizeof(uint32_t) +
>>>> - dtp->dt_maxreclen)
>>>> + if (size == 0)
>>>> return dt_set_errno(dtp, EDT_BUFTOOSMALL);
>>>> - if (dt_pebs_init(dtp, size) == -1)
>>>> - return dt_set_errno(dtp, EDT_NOMEM);
>>>> + if (dt_pebs_init(dtp, size,
>>>> + sizeof(struct perf_event_header) + sizeof(uint32_t)
>>>> + + 4 + dtp->dt_maxreclen) == -1)
>>>> + return -1;
>>>> /*
>>>> * We must initialize the aggregation consumer handling before we
>>>> diff --git a/test/unittest/options/err.b-too-low.d b/test/unittest/options/err.b-too-low.d
>>>> index bb77e37c..f62155dd 100644
>>>> --- a/test/unittest/options/err.b-too-low.d
>>>> +++ b/test/unittest/options/err.b-too-low.d
>>>> @@ -1,6 +1,6 @@
>>>> /*
>>>> * Oracle Linux DTrace.
>>>> - * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
>>>> + * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved.
>>>> * Licensed under the Universal Permissive License v 1.0 as shown at
>>>> * http://oss.oracle.com/licenses/upl.
>>>> */
>>>> @@ -12,19 +12,21 @@
>>>> */
>>>> /*
>>>> - * We use a buffer size of 59 because that should be just too small to hold the
>>>> - * trace records generated in this script:
>>>> - * - perf_event_header (40 bytes)
>>>> - * - size (4 bytes)
>>>> - * - gap (4 bytes)
>>>> - * - EPID (4 bytes)
>>>> - * - tag (4 bytes)
>>>> - * - exit value (4 bytes)
>>>> + * We need over 4k bytes for the 4 string trace records generated in this script
>>>> + * plus some meta data.
>>>> + * A bufsize of 1 to 4096 bytes is rounded up to 4096 and is insufficient.
>>>> + * A bufsize of 4097 to 8192 bytes is rounded up to 8192 and is sufficient.
>>>> */
>>>> -/* @@runtest-opts: -b59 */
>>>> +/* @@runtest-opts: -b4096 */
>>>> +
>>>> +#pragma D option strsize=1024
>>>> BEGIN
>>>> {
>>>> + trace("abc");
>>>> + trace("def");
>>>> + trace("ghi");
>>>> + trace("jkl");
>>>> exit(0);
>>>> }
>>>> diff --git a/test/unittest/options/err.bufsize-too-low.d b/test/unittest/options/err.bufsize-too-low.d
>>>> index bbbdb5c5..25efdf72 100644
>>>> --- a/test/unittest/options/err.bufsize-too-low.d
>>>> +++ b/test/unittest/options/err.bufsize-too-low.d
>>>> @@ -1,6 +1,6 @@
>>>> /*
>>>> * Oracle Linux DTrace.
>>>> - * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
>>>> + * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved.
>>>> * Licensed under the Universal Permissive License v 1.0 as shown at
>>>> * http://oss.oracle.com/licenses/upl.
>>>> */
>>>> @@ -12,19 +12,21 @@
>>>> */
>>>> /*
>>>> - * We use a buffer size of 59 because that should be just too small to hold the
>>>> - * trace records generated in this script:
>>>> - * - perf_event_header (40 bytes)
>>>> - * - size (4 bytes)
>>>> - * - gap (4 bytes)
>>>> - * - EPID (4 bytes)
>>>> - * - tag (4 bytes)
>>>> - * - exit value (4 bytes)
>>>> + * We need over 4k bytes for the 4 string trace records generated in this script
>>>> + * plus some meta data.
>>>> + * A bufsize of 1 to 4096 bytes is rounded up to 4096 and is insufficient.
>>>> + * A bufsize of 4097 to 8192 bytes is rounded up to 8192 and is sufficient.
>>>> */
>>>> -/* @@runtest-opts: -xbufsize=59 */
>>>> +/* @@runtest-opts: -xbufsize=4096 */
>>>> +
>>>> +#pragma D option strsize=1024
>>>> BEGIN
>>>> {
>>>> + trace("abc");
>>>> + trace("def");
>>>> + trace("ghi");
>>>> + trace("jkl");
>>>> exit(0);
>>>> }
>>>> diff --git a/test/unittest/options/tst.b.d b/test/unittest/options/tst.b.d
>>>> index 57fa030d..3bf08edc 100644
>>>> --- a/test/unittest/options/tst.b.d
>>>> +++ b/test/unittest/options/tst.b.d
>>>> @@ -1,6 +1,6 @@
>>>> /*
>>>> * Oracle Linux DTrace.
>>>> - * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
>>>> + * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved.
>>>> * Licensed under the Universal Permissive License v 1.0 as shown at
>>>> * http://oss.oracle.com/licenses/upl.
>>>> */
>>>> @@ -12,19 +12,21 @@
>>>> */
>>>> /*
>>>> - * We use a buffer size of 60 because that should be the exact size necessary
>>>> - * to hold the trace records generated in this script:
>>>> - * - perf_event_header (40 bytes)
>>>> - * - size (4 bytes)
>>>> - * - gap (4 bytes)
>>>> - * - EPID (4 bytes)
>>>> - * - tag (4 bytes)
>>>> - * - exit value (4 bytes)
>>>> + * We need over 4k bytes for the 4 string trace records generated in this script
>>>> + * plus some meta data.
>>>> + * A bufsize of 1 to 4096 bytes is rounded up to 4096 and is insufficient.
>>>> + * A bufsize of 4097 to 8192 bytes is rounded up to 8192 and is sufficient.
>>>> */
>>>> -/* @@runtest-opts: -b60 */
>>>> +/* @@runtest-opts: -b4097 */
>>>> +
>>>> +#pragma D option strsize=1024
>>>> BEGIN
>>>> {
>>>> + trace("abc");
>>>> + trace("def");
>>>> + trace("ghi");
>>>> + trace("jkl");
>>>> exit(0);
>>>> }
>>>> diff --git a/test/unittest/options/tst.bufsize.d b/test/unittest/options/tst.bufsize.d
>>>> index 96b0f1b8..23af81aa 100644
>>>> --- a/test/unittest/options/tst.bufsize.d
>>>> +++ b/test/unittest/options/tst.bufsize.d
>>>> @@ -1,6 +1,6 @@
>>>> /*
>>>> * Oracle Linux DTrace.
>>>> - * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
>>>> + * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved.
>>>> * Licensed under the Universal Permissive License v 1.0 as shown at
>>>> * http://oss.oracle.com/licenses/upl.
>>>> */
>>>> @@ -12,19 +12,21 @@
>>>> */
>>>> /*
>>>> - * We use a buffer size of 60 because that should be the exact size necessary
>>>> - * to hold the trace records generated in this script:
>>>> - * - perf_event_header (40 bytes)
>>>> - * - size (4 bytes)
>>>> - * - gap (4 bytes)
>>>> - * - EPID (4 bytes)
>>>> - * - tag (4 bytes)
>>>> - * - exit value (4 bytes)
>>>> + * We need over 4k bytes for the 4 string trace records generated in this script
>>>> + * plus some meta data.
>>>> + * A bufsize of 1 to 4096 bytes is rounded up to 4096 and is insufficient.
>>>> + * A bufsize of 4097 to 8192 bytes is rounded up to 8192 and is sufficient.
>>>> */
>>>> -/* @@runtest-opts: -xbufsize=60 */
>>>> +/* @@runtest-opts: -xbufsize=4097 */
>>>> +
>>>> +#pragma D option strsize=1024
>>>> BEGIN
>>>> {
>>>> + trace("abc");
>>>> + trace("def");
>>>> + trace("ghi");
>>>> + trace("jkl");
>>>> exit(0);
>>>> }
>>>> --
>>>> 2.18.4
>>>>
More information about the DTrace-devel
mailing list