[DTrace-devel] [PATCH v3] Add rodata support
Eugene Loh
eugene.loh at oracle.com
Thu Aug 17 22:10:25 UTC 2023
On 8/17/23 10:53, Kris Van Hees via DTrace-devel wrote:
> This patch provides for read-only data to be placed in its own memory
> block (much like the .rodata ELF section). It lays the groundwork for
> future enhancements that support .rodata sections in precompiled BPF
> code (as may be generated for static data items), and for the storage
> of read-only data that is not constrainted by the maximum string size.
constrained
> It's first user will be the inet_ntoa6() implementation that needs to
Its
> store a read-only lookup table.
>
> The string table can be considered a special case of read-only data
> (as evidenced by the clear duplication of code between dt_rodata.c and
> dt_strtab.c) and a future patch will convert the strtab implementation
> to be a wrapper for a rodata-style memory block.
I'd drop the parenthetical remark.
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> @@ -695,10 +695,15 @@ gmap_create_scratchmem(dtrace_hdl_t *dtp)
> * String table map. This is a global map with a singleton element (key 0)
Here and lots of places: it isn't really the "string table map" any more.
> * that contains the entire string table as a concatenation of all unique
> * strings (each terminated with a NUL byte). The string table size is taken
> - * from the DTrace consumer handle (dt_strlen). Extra memory is allocated as a
> - * memory block of zeros for initializing memory regions. Its size is at least
> - * the maximum string size to ensure the BPF verifier can validate all access
> - * requests for dynamic references to string constants.
> + * from the DTrace consumer handle (dt_strlen). The read-only data is appended
> + * to the string table, as is a memory block of zeros for initializing memory
> + * regions. Its size is at least
Its size? Whose size? And.... is at least what?
> + *
> + * In order to ensure the BPF verifier can validate all access requests for
> + * dybamic references to string constants, the size of the read-only data plus
> + * the size of the block of zeros must at least match the maximum string size.
> + * The size of the block of zeros must at least match the maximum read-only
> + * item size.
Okay but "the size of the block of zeros" could perhaps be dt_zerosize?
Anyhow, "dynamic."
> */
> static int
> gmap_create_strtab(dtrace_hdl_t *dtp)
> @@ -711,13 +716,20 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> int fd, rc, err;
>
> dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> - dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> + dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> + dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
> + dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
> + dtp->dt_zerosize = 0;
> + sz = dt_rodata_max_item_size(dtp->dt_rodata);
To me, it makes much more sense to put those last two lines after the
next comment.
> /*
> * Ensure the zero-filled memory at the end of the strtab is large
> * enough to accomodate all needs for such a memory block.
> */
accommodate
> - dtp->dt_zerosize = strsize + 1;
> + if (dtp->dt_rosize < strsize + 1)
> + dtp->dt_zerosize = strsize + 1 - dtp->dt_rosize;
> + if (dtp->dt_zerosize < sz)
> + dtp->dt_zerosize = sz;
> if (dtp->dt_zerosize < dtp->dt_maxdvarsize)
> dtp->dt_zerosize = dtp->dt_maxdvarsize;
> if (dtp->dt_zerosize < dtp->dt_maxtuplesize)
It'd probably be a bit clearer if MAX() were used.
> @@ -730,6 +742,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> if (strtab == NULL)
> return dt_set_errno(dtp, EDT_NOMEM);
>
> + /* Copy the string table data. */
> dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr,
> strtab);
> @@ -745,6 +758,10 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> buf += len + 1;
> }
>
> + /* Copy the read-only data. */
> + dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy,
> + strtab + dtp->dt_rooffset);
Should we now have a dt_rodata_destroy()?
I'd make that second argument NULL. (See my comment later about
dt_rodata_write().)
Should we check the return value of the function?
> fd = create_gmap(dtp, "strtab", BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
> sz, 1);
> if (fd == -1)
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> @@ -56,6 +56,7 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
> dt_ident_t *mem = dt_dlib_get_map(dtp, "mem");
> dt_ident_t *state = dt_dlib_get_map(dtp, "state");
> dt_ident_t *prid = dt_dlib_get_var(pcb->pcb_hdl, "PRID");
> + dt_ident_t *ro_off = dt_dlib_get_var(dtp, "RODATA_OFF");
> uint_t lbl_exit = pcb->pcb_exitlbl;
>
Should there be an assert(ro_off), at least to be consistent with
existing practice?
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> @@ -888,6 +888,7 @@ dt_vopen(int version, int flags, int *errp,
> DIF_VAR_OTHER_UBASE, DIF_VAR_OTHER_MAX);
>
> dtp->dt_ccstab = dt_strtab_create(BUFSIZ);
> + dtp->dt_rodata = dt_rodata_create(BUFSIZ);
> dtp->dt_strlen = 0;
>
Should we check if dt_rodata is NULL? And where is it destroyed? Right
after the dt_rodata_write() call? Or in dt_close()?
> diff --git a/libdtrace/dt_rodata.c b/libdtrace/dt_rodata.c
> @@ -0,0 +1,307 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <sys/types.h>
FWIW, sys/types.h might be brought in by dt_rodata.h.
> +#include <string.h>
> +#include <stdlib.h>
> +#include <assert.h>
> +
> +#include <dt_rodata.h>
> +#include <dt_string.h>
> +#include <dt_impl.h>
> +
I'd like a block comment like this near the top of this file:
The function dt_rodata_insert(dp, ptr, len) takes an input block of
data,
described by ptr and len, and appends it to a monolithic buffer
described
by dp. The function returns the offset in the buffer where the data is
stored.
Actually, dp also has a hash table so that if the data block
already appears
in the buffer, an offset to the pre-existing copy is returned and
no new
copy is added.
Further, dp actually stores the buffer in a growable set of smaller
chunks,
each chunk of size originally specified in dt_rodata_create(bufsz).
The buffer can be written out to a single, linear block of memory using
dt_rodata_write().
> +typedef struct dt_rohash dt_rohash_t;
> +struct dt_rohash {
> + const char *data; /* pointer to actual data */
> + size_t buf; /* index of data buffer */
> + size_t off; /* offset in bytes of this item */
> + size_t len; /* length in bytes of this item */
> + dt_rohash_t *next; /* next item in hash chain */
> +};
"of this item" might be unnecessary (both instances)
> +
> +struct dt_rodata {
> + dt_rohash_t **hash; /* array of hash buckets */
Name it "hash bucket array" to make the correspondence to the next line
clearer.
> + size_t hashsz; /* size of hash bucket array */
This is set to 211? Is that actually a reasonable size for this? (Not a
big deal, I suppose.)
> + char **bufs; /* array of buffer pointers */
> + char *ptr; /* pointer to current buffer location */
> + size_t nbufs; /* size of buffer pointer array */
> + size_t bufsz; /* size of individual buffer */
> + size_t count; /* total number of items in rodata */
>
So far as I can tell, no one uses "count".
> + size_t size; /* total size of items in bytes */
> + size_t item_size; /* maximum item size */
Rename to "max_item_size".
> +dt_rodata_t *
> +dt_rodata_create(size_t bufsz)
> +{
> + dt_rodata_t *dp = malloc(sizeof(dt_rodata_t));
> + size_t nbuckets = _dtrace_strbuckets;
> +
> + assert(bufsz != 0);
> +
> + if (dp == NULL)
> + return NULL;
> +
> + memset(dp, 0, sizeof(dt_rodata_t));
> + dp->hash = malloc(nbuckets * sizeof(dt_rohash_t *));
> + if (dp->hash == NULL)
> + goto err;
> +
> + memset(dp->hash, 0, nbuckets * sizeof(dt_rohash_t *));
> + dp->hashsz = nbuckets;
> + dp->bufs = NULL;
> + dp->ptr = NULL;
> + dp->nbufs = 0;
> + dp->bufsz = bufsz;
> +
> + if (dt_rodata_grow(dp) == -1)
> + goto err;
Why not let the first dt_rodata_grow() be made by the first "insert"?
> +
> + /* Pre-populate the read-only data with a 0-byte at offset 0. */
> + *dp->ptr++ = 0;
> + dp->size = 1;
> + dp->count = 1;
Why prepopulate? And why not with an "insert" call?
> +
> + return dp;
> +
> +err:
> + dt_rodata_destroy(dp);
> + return NULL;
> +}
> +
> +static int
> +dt_rodata_compare(dt_rodata_t *dp, dt_rohash_t *hp, const char *ptr, size_t len)
> +{
> + size_t b = hp->buf;
> + const char *buf = hp->data;
> + size_t resid, n;
> + int rv;
resid, n, and rv can all be declared inside the while() loop.
> +
> + while (len != 0) {
> + if (buf == dp->bufs[b] + dp->bufsz)
> + buf = dp->bufs[++b];
> +
> + resid = dp->bufs[b] + dp->bufsz - buf;
> + n = MIN(resid, len);
> +
> + if ((rv = memcmp(buf, ptr, n)) != 0)
> + return rv;
> +
> + buf += n;
> + ptr += n;
> + len -= n;
> + }
> +
> + return 0;
> +}
> +
> +static int
> +dt_rodata_copyin(dt_rodata_t *dp, const char *ptr, size_t len)
> +{
> + char *old_p = dp->ptr;
> + size_t old_n = dp->nbufs;
> + size_t b = dp->nbufs - 1;
> + size_t resid, n;
resid and n should both be declared inside the while() loop.
> +
> + while (len != 0) {
> + if (dp->ptr == dp->bufs[b] + dp->bufsz) {
> + if (dt_rodata_grow(dp) == -1)
> + goto err;
> + b++;
> + }
> +
> + resid = dp->bufs[b] + dp->bufsz - dp->ptr;
> + n = MIN(resid, len);
> + memcpy(dp->ptr, ptr, n);
> +
> + dp->ptr += n;
> + ptr += n;
> + len -= n;
> + }
> +
> + return 0;
> +
> +err:
> + while (dp->nbufs != old_n)
> + free(dp->bufs[--dp->nbufs]);
> +
> + dp->ptr = old_p;
> + return -1;
> +}
> +
> +static ssize_t
> +dt_rodata_xindex(dt_rodata_t *dp, const char *ptr, size_t len, uint32_t h)
> +{
> + dt_rohash_t *hp;
> +
> + for (hp = dp->hash[h]; hp != NULL; hp = hp->next) {
> + if (dt_rodata_compare(dp, hp, ptr, len) == 0)
> + return hp->off;
> + }
> +
> + return -1;
> +}
> +
> +ssize_t
> +dt_rodata_index(dt_rodata_t *dp, const char *ptr, size_t len)
So far as I can tell, no one uses this function.
> +{
> + ssize_t rc;
> + uint32_t h;
> +
> + if (ptr == NULL || ptr[0] == '\0')
> + return 0; /* NULL or a single 0-byte are at offset 0. */
Why the comment and the ptr[0]=='\0' check?
> +
> + h = dt_gen_hval(ptr, len, len) % dp->hashsz;
> + rc = dt_rodata_xindex(dp, ptr, len, h);
> +
> + return rc;
> +}
> +
> +ssize_t
> +dt_rodata_insert(dt_rodata_t *dp, const char *ptr, size_t len)
> +{
> + dt_rohash_t *hp;
> + ssize_t off;
> + uint32_t h;
> +
> + if (ptr == NULL || ptr[0] == '\0')
> + return 0; /* NULL or a single 0-byte are at offset 0. */
> +
Why the comment and the ptr[0]=='\0' check?
> + h = dt_gen_hval(ptr, len, len) % dp->hashsz;
> + off = dt_rodata_xindex(dp, ptr, len, h);
> + if (off != -1)
> + return off;
> +
> + /*
> + * Create a new hash bucket, initialize it, and insert it at the front
> + * of the hash chain for the appropriate bucket.
> + */
> + if ((hp = malloc(sizeof(dt_rohash_t))) == NULL)
> + return -1L;
> +
> + hp->data = dp->ptr;
> + hp->buf = dp->nbufs - 1;
> + hp->off = dp->size;
> + hp->len = len;
> + hp->next = dp->hash[h];
> +
> + /*
> + * Now copy the data into our buffer list, and then update the global
> + * counts of items and bytes. Return the item's byte offset.
> + */
> + if (dt_rodata_copyin(dp, ptr, len) == -1) {
> + free(hp);
> + return -1L;
> + }
> +
> + dp->count++;
> + dp->size += len;
> + dp->hash[h] = hp;
> + if (dp->item_size < len)
> + dp->item_size = len;
> +
> + return hp->off;
> +}
> +
> +/*
> + * Copy data from the compile-time read-only data into a DIFO-style read-only
What do "compile-time" and "DIFO-style" mean?
> + * storage memory block.
> + */
> +ssize_t
> +dt_rodata_copy(const char *s, size_t n, size_t off, char *buf)
Why not just inline this function? There is no other copy function to use.
> +{
> + memcpy(buf + off, s, n);
> + return n;
> +}
> +
> +ssize_t
> +dt_rodata_write(const dt_rodata_t *dp, dt_rodata_copy_f *func, void *private)
Why make the caller specify func? In particular, the only func used is
dt_rodata_copy(), which is in this file but the callers are outside of
this file. At the very least, insert "if (func == NULL) func =
dt_rodata_copy" so that a caller does not HAVE to specify func.
> +{
> + ssize_t res, total = 0;
> + size_t i;
> + size_t n;
> +
> + for (i = 0; i < dp->nbufs; i++, total += res) {
> + if (i == dp->nbufs - 1)
> + n = dp->ptr - dp->bufs[i];
> + else
> + n = dp->bufsz;
> +
> + if ((res = func(dp->bufs[i], n, total, private)) <= 0)
> + break;
> + }
> +
> + if (total == 0 && dp->size != 0)
> + return -1;
Why not just (total != dp->size)?
> +
> + return total;
> +}
> diff --git a/libdtrace/dt_rodata.h b/libdtrace/dt_rodata.h
> @@ -0,0 +1,36 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2006, 2021, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
2021?
More information about the DTrace-devel
mailing list