[DTrace-devel] [PATCH v3] Add rodata support

Thu Aug 17 22:10:25 UTC 2023

On 8/17/23 10:53, Kris Van Hees via DTrace-devel wrote:
> This patch provides for read-only data to be placed in its own memory
> block (much like the .rodata ELF section).  It lays the groundwork for
> future enhancements that support .rodata sections in precompiled BPF
> code (as may be generated for static data items), and for the storage
> of read-only data that is not constrainted by the maximum string size.

constrained

> It's first user will be the inet_ntoa6() implementation that needs to

Its

> store a read-only lookup table.
>
> The string table can be considered a special case of read-only data
> (as evidenced by the clear duplication of code between dt_rodata.c and
> dt_strtab.c) and a future patch will convert the strtab implementation
> to be a wrapper for a rodata-style memory block.

I'd drop the parenthetical remark.

> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> @@ -695,10 +695,15 @@ gmap_create_scratchmem(dtrace_hdl_t *dtp)
>    * String table map.  This is a global map with a singleton element (key 0)

Here and lots of places:  it isn't really the "string table map" any more.

>    * that contains the entire string table as a concatenation of all unique
>    * strings (each terminated with a NUL byte).  The string table size is taken
> - * from the DTrace consumer handle (dt_strlen).  Extra memory is allocated as a
> - * memory block of zeros for initializing memory regions.  Its size is at least
> - * the maximum string size to ensure the BPF verifier can validate all access
> - * requests for dynamic references to string constants.
> + * from the DTrace consumer handle (dt_strlen).  The read-only data is appended
> + * to the string table, as is a memory block of zeros for initializing memory
> + * regions.  Its size is at least

Its size?  Whose size?  And.... is at least what?

> + *
> + * In order to ensure the BPF verifier can validate all access requests for
> + * dybamic references to string constants, the size of the read-only data plus
> + * the size of the block of zeros must at least match the maximum string size.
> + * The size of the block of zeros must at least match the maximum read-only
> + * item size.

Okay but "the size of the block of zeros" could perhaps be dt_zerosize?

Anyhow, "dynamic."

>    */
>   static int
>   gmap_create_strtab(dtrace_hdl_t *dtp)
> @@ -711,13 +716,20 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
>   	int		fd, rc, err;
>   
>   	dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> -	dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> +	dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> +	dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
> +	dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
> +	dtp->dt_zerosize = 0;
> +	sz = dt_rodata_max_item_size(dtp->dt_rodata);

To me, it makes much more sense to put those last two lines after the 
next comment.

>   	/*
>   	 * Ensure the zero-filled memory at the end of the strtab is large
>   	 * enough to accomodate all needs for such a memory block.
>   	 */

accommodate

> -	dtp->dt_zerosize = strsize + 1;
> +	if (dtp->dt_rosize < strsize + 1)
> +		dtp->dt_zerosize = strsize + 1 - dtp->dt_rosize;
> +	if (dtp->dt_zerosize < sz)
> +		dtp->dt_zerosize = sz;
>   	if (dtp->dt_zerosize < dtp->dt_maxdvarsize)
>   		dtp->dt_zerosize = dtp->dt_maxdvarsize;
>   	if (dtp->dt_zerosize < dtp->dt_maxtuplesize)

It'd probably be a bit clearer if MAX() were used.

> @@ -730,6 +742,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
>   	if (strtab == NULL)
>   		return dt_set_errno(dtp, EDT_NOMEM);
>   
> +	/* Copy the string table data. */
>   	dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr,
>   			strtab);
> @@ -745,6 +758,10 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
>   		buf += len + 1;
>   	}
>   
> +	/* Copy the read-only data. */
> +	dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy,
> +			strtab + dtp->dt_rooffset);

Should we now have a dt_rodata_destroy()?

I'd make that second argument NULL.  (See my comment later about 
dt_rodata_write().)

Should we check the return value of the function?

>   	fd = create_gmap(dtp, "strtab", BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
>   			 sz, 1);
>   	if (fd == -1)
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> @@ -56,6 +56,7 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
>   	dt_ident_t	*mem = dt_dlib_get_map(dtp, "mem");
>   	dt_ident_t	*state = dt_dlib_get_map(dtp, "state");
>   	dt_ident_t	*prid = dt_dlib_get_var(pcb->pcb_hdl, "PRID");
> +	dt_ident_t	*ro_off = dt_dlib_get_var(dtp, "RODATA_OFF");
>   	uint_t		lbl_exit = pcb->pcb_exitlbl;
>   

Should there be an assert(ro_off), at least to be consistent with 
existing practice?

> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> @@ -888,6 +888,7 @@ dt_vopen(int version, int flags, int *errp,
>   	    DIF_VAR_OTHER_UBASE, DIF_VAR_OTHER_MAX);
>   
>   	dtp->dt_ccstab = dt_strtab_create(BUFSIZ);
> +	dtp->dt_rodata = dt_rodata_create(BUFSIZ);
>   	dtp->dt_strlen = 0;
>   

Should we check if dt_rodata is NULL?  And where is it destroyed? Right 
after the dt_rodata_write() call?  Or in dt_close()?

> diff --git a/libdtrace/dt_rodata.c b/libdtrace/dt_rodata.c
> @@ -0,0 +1,307 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <sys/types.h>

FWIW, sys/types.h might be brought in by dt_rodata.h.

> +#include <string.h>
> +#include <stdlib.h>
> +#include <assert.h>
> +
> +#include <dt_rodata.h>
> +#include <dt_string.h>
> +#include <dt_impl.h>
> +

I'd like a block comment like this near the top of this file:

     The function dt_rodata_insert(dp, ptr, len) takes an input block of 
data,
     described by ptr and len, and appends it to a monolithic buffer 
described
     by dp.  The function returns the offset in the buffer where the data is
     stored.

     Actually, dp also has a hash table so that if the data block 
already appears
     in the buffer, an offset to the pre-existing copy is returned and 
no new
     copy is added.

     Further, dp actually stores the buffer in a growable set of smaller 
chunks,
     each chunk of size originally specified in dt_rodata_create(bufsz).
     The buffer can be written out to a single, linear block of memory using
     dt_rodata_write().

> +typedef struct dt_rohash	dt_rohash_t;
> +struct dt_rohash {
> +	const char	*data;		/* pointer to actual data */
> +	size_t		buf;		/* index of data buffer */
> +	size_t		off;		/* offset in bytes of this item */
> +	size_t		len;		/* length in bytes of this item */
> +	dt_rohash_t	*next;		/* next item in hash chain */
> +};

"of this item" might be unnecessary (both instances)

> +
> +struct dt_rodata {
> +	dt_rohash_t	**hash;		/* array of hash buckets */

Name it "hash bucket array" to make the correspondence to the next line 
clearer.

> +	size_t		hashsz;		/* size of hash bucket array */

This is set to 211?  Is that actually a reasonable size for this? (Not a 
big deal, I suppose.)

> +	char		**bufs;		/* array of buffer pointers */
> +	char		*ptr;		/* pointer to current buffer location */
> +	size_t		nbufs;		/* size of buffer pointer array */
> +	size_t		bufsz;		/* size of individual buffer */
> +	size_t		count;		/* total number of items in rodata */
>

So far as I can tell, no one uses "count".

> +	size_t		size;		/* total size of items in bytes */
> +	size_t		item_size;	/* maximum item size */

Rename to "max_item_size".

> +dt_rodata_t *
> +dt_rodata_create(size_t bufsz)
> +{
> +	dt_rodata_t	*dp = malloc(sizeof(dt_rodata_t));
> +	size_t		nbuckets = _dtrace_strbuckets;
> +
> +	assert(bufsz != 0);
> +
> +	if (dp == NULL)
> +		return NULL;
> +
> +	memset(dp, 0, sizeof(dt_rodata_t));
> +	dp->hash = malloc(nbuckets * sizeof(dt_rohash_t *));
> +	if (dp->hash == NULL)
> +		goto err;
> +
> +	memset(dp->hash, 0, nbuckets * sizeof(dt_rohash_t *));
> +	dp->hashsz = nbuckets;
> +	dp->bufs = NULL;
> +	dp->ptr = NULL;
> +	dp->nbufs = 0;
> +	dp->bufsz = bufsz;
> +
> +	if (dt_rodata_grow(dp) == -1)
> +		goto err;

Why not let the first dt_rodata_grow() be made by the first "insert"?

> +
> +	/* Pre-populate the read-only data with a 0-byte at offset 0.  */
> +	*dp->ptr++ = 0;
> +	dp->size = 1;
> +	dp->count = 1;

Why prepopulate?  And why not with an "insert" call?

> +
> +	return dp;
> +
> +err:
> +	dt_rodata_destroy(dp);
> +	return NULL;
> +}
> +
> +static int
> +dt_rodata_compare(dt_rodata_t *dp, dt_rohash_t *hp, const char *ptr, size_t len)
> +{
> +	size_t		b = hp->buf;
> +	const char	*buf = hp->data;
> +	size_t		resid, n;
> +	int		rv;

resid, n, and rv can all be declared inside the while() loop.

> +
> +	while (len != 0) {
> +		if (buf == dp->bufs[b] + dp->bufsz)
> +			buf = dp->bufs[++b];
> +
> +		resid = dp->bufs[b] + dp->bufsz - buf;
> +		n = MIN(resid, len);
> +
> +		if ((rv = memcmp(buf, ptr, n)) != 0)
> +			return rv;
> +
> +		buf += n;
> +		ptr += n;
> +		len -= n;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +dt_rodata_copyin(dt_rodata_t *dp, const char *ptr, size_t len)
> +{
> +	char	*old_p = dp->ptr;
> +	size_t	old_n = dp->nbufs;
> +	size_t	b = dp->nbufs - 1;
> +	size_t	resid, n;

resid and n should both be declared inside the while() loop.

> +
> +	while (len != 0) {
> +		if (dp->ptr == dp->bufs[b] + dp->bufsz) {
> +			if (dt_rodata_grow(dp) == -1)
> +				goto err;
> +			b++;
> +		}
> +
> +		resid = dp->bufs[b] + dp->bufsz - dp->ptr;
> +		n = MIN(resid, len);
> +		memcpy(dp->ptr, ptr, n);
> +
> +		dp->ptr += n;
> +		ptr += n;
> +		len -= n;
> +	}
> +
> +	return 0;
> +
> +err:
> +	while (dp->nbufs != old_n)
> +		free(dp->bufs[--dp->nbufs]);
> +
> +	dp->ptr = old_p;
> +	return -1;
> +}
> +
> +static ssize_t
> +dt_rodata_xindex(dt_rodata_t *dp, const char *ptr, size_t len, uint32_t h)
> +{
> +	dt_rohash_t	*hp;
> +
> +	for (hp = dp->hash[h]; hp != NULL; hp = hp->next) {
> +		if (dt_rodata_compare(dp, hp, ptr, len) == 0)
> +			return hp->off;
> +	}
> +
> +	return -1;
> +}
> +
> +ssize_t
> +dt_rodata_index(dt_rodata_t *dp, const char *ptr, size_t len)

So far as I can tell, no one uses this function.

> +{
> +	ssize_t		rc;
> +	uint32_t	h;
> +
> +	if (ptr == NULL || ptr[0] == '\0')
> +		return 0;	/* NULL or a single 0-byte are at offset 0. */

Why the comment and the ptr[0]=='\0' check?

> +
> +	h = dt_gen_hval(ptr, len, len) % dp->hashsz;
> +	rc = dt_rodata_xindex(dp, ptr, len, h);
> +
> +	return rc;
> +}
> +
> +ssize_t
> +dt_rodata_insert(dt_rodata_t *dp, const char *ptr, size_t len)
> +{
> +	dt_rohash_t	*hp;
> +	ssize_t		off;
> +	uint32_t	h;
> +
> +	if (ptr == NULL || ptr[0] == '\0')
> +		return 0;	/* NULL or a single 0-byte are at offset 0. */
> +

Why the comment and the ptr[0]=='\0' check?

> +	h = dt_gen_hval(ptr, len, len) % dp->hashsz;
> +	off = dt_rodata_xindex(dp, ptr, len, h);
> +	if (off != -1)
> +		return off;
> +
> +	/*
> +	 * Create a new hash bucket, initialize it, and insert it at the front
> +	 * of the hash chain for the appropriate bucket.
> +	 */
> +	if ((hp = malloc(sizeof(dt_rohash_t))) == NULL)
> +		return -1L;
> +
> +	hp->data = dp->ptr;
> +	hp->buf = dp->nbufs - 1;
> +	hp->off = dp->size;
> +	hp->len = len;
> +	hp->next = dp->hash[h];
> +
> +	/*
> +	 * Now copy the data into our buffer list, and then update the global
> +	 * counts of items and bytes.  Return the item's byte offset.
> +	 */
> +	if (dt_rodata_copyin(dp, ptr, len) == -1) {
> +		free(hp);
> +		return -1L;
> +	}
> +
> +	dp->count++;
> +	dp->size += len;
> +	dp->hash[h] = hp;
> +	if (dp->item_size < len)
> +		dp->item_size = len;
> +
> +	return hp->off;
> +}
> +
> +/*
> + * Copy data from the compile-time read-only data into a DIFO-style read-only

What do "compile-time" and "DIFO-style" mean?

> + * storage memory block.
> + */
> +ssize_t
> +dt_rodata_copy(const char *s, size_t n, size_t off, char *buf)

Why not just inline this function?  There is no other copy function to use.

> +{
> +	memcpy(buf + off, s, n);
> +	return n;
> +}
> +
> +ssize_t
> +dt_rodata_write(const dt_rodata_t *dp, dt_rodata_copy_f *func, void *private)

Why make the caller specify func?  In particular, the only func used is 
dt_rodata_copy(), which is in this file but the callers are outside of 
this file.  At the very least, insert "if (func == NULL) func = 
dt_rodata_copy" so that a caller does not HAVE to specify func.

> +{
> +	ssize_t	res, total = 0;
> +	size_t	i;
> +	size_t	n;
> +
> +	for (i = 0; i < dp->nbufs; i++, total += res) {
> +		if (i == dp->nbufs - 1)
> +			n = dp->ptr - dp->bufs[i];
> +		else
> +			n = dp->bufsz;
> +
> +		if ((res = func(dp->bufs[i], n, total, private)) <= 0)
> +			break;
> +	}
> +
> +	if (total == 0 && dp->size != 0)
> +		return -1;

Why not just (total != dp->size)?

> +
> +	return total;
> +}
> diff --git a/libdtrace/dt_rodata.h b/libdtrace/dt_rodata.h
> @@ -0,0 +1,36 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2006, 2021, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */

2021?