[DTrace-devel] [PATCH v3] Add rodata support

Fri Aug 18 02:04:57 UTC 2023

On Thu, Aug 17, 2023 at 06:10:25PM -0400, Eugene Loh via DTrace-devel wrote:
> On 8/17/23 10:53, Kris Van Hees via DTrace-devel wrote:
> > This patch provides for read-only data to be placed in its own memory
> > block (much like the .rodata ELF section).  It lays the groundwork for
> > future enhancements that support .rodata sections in precompiled BPF
> > code (as may be generated for static data items), and for the storage
> > of read-only data that is not constrainted by the maximum string size.
> 
> constrained

ack

> > It's first user will be the inet_ntoa6() implementation that needs to
> 
> Its

ack

> > store a read-only lookup table.
> > 
> > The string table can be considered a special case of read-only data
> > (as evidenced by the clear duplication of code between dt_rodata.c and
> > dt_strtab.c) and a future patch will convert the strtab implementation
> > to be a wrapper for a rodata-style memory block.
> 
> I'd drop the parenthetical remark.

I think it has merit because it *is* close to verbatim duplication for now.
> 
> > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > @@ -695,10 +695,15 @@ gmap_create_scratchmem(dtrace_hdl_t *dtp)
> >    * String table map.  This is a global map with a singleton element (key 0)
> 
> Here and lots of places:  it isn't really the "string table map" any more.

Not worth updating at this point because that was already the case when we
added the zeros area.  I think it is fair to retain this and update it when I
do the work to implement strtab as a form of rodata.  Besides, it is still
being created as a strtab BPF map so the naming based on the primary use is
not really wrong.

> >    * that contains the entire string table as a concatenation of all unique
> >    * strings (each terminated with a NUL byte).  The string table size is taken
> > - * from the DTrace consumer handle (dt_strlen).  Extra memory is allocated as a
> > - * memory block of zeros for initializing memory regions.  Its size is at least
> > - * the maximum string size to ensure the BPF verifier can validate all access
> > - * requests for dynamic references to string constants.
> > + * from the DTrace consumer handle (dt_strlen).  The read-only data is appended
> > + * to the string table, as is a memory block of zeros for initializing memory
> > + * regions.  Its size is at least
> 
> Its size?  Whose size?  And.... is at least what?

Bad update of the comment.  The next part is what should be there.

> > + *
> > + * In order to ensure the BPF verifier can validate all access requests for
> > + * dybamic references to string constants, the size of the read-only data plus
> > + * the size of the block of zeros must at least match the maximum string size.
> > + * The size of the block of zeros must at least match the maximum read-only
> > + * item size.
> 
> Okay but "the size of the block of zeros" could perhaps be dt_zerosize?

Either way.

> Anyhow, "dynamic."

ack

> >    */
> >   static int
> >   gmap_create_strtab(dtrace_hdl_t *dtp)
> > @@ -711,13 +716,20 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> >   	int		fd, rc, err;
> >   	dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> > -	dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> > +	dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> > +	dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
> > +	dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
> > +	dtp->dt_zerosize = 0;
> > +	sz = dt_rodata_max_item_size(dtp->dt_rodata);
> 
> To me, it makes much more sense to put those last two lines after the next
> comment.

I see it more as first initializing these variables, and then we do some
updating based on the conditions.

> >   	/*
> >   	 * Ensure the zero-filled memory at the end of the strtab is large
> >   	 * enough to accomodate all needs for such a memory block.
> >   	 */
> 
> accommodate

ack

> > -	dtp->dt_zerosize = strsize + 1;
> > +	if (dtp->dt_rosize < strsize + 1)
> > +		dtp->dt_zerosize = strsize + 1 - dtp->dt_rosize;
> > +	if (dtp->dt_zerosize < sz)
> > +		dtp->dt_zerosize = sz;
> >   	if (dtp->dt_zerosize < dtp->dt_maxdvarsize)
> >   		dtp->dt_zerosize = dtp->dt_maxdvarsize;
> >   	if (dtp->dt_zerosize < dtp->dt_maxtuplesize)
> 
> It'd probably be a bit clearer if MAX() were used.

Hm, I don't think so.  I had that there at some point and felt it was not
conveying the logic as well as the conditionals do.

> > @@ -730,6 +742,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> >   	if (strtab == NULL)
> >   		return dt_set_errno(dtp, EDT_NOMEM);
> > +	/* Copy the string table data. */
> >   	dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr,
> >   			strtab);
> > @@ -745,6 +758,10 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> >   		buf += len + 1;
> >   	}
> > +	/* Copy the read-only data. */
> > +	dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy,
> > +			strtab + dtp->dt_rooffset);
> 
> Should we now have a dt_rodata_destroy()?

You mean, at this point?  Perhaps, but we do not do so for strtab (and perhaps
that is also something we can do.  I'll have to investigate that a bit more in
terms of what is needed/wanted.  It is consistent with current code though to
not do the destroy here.

But there should be a dt_rodata_destroy() in dt_open.c when we close the
DTrace handle.  Will add that.

> I'd make that second argument NULL.  (See my comment later about
> dt_rodata_write().)

I disagree (likewise, see below).

> Should we check the return value of the function?

Not necessary here.  It something goes wrong here we have much bigger issues.

> >   	fd = create_gmap(dtp, "strtab", BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
> >   			 sz, 1);
> >   	if (fd == -1)
> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > @@ -56,6 +56,7 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
> >   	dt_ident_t	*mem = dt_dlib_get_map(dtp, "mem");
> >   	dt_ident_t	*state = dt_dlib_get_map(dtp, "state");
> >   	dt_ident_t	*prid = dt_dlib_get_var(pcb->pcb_hdl, "PRID");
> > +	dt_ident_t	*ro_off = dt_dlib_get_var(dtp, "RODATA_OFF");
> >   	uint_t		lbl_exit = pcb->pcb_exitlbl;
> 
> Should there be an assert(ro_off), at least to be consistent with existing
> practice?

Ah yes, thanks.

> > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> > @@ -888,6 +888,7 @@ dt_vopen(int version, int flags, int *errp,
> >   	    DIF_VAR_OTHER_UBASE, DIF_VAR_OTHER_MAX);
> >   	dtp->dt_ccstab = dt_strtab_create(BUFSIZ);
> > +	dtp->dt_rodata = dt_rodata_create(BUFSIZ);
> >   	dtp->dt_strlen = 0;
> 
> Should we check if dt_rodata is NULL?  And where is it destroyed? Right
> after the dt_rodata_write() call?  Or in dt_close()?

Yes, will add.  And will add (as mentioned above) a destroy in the close.

> > diff --git a/libdtrace/dt_rodata.c b/libdtrace/dt_rodata.c
> > @@ -0,0 +1,307 @@
> > +/*
> > + * Oracle Linux DTrace.
> > + * Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> > + * Licensed under the Universal Permissive License v 1.0 as shown at
> > + * http://oss.oracle.com/licenses/upl.
> > + */
> > +
> > +#include <sys/types.h>
> 
> FWIW, sys/types.h might be brought in by dt_rodata.h.
> 
> > +#include <string.h>
> > +#include <stdlib.h>
> > +#include <assert.h>
> > +
> > +#include <dt_rodata.h>
> > +#include <dt_string.h>
> > +#include <dt_impl.h>
> > +
> 
> I'd like a block comment like this near the top of this file:

That seems really out of place.

>     The function dt_rodata_insert(dp, ptr, len) takes an input block of
> data,
>     described by ptr and len, and appends it to a monolithic buffer
> described
>     by dp.  The function returns the offset in the buffer where the data is
>     stored.
> 
>     Actually, dp also has a hash table so that if the data block already
> appears
>     in the buffer, an offset to the pre-existing copy is returned and no new
>     copy is added.
> 
>     Further, dp actually stores the buffer in a growable set of smaller
> chunks,
>     each chunk of size originally specified in dt_rodata_create(bufsz).
>     The buffer can be written out to a single, linear block of memory using
>     dt_rodata_write().
> 
> > +typedef struct dt_rohash	dt_rohash_t;
> > +struct dt_rohash {
> > +	const char	*data;		/* pointer to actual data */
> > +	size_t		buf;		/* index of data buffer */
> > +	size_t		off;		/* offset in bytes of this item */
> > +	size_t		len;		/* length in bytes of this item */
> > +	dt_rohash_t	*next;		/* next item in hash chain */
> > +};
> 
> "of this item" might be unnecessary (both instances)
> 
> > +
> > +struct dt_rodata {
> > +	dt_rohash_t	**hash;		/* array of hash buckets */
> 
> Name it "hash bucket array" to make the correspondence to the next line
> clearer.
> 
> > +	size_t		hashsz;		/* size of hash bucket array */
> 
> This is set to 211?  Is that actually a reasonable size for this? (Not a big
> deal, I suppose.)
> 
> > +	char		**bufs;		/* array of buffer pointers */
> > +	char		*ptr;		/* pointer to current buffer location */
> > +	size_t		nbufs;		/* size of buffer pointer array */
> > +	size_t		bufsz;		/* size of individual buffer */
> > +	size_t		count;		/* total number of items in rodata */
> > 
> 
> So far as I can tell, no one uses "count".
> 
> > +	size_t		size;		/* total size of items in bytes */
> > +	size_t		item_size;	/* maximum item size */
> 
> Rename to "max_item_size".

I had that but decided it was too long and not really necessary to be that
verbose.

> > +dt_rodata_t *
> > +dt_rodata_create(size_t bufsz)
> > +{
> > +	dt_rodata_t	*dp = malloc(sizeof(dt_rodata_t));
> > +	size_t		nbuckets = _dtrace_strbuckets;
> > +
> > +	assert(bufsz != 0);
> > +
> > +	if (dp == NULL)
> > +		return NULL;
> > +
> > +	memset(dp, 0, sizeof(dt_rodata_t));
> > +	dp->hash = malloc(nbuckets * sizeof(dt_rohash_t *));
> > +	if (dp->hash == NULL)
> > +		goto err;
> > +
> > +	memset(dp->hash, 0, nbuckets * sizeof(dt_rohash_t *));
> > +	dp->hashsz = nbuckets;
> > +	dp->bufs = NULL;
> > +	dp->ptr = NULL;
> > +	dp->nbufs = 0;
> > +	dp->bufsz = bufsz;
> > +
> > +	if (dt_rodata_grow(dp) == -1)
> > +		goto err;
> 
> Why not let the first dt_rodata_grow() be made by the first "insert"?

Since we want a first NULL element.

> > +
> > +	/* Pre-populate the read-only data with a 0-byte at offset 0.  */
> > +	*dp->ptr++ = 0;
> > +	dp->size = 1;
> > +	dp->count = 1;
> 
> Why prepopulate?  And why not with an "insert" call?

To guarantee a NULL element at offset 0.

> > +
> > +	return dp;
> > +
> > +err:
> > +	dt_rodata_destroy(dp);
> > +	return NULL;
> > +}
> > +
> > +static int
> > +dt_rodata_compare(dt_rodata_t *dp, dt_rohash_t *hp, const char *ptr, size_t len)
> > +{
> > +	size_t		b = hp->buf;
> > +	const char	*buf = hp->data;
> > +	size_t		resid, n;
> > +	int		rv;
> 
> resid, n, and rv can all be declared inside the while() loop.

I kept it as it was in dt_strtab.c to make it clear this ia a slightly modified
version of dt_strtab.c.  When consolidating and making strtab a specialized
version of rodata would be a good time to clean this up.

But if you really want it done now, I can do it.

> > +
> > +	while (len != 0) {
> > +		if (buf == dp->bufs[b] + dp->bufsz)
> > +			buf = dp->bufs[++b];
> > +
> > +		resid = dp->bufs[b] + dp->bufsz - buf;
> > +		n = MIN(resid, len);
> > +
> > +		if ((rv = memcmp(buf, ptr, n)) != 0)
> > +			return rv;
> > +
> > +		buf += n;
> > +		ptr += n;
> > +		len -= n;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +dt_rodata_copyin(dt_rodata_t *dp, const char *ptr, size_t len)
> > +{
> > +	char	*old_p = dp->ptr;
> > +	size_t	old_n = dp->nbufs;
> > +	size_t	b = dp->nbufs - 1;
> > +	size_t	resid, n;
> 
> resid and n should both be declared inside the while() loop.

Same.

> > +
> > +	while (len != 0) {
> > +		if (dp->ptr == dp->bufs[b] + dp->bufsz) {
> > +			if (dt_rodata_grow(dp) == -1)
> > +				goto err;
> > +			b++;
> > +		}
> > +
> > +		resid = dp->bufs[b] + dp->bufsz - dp->ptr;
> > +		n = MIN(resid, len);
> > +		memcpy(dp->ptr, ptr, n);
> > +
> > +		dp->ptr += n;
> > +		ptr += n;
> > +		len -= n;
> > +	}
> > +
> > +	return 0;
> > +
> > +err:
> > +	while (dp->nbufs != old_n)
> > +		free(dp->bufs[--dp->nbufs]);
> > +
> > +	dp->ptr = old_p;
> > +	return -1;
> > +}
> > +
> > +static ssize_t
> > +dt_rodata_xindex(dt_rodata_t *dp, const char *ptr, size_t len, uint32_t h)
> > +{
> > +	dt_rohash_t	*hp;
> > +
> > +	for (hp = dp->hash[h]; hp != NULL; hp = hp->next) {
> > +		if (dt_rodata_compare(dp, hp, ptr, len) == 0)
> > +			return hp->off;
> > +	}
> > +
> > +	return -1;
> > +}
> > +
> > +ssize_t
> > +dt_rodata_index(dt_rodata_t *dp, const char *ptr, size_t len)
> 
> So far as I can tell, no one uses this function.

strtab has it and it will be needed if we populate rodata from other objects
into the central consolidated rodata block.  So might as well have it here.
> 
> > +{
> > +	ssize_t		rc;
> > +	uint32_t	h;
> > +
> > +	if (ptr == NULL || ptr[0] == '\0')
> > +		return 0;	/* NULL or a single 0-byte are at offset 0. */
> 
> Why the comment and the ptr[0]=='\0' check?

Because we have a prepopulated entry at offset 0 to represent a NULL or empty
memory block (considered to be the same as NULL).

> > +
> > +	h = dt_gen_hval(ptr, len, len) % dp->hashsz;
> > +	rc = dt_rodata_xindex(dp, ptr, len, h);
> > +
> > +	return rc;
> > +}
> > +
> > +ssize_t
> > +dt_rodata_insert(dt_rodata_t *dp, const char *ptr, size_t len)
> > +{
> > +	dt_rohash_t	*hp;
> > +	ssize_t		off;
> > +	uint32_t	h;
> > +
> > +	if (ptr == NULL || ptr[0] == '\0')
> > +		return 0;	/* NULL or a single 0-byte are at offset 0. */
> > +
> 
> Why the comment and the ptr[0]=='\0' check?

See above.

> > +	h = dt_gen_hval(ptr, len, len) % dp->hashsz;
> > +	off = dt_rodata_xindex(dp, ptr, len, h);
> > +	if (off != -1)
> > +		return off;
> > +
> > +	/*
> > +	 * Create a new hash bucket, initialize it, and insert it at the front
> > +	 * of the hash chain for the appropriate bucket.
> > +	 */
> > +	if ((hp = malloc(sizeof(dt_rohash_t))) == NULL)
> > +		return -1L;
> > +
> > +	hp->data = dp->ptr;
> > +	hp->buf = dp->nbufs - 1;
> > +	hp->off = dp->size;
> > +	hp->len = len;
> > +	hp->next = dp->hash[h];
> > +
> > +	/*
> > +	 * Now copy the data into our buffer list, and then update the global
> > +	 * counts of items and bytes.  Return the item's byte offset.
> > +	 */
> > +	if (dt_rodata_copyin(dp, ptr, len) == -1) {
> > +		free(hp);
> > +		return -1L;
> > +	}
> > +
> > +	dp->count++;
> > +	dp->size += len;
> > +	dp->hash[h] = hp;
> > +	if (dp->item_size < len)
> > +		dp->item_size = len;
> > +
> > +	return hp->off;
> > +}
> > +
> > +/*
> > + * Copy data from the compile-time read-only data into a DIFO-style read-only
> 
> What do "compile-time" and "DIFO-style" mean?

Explaining that is beyond the scope of this source code file.  That is matter
for the overall internals of DTrace.

> > + * storage memory block.
> > + */
> > +ssize_t
> > +dt_rodata_copy(const char *s, size_t n, size_t off, char *buf)
> 
> Why not just inline this function?  There is no other copy function to use.

But we can support multiple ones, and in fact, when strtab is implemented based
on rodata, we can use a single writer function and supply it with a custom
copy function (distinct from what rodata needs).

> > +{
> > +	memcpy(buf + off, s, n);
> > +	return n;
> > +}
> > +
> > +ssize_t
> > +dt_rodata_write(const dt_rodata_t *dp, dt_rodata_copy_f *func, void *private)
> 
> Why make the caller specify func?  In particular, the only func used is
> dt_rodata_copy(), which is in this file but the callers are outside of this
> file.  At the very least, insert "if (func == NULL) func = dt_rodata_copy"
> so that a caller does not HAVE to specify func.

See above.  I don't like the concept of a default function because that leads
to needing to go look at the writer function to figure out what the default is.
There is no real benefit from using a default.

> > +{
> > +	ssize_t	res, total = 0;
> > +	size_t	i;
> > +	size_t	n;
> > +
> > +	for (i = 0; i < dp->nbufs; i++, total += res) {
> > +		if (i == dp->nbufs - 1)
> > +			n = dp->ptr - dp->bufs[i];
> > +		else
> > +			n = dp->bufsz;
> > +
> > +		if ((res = func(dp->bufs[i], n, total, private)) <= 0)
> > +			break;
> > +	}
> > +
> > +	if (total == 0 && dp->size != 0)
> > +		return -1;
> 
> Why not just (total != dp->size)?

Consistency with the strtab code.

> > +
> > +	return total;
> > +}
> > diff --git a/libdtrace/dt_rodata.h b/libdtrace/dt_rodata.h
> > @@ -0,0 +1,36 @@
> > +/*
> > + * Oracle Linux DTrace.
> > + * Copyright (c) 2006, 2021, Oracle and/or its affiliates. All rights reserved.
> > + * Licensed under the Universal Permissive License v 1.0 as shown at
> > + * http://oss.oracle.com/licenses/upl.
> > + */
> 
> 2021?

ack

> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel