[DTrace-devel] [PATCH v3] Add rodata support

Mon Aug 21 18:55:19 UTC 2023

On Mon, Aug 21, 2023 at 01:54:43PM -0400, Eugene Loh via DTrace-devel wrote:
> I'm trying to drop issues as much as I reasonably can, but there are still
> some items below.
> 
> And another issue:  what about alignment?  What if I want to add some data
> that needs 4- or 8-byte alignment?  Do I do that by fabricating some bogus
> data of the right size and inserting that first, then inserting the data I
> want?  But what if the bogus data magically matches data I've already
> inserted, and the intended padding does not result?  Or, no promises of
> alignment?

Alignment is something that definitely may need to be looked at if there is
such a need in future uses.  The case of e.g. pulling in .rodata section data
from ELF objects may not need that since it is already aligned in the section.
But if the need arises, that is certainly an enhacement that will need to be
addressed.  That need does not exist right now.

> On 8/21/23 11:53, Kris Van Hees wrote:
> 
> > On Fri, Aug 18, 2023 at 03:17:23PM -0400, Eugene Loh wrote:
> > > On 8/17/23 22:04, Kris Van Hees wrote:
> > > > On Thu, Aug 17, 2023 at 06:10:25PM -0400, Eugene Loh via DTrace-devel wrote:
> > > > > On 8/17/23 10:53, Kris Van Hees via DTrace-devel wrote:
> > > > > I'd make that second argument NULL.? (See my comment later about
> > > > > dt_rodata_write().)
> > > > I disagree (likewise, see below).
> > > > > > +dt_rodata_t *
> > > > > > +dt_rodata_create(size_t bufsz)
> > > > > > +{
> > > > > > +	dt_rodata_t	*dp = malloc(sizeof(dt_rodata_t));
> > > > > > +	size_t		nbuckets = _dtrace_strbuckets;
> > > > > > +
> > > > > > +	assert(bufsz != 0);
> > > > > > +
> > > > > > +	if (dp == NULL)
> > > > > > +		return NULL;
> > > > > > +
> > > > > > +	memset(dp, 0, sizeof(dt_rodata_t));
> > > > > > +	dp->hash = malloc(nbuckets * sizeof(dt_rohash_t *));
> > > > > > +	if (dp->hash == NULL)
> > > > > > +		goto err;
> > > > > > +
> > > > > > +	memset(dp->hash, 0, nbuckets * sizeof(dt_rohash_t *));
> > > > > > +	dp->hashsz = nbuckets;
> > > > > > +	dp->bufs = NULL;
> > > > > > +	dp->ptr = NULL;
> > > > > > +	dp->nbufs = 0;
> > > > > > +	dp->bufsz = bufsz;
> > > > > > +
> > > > > > +	if (dt_rodata_grow(dp) == -1)
> > > > > > +		goto err;
> > > > > Why not let the first dt_rodata_grow() be made by the first "insert"?
> > > > Since we want a first NULL element.
> > > Why?? And why not insert it with an insert call?
> > Because you cannot add NULL to the rodata and because I need it there so I
> > can have a guaranteed value for NULL (offset 0).  I believe I commented on
> > that already before (as quoted below).
> 
> Does the value in byte 0 matter?  I'm guessing it does not.  It seems to me
> the point is not that we want a 0 in the first byte. Rather, we simply want
> to bump dp->ptr and dp->size up by 1 so that whenever an offset of 0 is
> returned we know a NULL pointer had been passed in.

Since it does not matter (for this) 0 is a valid, reasonable, and very useful
value.  I.e. not an issue.

> > > > > > +
> > > > > > +	/* Pre-populate the read-only data with a 0-byte at offset 0.  */
> > > > > > +	*dp->ptr++ = 0;
> > > > > > +	dp->size = 1;
> > > > > > +	dp->count = 1;
> > > > > Why prepopulate?? And why not with an "insert" call?
> > > > To guarantee a NULL element at offset 0.
> > > > > > + * storage memory block.
> > > > > > + */
> > > > > > +ssize_t
> > > > > > +dt_rodata_copy(const char *s, size_t n, size_t off, char *buf)
> > > > > Why not just inline this function?? There is no other copy function to use.
> > > > But we can support multiple ones, and in fact, when strtab is implemented based
> > > > on rodata, we can use a single writer function and supply it with a custom
> > > > copy function (distinct from what rodata needs).
> > > What would that function look like?? Today, dt_strtab_copystr() is
> > > simply a memcpy.
> > A function like that can (and should - though never did because the original
> > DTrace code is not always as robust as we'd like) ensure that we do not try
> > to write more data than the output buffer can accept.  Or a function can be
> > added (as I have done in debugging) to output the string table rather than
> > writing it to a memory block.
> > 
> > > > > > +{
> > > > > > +	memcpy(buf + off, s, n);
> > > > > > +	return n;
> > > > > > +}
> > > > > > +
> > > > > > +ssize_t
> > > > > > +dt_rodata_write(const dt_rodata_t *dp, dt_rodata_copy_f *func, void *private)
> > > > > Why make the caller specify func?? In particular, the only func used is
> > > > > dt_rodata_copy(), which is in this file but the callers are outside of this
> > > > > file.? At the very least, insert "if (func == NULL) func = dt_rodata_copy"
> > > > > so that a caller does not HAVE to specify func.
> > > > See above.  I don't like the concept of a default function because that leads
> > > > to needing to go look at the writer function to figure out what the default is.
> > > > There is no real benefit from using a default.
> > > Maybe it would help if I knew what other functions were being considered.
> > I honestly do not see that as being relevant for a code review.
> 
> The patch adds 100s of lines.  That's bad.  To judge the patch, one needs to
> ask "is this worth it?" and "does it do this correctly?" All that is hard to
> judge when I don't know what "this" is.

100s lines of code that (as discussed several times before) largely duplicate
code already *in* DTrace.  We have discussed this several times now - I think
it is time to put that argument to rest.
> 
> > The code
> > provides for using a callback function that is used for the actual processing
> > of the data chunks, and assuming the code here does what it needs to do: call
> > the callback function for each chunk in order, and verify return value of the
> > callback function to ensure the data was processed without problems, it does
> > what it is meant to do.
> > 
> > > Right now, my understanding is that rodata is some big linear space
> > > where I can dump blocks of data.? I'm given the linear offset to each
> > > block.? There is protection against duplicates.? The "write" function
> > > presumably dumps that linear space to somewhere designated by a
> > > pointer.? Why should I need to specify anything other than that
> > > pointer?? Why should I have to go looking at the writer function to
> > > figure out what the func argument is supposed to be?
> > Because, as I explain above, this function can support more callbacks.  It is
> > quite common when this programming technique is used that you need to consider
> > the call chain:
> > 
> > File A defines a callback function cb and calls an iterator function for some
> > opaque data structure, passing in cb to do the actual work.  This way the
> > actual implementation details of the data structure are encapsulated and in
> > a sense hidden from the code using it.  Which means it can change without the
> > need to change the code that uses it.
> 
> It seems to me that an iterator is very different.  In that case, you have a bunch of like things and the iterator does not need to know what those things are.
> 
> Here, I *thought* we were simply packing a memory area.  Stuff is loaded in (without producing any duplicates) and then we flush/dump it out.  I would think nothing beyond memcpy would be needed at this point.  If any processing had to take place on the data, it would already have taken place.  Specifically, if the data is going to grow, you're in trouble since the offsets are already baked in.  If the data is going to shrink, you'll be wasting space.  The only thing we know about the data is that they're sitting in some linear space.

Aside from the fact that e.g. something like the strtab may want to do the
truncation of strings to strsize using the callback.  No need to make
assumptions about what the callback may or may not do.  That is outside the
scope of this patch.  The patch provides a function that will feed the data
in chunks to a callback - does it do so correctly or not?