[DTrace-devel] [PATCH v2 08/11] htab: add a _next-style iterator

Wed Oct 27 07:10:52 PDT 2021

On 22 Oct 2021, Eugene Loh said:

> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
>
> How about renaming the patch "htab: add an iterator".  The kind of 
> iterator doesn't that much matter.  And I don't know what _next-style is 
> anyhow.

I'm specifically noting it for users of libctf who might already be
familiar with this class of iterators, because it is rather *strange*
and since I invented the class literally last year it might be worth
noting that I'm not adding a repeated-function-call-style _iter()
iterator (which is the style one would usually expect in C, and is
clumsy as hell to use).

However, we do note that on the next line so you're right that this is
kind of useless on the first line. (Dropped.)

> For dt_consume.c and dt_kernel_module.c, as I mentioned in the other 
> review, add next after del to stick to the same order that is 
> established in dt_htab.h.  Also, consider adding a 
> DEFINE_HTAB_STD_OPS_RES(ID) macro to simplify the dt_htab_ops_t definitions.
>
> On 10/20/21 2:54 PM, Nick Alcock wrote:
>> This commit defines an in the libctf _next style because it's *so much*
>> nicer to use than the function-calling _iter form:
>
> "defines an in the"?  Did you mean
> "defines an iterator in the"?

Yes, whoops!

> Again, the kind of iterator seems unimportant to me.  Specifically, I'm 
> not familiar with libctf anyhow.

See https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=688d28f62146bf07b2ce1efd5380768d5ead418d

It's ever so much nicer to use than _iter iterators. You can retain
local variables and other state across and within iterations and jump
and return out of iteration and all the stuff you'd expect to do in a
while loop, because it *is* just a while loop, rather than repeated
function calls.

>> extern void *dt_htab_next(const dt_htab_t *htab, dt_htab_next_t **it);
>> extern void dt_htab_next_destroy(dt_htab_next_t *i);
>> extern void dt_htab_next_delete(dt_htab_next_t *i);
>>
>> Call dt_htab_next with the htab to iterate over and a dt_htab_next_t
>> initialized to NULL. It allocates an iterator, returns new elements from
>> the hash until iteration is complete, then frees the iterator and
>> re-annuls it, ready for another iteration cycle. If you need to exit
>> early, dt_htab_next_destroy frees the iterator for you (it's just a
>> free() for now, but see binutils-gdb/libctf/ctf-util.c:ctf_next_destroy
>> for how this can change over time if your iterator family can iterate
>> over many different things, like the ctf_next_t can.)
>>
>> There are no restrictions whatsoever on what you can do inside iteration
>> (iterate over other things, fork, longjmp out of it, you name it) except
>> for deleting the hash entry you're iterating over: for that, we have
>> dt_htab_next_delete. (You might repeat some elements you've already seen
>> after calling this, but for most use cases I can think of this is much
>> safer than missing some. This is not a fundamental limitation: I just
>> need to add a .prev htab operation, symmetric with .next, to fix it. For
>> now this feels unnecessary: the only likely caller of
>> dt_htab_next_delete doesn't care if it sees some elements more than
>> once.)

I'll admit I didn't think this sort of tutorial belonged in the commit
because binutils has a lot of examples of these things being used now.
But, of course, this is not binutils.

> I am not sure I understand this.  IIUC:
>
> *)  You cannot delete the entry the iterator is currently on since that 
> would break the iterator.

Yep. See struct dt_htab_next, which is the (internal) iterator state: a
htab, an index, a bucket pointer, and an entry within the bucket.
Deleting the entry being iterated over would leave the entry pointing at
garbage...

> *)  If you call the next_delete function, the iterator returns to the 
> bucket head, potentially revisiting some entries.

Yep.

> The fix for the second problem would be adding a .prev operation, as you 
> point out, and that would almost be shorter than the discussion of this 
> topic in the commit message!  (I think.  I might be wrong.)

That means changing every htab for something only rarely needed, and
seems hard to test too. I could easily have the iterator call ->prev if
provided, otherwise do as it does now, but that felt like
overengineering for something we are currently using in one place and
that doesn't care about revisiting existing elements anyway.

> But I'm unsure why there is a next_delete function in the first place.  
> I'm not convinced I'm thinking about this straight, but it seems to me 
> to be pretty routine to delete stuff in linked lists. You just make sure 
> where you go next before deleting where you're at.  E.g.,

That... could be done, but it would mean doubling up the dt_htab_delete
calls and tracking a _last by hand. (And I didn't think of it.)

>      last = NULL;
>      while (me = dt_htab_next(...)) {
>          [...]
>          dt_htab_delete(htab, last);
>          last = me;
>      }
>      dt_htab_delete(htab, last)
> Or something like that.

Yeah. Doesn't that seem uglier than
      while (me = dt_htab_next(...)) {
          [...]
          dt_htab_next_delete(...);
      };

given that dt_htab_next_delete is literally *three lines*, i.e. the same
amount of code you'd otherwise have to add to every single user!

But:

[...]
>> I'm unsure why there is a next_delete function in the first place....
>> I'm just thinking here.  Again, no need for anyone here to be making a 
>> career out of these functions.
> Another option is to have the iterator do something like:
>      while (entry == NULL) advance one entry;
>      retval = entry;
>      advance one entry;
>      return retval;
> That way, the caller can delete the current item.  No need for 
> next_delete() or .prev or messy commit-message explanations or repeating 
> elements.

i.e. have the iterator sitting at the *next* entry at all times? That
works! Makes the iterator a bit more complex but easier to use.

I'll do that and delete dt_htab_next_delete, since it is then strictly
redundant :) nicely spotted! obviously deleting the next entry is then
liable to be disastrous, but honestly anyone iterating over an htab and
thinking they can *randomly* delete elements from it at the same time is
dicing with death... at least this way you can safely delete the element
you're on and any element you've already seen -- but no iterating
repeatedly, remembering elements from past iterations and deleting them
in the middle of subsequent ones! (But... nobody would do that, right?
Certainly nobody does that right now. If you need to do it you can just
build up a list or htab of things you plan to delete, then delete them
after you're done with the htab iteration.)

> I'm just thinking here.  Again, no need for anyone here to be making a 
> career out of these functions.

Hell yes. I already did that in libctf :P

>> +/*
>> + * Iterate over a hashtable, returning each element in turn (as a pointer to
>> + * void).  NULL is returned at end-of-iteration (and OOM).
>> + */
>> +void *
>> +dt_htab_next(const dt_htab_t *htab, dt_htab_next_t **it)
>> +{
>> +	dt_htab_next_t *i = *it;
>> +	/*
>> +	 * Start of iteration.
>> +	 */
>
> Insert a blank line after the declaration and before the block comment.

Oops!

>> +	if (!i) {
>> +		if ((i = malloc(sizeof(dt_htab_next_t))) == NULL)
>> +			return NULL;
>> +		i->htab = htab;
>> +		i->idx = 0;
>> +		i->bucket = NULL;
>> +		i->ent = NULL;
>> +		*it = i;
>> +	}
>
> At this point, you could say "htab = i->htab" and change the remaining 
> i->htab dereferences in the rest of the function to simply htab, if you 
> want.

I'm trying to emphasise that the htab comes out of the iterator, not out
of the function call argument, and doing that replacement would make
this harder to see. I'm on the fence, though, because all that i-> does
make this function quite hard to read in any case!

-- 
NULL && (void)