[DTrace-devel] [PATCH v2 08/11] htab: add a _next-style iterator

Fri Oct 22 12:03:22 PDT 2021

Reviewed-by: Eugene Loh <eugene.loh at oracle.com>

How about renaming the patch "htab: add an iterator".  The kind of 
iterator doesn't that much matter.  And I don't know what _next-style is 
anyhow.

For dt_consume.c and dt_kernel_module.c, as I mentioned in the other 
review, add next after del to stick to the same order that is 
established in dt_htab.h.  Also, consider adding a 
DEFINE_HTAB_STD_OPS_RES(ID) macro to simplify the dt_htab_ops_t definitions.

On 10/20/21 2:54 PM, Nick Alcock wrote:
> This commit defines an in the libctf _next style because it's *so much*
> nicer to use than the function-calling _iter form:

"defines an in the"?  Did you mean
"defines an iterator in the"?

Again, the kind of iterator seems unimportant to me.  Specifically, I'm 
not familiar with libctf anyhow.

It would seem to me that a bunch of the usage information is better off 
in the source code, where a future user/developer would be looking, than 
in the commit message.
> extern void *dt_htab_next(const dt_htab_t *htab, dt_htab_next_t **it);
> extern void dt_htab_next_destroy(dt_htab_next_t *i);
> extern void dt_htab_next_delete(dt_htab_next_t *i);
>
> Call dt_htab_next with the htab to iterate over and a dt_htab_next_t
> initialized to NULL. It allocates an iterator, returns new elements from
> the hash until iteration is complete, then frees the iterator and
> re-annuls it, ready for another iteration cycle. If you need to exit
> early, dt_htab_next_destroy frees the iterator for you (it's just a
> free() for now, but see binutils-gdb/libctf/ctf-util.c:ctf_next_destroy
> for how this can change over time if your iterator family can iterate
> over many different things, like the ctf_next_t can.)
>
> There are no restrictions whatsoever on what you can do inside iteration
> (iterate over other things, fork, longjmp out of it, you name it) except
> for deleting the hash entry you're iterating over: for that, we have
> dt_htab_next_delete. (You might repeat some elements you've already seen
> after calling this, but for most use cases I can think of this is much
> safer than missing some. This is not a fundamental limitation: I just
> need to add a .prev htab operation, symmetric with .next, to fix it. For
> now this feels unnecessary: the only likely caller of
> dt_htab_next_delete doesn't care if it sees some elements more than
> once.)

I am not sure I understand this.  IIUC:

*)  You cannot delete the entry the iterator is currently on since that 
would break the iterator.

*)  If you call the next_delete function, the iterator returns to the 
bucket head, potentially revisiting some entries.

The fix for the second problem would be adding a .prev operation, as you 
point out, and that would almost be shorter than the discussion of this 
topic in the commit message!  (I think.  I might be wrong.)

But I'm unsure why there is a next_delete function in the first place.  
I'm not convinced I'm thinking about this straight, but it seems to me 
to be pretty routine to delete stuff in linked lists. You just make sure 
where you go next before deleting where you're at.  E.g.,
     last = NULL;
     while (me = dt_htab_next(...)) {
         [...]
         dt_htab_delete(htab, last);
         last = me;
     }
     dt_htab_delete(htab, last)
Or something like that.

I'm just thinking here.  Again, no need for anyone here to be making a 
career out of these functions.

> diff --git a/libdtrace/dt_htab.c b/libdtrace/dt_htab.c
> @@ -246,6 +253,78 @@ int dt_htab_delete(dt_htab_t *htab, void *entry)
>   	return 0;
>   }
>   
> +/*
> + * Iterate over a hashtable, returning each element in turn (as a pointer to
> + * void).  NULL is returned at end-of-iteration (and OOM).
> + */
> +void *
> +dt_htab_next(const dt_htab_t *htab, dt_htab_next_t **it)
> +{
> +	dt_htab_next_t *i = *it;
> +	/*
> +	 * Start of iteration.
> +	 */

Insert a blank line after the declaration and before the block comment.

> +	if (!i) {
> +		if ((i = malloc(sizeof(dt_htab_next_t))) == NULL)
> +			return NULL;
> +		i->htab = htab;
> +		i->idx = 0;
> +		i->bucket = NULL;
> +		i->ent = NULL;
> +		*it = i;
> +	}

At this point, you could say "htab = i->htab" and change the remaining 
i->htab dereferences in the rest of the function to simply htab, if you 
want.

> +
> +	/*
> +	 * One iteration cycle.  Exhaust the current idx: moving on to the next
> +	 * restarts the bucket and ent subloops.
> +	 */
> +	for (; i->idx < i->htab->size; i->idx++, i->bucket = NULL, i->ent = NULL) {
> +
> +		if (i->htab->tab[i->idx] == NULL)
> +			continue;
> +		if (i->bucket == NULL)
> +			i->bucket = i->htab->tab[i->idx];
> +
> +		/*
> +		 * Check the current bucket: moving on to the next restarts the
> +		 * ent subloop.
> +		 */
> +		for (; i->bucket; i->bucket = i->bucket->next, i->ent = NULL) {
> +
> +			if (i->ent == NULL)
> +				i->ent = i->bucket->head;
> +			else
> +				i->ent = i->htab->ops->next(i->ent);
> +			if (i->ent)
> +				return i->ent;
> +		}
> +	}
> +
> +	free(i);
> +	*it = NULL;
> +	return NULL;
> +}
> +