[DTrace-devel] [PATCH v3 08/11] htab: add an iterator

Tue Nov 2 19:24:16 UTC 2021

You can use my earlier Reviewed-by on this patch.  So I think I'm done 
with all 11 patches.  Nevertheless a few comments as usual.

On 11/1/21 10:29 AM, Nick Alcock wrote:
> This commit defines an iterator in the libctf _next style because it's
> *so much* nicer to use than the function-calling _iter form:
>
> extern void *dt_htab_next(const dt_htab_t *htab, dt_htab_next_t **it);
> extern void dt_htab_next_destroy(dt_htab_next_t *i);
>
> Call dt_htab_next with the htab to iterate over and a dt_htab_next_t
> initialized to NULL. It allocates an iterator, returns new elements from
> the hash until iteration is complete, then frees the iterator and
> re-annuls it, ready for another iteration cycle.

I suppose all of that is true, but it's a little deceptive since the 
call does different subsets of those things in different calls.  How about:

     Call dt_htab_next with the htab to iterate over and a dt_htab_next_t
     initialized to NULL to allocate an iterator and return the first 
element.
     Subsequent calls with the allocated iterator will return further 
elements
     from the hash until iteration is complete, at which time the 
iterator is
     freed and reset to NULL, ready for another iteration cycle.

> If you need to exit
> early, dt_htab_next_destroy frees the iterator for you (it's just a
> free() for now, but see binutils-gdb/libctf/ctf-util.c:ctf_next_destroy
> for how this can change over time if your iterator family can iterate
> over many different things, like the ctf_next_t can.)
>
> There are no restrictions whatsoever on what you can do inside iteration
> (iterate over other things, fork, longjmp out of it, you name it) except
> that you should not delete hash entries that the iterator has not yet
> returned: if you insert new ones, they may or may not be returned by
> this iteration cycle.

Phrases like "whatsoever" and "you name it" suggest one message, while 
"except" etc. suggest another.  How about:

     You can delete any entry already returned this iteration cycle.
     If you insert entries, they may or may not be returned in this
     iteration cycle.

> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> @@ -409,6 +409,7 @@ static dt_htab_ops_t dt_spec_buf_htab_ops = {
>   	.hval = (htab_hval_fn)dt_spec_buf_hval,
>   	.cmp = (htab_cmp_fn)dt_spec_buf_cmp,
>   	.add = (htab_add_fn)dt_spec_buf_add,
> +	.next = (htab_next_fn)dt_spec_buf_next,
>   	.del = (htab_del_fn)dt_spec_buf_del_buf
>   };

Sorry if I muddled my earlier comment on this.  In dt_htab.h, next 
consistently follows del.  (In dt_htab.c, there is similar ordering, 
even if admittedly the details are different.)  Anyhow, stick next 
*after* del in dt_consume.c just for consistency.

> diff --git a/libdtrace/dt_htab.c b/libdtrace/dt_htab.c
> @@ -251,6 +266,83 @@ int dt_htab_delete(dt_htab_t *htab, void *entry)
> +/*
> + * Iterate over a hashtable, returning each element in turn (as a pointer to
> + * void).  NULL is returned at end-of-iteration (and OOM).
> + */
> +void *
> +dt_htab_next(const dt_htab_t *htab, dt_htab_next_t **it)
> +{
> +	dt_htab_next_t *i = *it;
> +	void *ret;
> +
> +	/*
> +	 * Start of iteration.
> +	 */
> +	if (!i) {
> +		if ((i = calloc(1, sizeof(dt_htab_next_t))) == NULL)
> +			return NULL;
> +		i->htab = htab;
> +		*it = i;
> +
> +		/*
> +		 * Tick through one iteration.  Ignore the return value,
> +		 * since it is meaningless at this stage.
> +		 */
> +		(void) dt_htab_next(htab, it);
> +	}
> +
> +	/*
> +	 * Capture the value we will return after advancing the iterator, and
> +	 * handle end-of-iteration.
> +	 */
> +
> +	ret = i->ent;
> +	if (i->exhausted) {
> +		free(i);
> +		*it = NULL;
> +		return NULL;
> +	}

This is okay but how about doing the i->exhausted check first. If it 
fails, then it makes sense to set ret=i->ent.

> +	/*
> +	 * One iteration cycle.  Exhaust the current idx: moving on to the next
> +	 * restarts the bucket and ent subloops.
> +	 */
> +	for (; i->idx < i->htab->size; i->idx++, i->bucket = NULL, i->ent = NULL) {
> +
> +		if (i->htab->tab[i->idx] == NULL)
> +			continue;
> +		if (i->bucket == NULL)
> +			i->bucket = i->htab->tab[i->idx];
> +
> +		/*
> +		 * Check the current bucket: moving on to the next restarts the
> +		 * ent subloop.
> +		 */
> +		for (; i->bucket; i->bucket = i->bucket->next, i->ent = NULL) {
> +
> +			if (i->ent == NULL)
> +				i->ent = i->bucket->head;
> +			else
> +				i->ent = i->htab->ops->next(i->ent);
> +			if (i->ent)
> +				return ret;
> +		}
> +	}
> +
> +	/*
> +	 * End of advancement, but that doesn't mean we can't return.
> +	 */
> +	i->exhausted = 1;
> +	return ret;
> +}
> diff --git a/libdtrace/dt_kernel_module.c b/libdtrace/dt_kernel_module.c
> @@ -58,7 +58,8 @@ static dt_htab_ops_t kernpath_htab_ops = {
>   	.hval = (htab_hval_fn)kernpath_hval,
>   	.cmp = (htab_cmp_fn)kernpath_cmp,
>   	.add = (htab_add_fn)kernpath_add,
> -	.del = (htab_del_fn)kernpath_del_path
> +	.next = (htab_next_fn)kernpath_next,
> +	.del = (htab_del_fn)kernpath_del_path,
>   };
Again, how about maintaining the order (del then next) that is so 
consistently observed in the header file?