[DTrace-devel] [PATCH 3/7] htab: add dt_htab_insert_unique

Thu Oct 7 20:16:30 PDT 2021

Reviewed-by: Eugene Loh <eugene.loh at oracle.com>

Incidentally, an alternative to introducing new_bucket is, instead of 
setting new_bucket=1, just set no_dups=0.  That is, the "no dups" flag
becomes irrelevant if you're introducing a new bucket anyhow.  Or, 
branch to "add:" only if no_dups==0.  It just seems weird to be using 
both new_bucket and no_dups.  But not a big deal.  So ignore these 
suggestions if they strike you as worse than the problem.

Also, though this is "unrelated" to the patch, I was wondering if you 
could explain the pre-existing code.  We enter the insert function.  We 
look for a match.  If none, we'll add a bucket and therefore check the 
htab size.  If needed, we resize the htab.  At that point, we "retry."  
Why?  If we failed to find a bucket the first time, why should we find 
one now?  Or, are we rerunning the search merely for the side effect of 
recomputing idx?

On 10/5/21 9:38 AM, Nick Alcock wrote:
> Some uses of htabs want to insert items only if they don't already exist
> (duplicates being rather hard to fish out of the htab again).  This is
> doable by doing a lookup first, but when inserting large numbers of
> items, this is a bit of a waste of time, given that insertion already
> knows whether an item matching the key already exists (because it has
> to create a new bucket otherwise).
>
> So split dt_htab_insert into an internal function that can insert dups
> or not, then introduce a new dt_htab_insert_unique that calls it.
>
> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
>   libdtrace/dt_htab.c | 26 ++++++++++++++++++++++++--
>   libdtrace/dt_htab.h |  1 +
>   2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/libdtrace/dt_htab.c b/libdtrace/dt_htab.c
> index 8e99987d06f1..ff80738f17e6 100644
> --- a/libdtrace/dt_htab.c
> +++ b/libdtrace/dt_htab.c
> @@ -119,12 +119,14 @@ static int resize(dt_htab_t *htab)
>   
>   /*
>    * Add an entry to the hashtable.  Resize if necessary, and allocate a new
> - * bucket if necessary.
> + * bucket if necessary.  Optionally skip insertion if this bucket (-> key)
> + * is already present.
>    */
> -int dt_htab_insert(dt_htab_t *htab, void *entry)
> +static int dt_htab_insert_internal(dt_htab_t *htab, void *entry, int no_dups)
>   {
>   	uint32_t	hval = htab->ops->hval(entry);
>   	int		idx;
> +	int		new_bucket = 0;
>   	dt_hbucket_t	*bucket;
>   
>   retry:
> @@ -144,6 +146,7 @@ retry:
>   		goto retry;
>   	}
>   
> +	new_bucket = 1;
>   	bucket = malloc(sizeof(dt_hbucket_t));
>   	if (!bucket)
>   		return -ENOMEM;
> @@ -156,12 +159,31 @@ retry:
>   	htab->nbuckets++;
>   
>   add:
> +	if (!new_bucket && no_dups)
> +		return 0;
> +
>   	bucket->head = htab->ops->add(bucket->head, entry);
>   	bucket->nentries++;
>   
>   	return 0;
>   }
>   
> +/*
> + * Add an entry to the hashtable.
> + */
> +int dt_htab_insert(dt_htab_t *htab, void *entry)
> +{
> +	return dt_htab_insert_internal(htab, entry, 0);
> +}
> +
> +/*
> + * Add an entry to the hashtable, adding only if not already present.
> + */
> +int dt_htab_insert_unique(dt_htab_t *htab, void *entry)
> +{
> +	return dt_htab_insert_internal(htab, entry, 1);
> +}
> +
>   /*
>    * Find an entry in the hashtable.
>    */
> diff --git a/libdtrace/dt_htab.h b/libdtrace/dt_htab.h
> index 3a716357362f..9da72163a9dd 100644
> --- a/libdtrace/dt_htab.h
> +++ b/libdtrace/dt_htab.h
> @@ -87,6 +87,7 @@ typedef struct dt_htab	dt_htab_t;
>   extern dt_htab_t *dt_htab_create(struct dtrace_hdl *dtp, dt_htab_ops_t *ops);
>   extern void dt_htab_destroy(struct dtrace_hdl *dtp, dt_htab_t *htab);
>   extern int dt_htab_insert(dt_htab_t *htab, void *entry);
> +extern int dt_htab_insert_unique(dt_htab_t *htab, void *entry);
>   extern void *dt_htab_lookup(const dt_htab_t *htab, const void *entry);
>   extern int dt_htab_delete(dt_htab_t *htab, void *entry);
>   extern void dt_htab_stats(const char *name, const dt_htab_t *htab);