lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dawid Weiss (JIRA)" <>
Subject [jira] Commented: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.
Date Tue, 15 Mar 2011 20:42:29 GMT


Dawid Weiss commented on LUCENE-2967:

Yes, now I see this difference on the 38M too:


I'll see if I can find out the problem here; I assume the collision ratio should be nearly
identical... but who knows. This is of no priority, but interesting stuff. I'll close if I
can't get it better than the trunk version.

> Use linear probing with an additional good bit avalanching function in FST's NodeHash.
> --------------------------------------------------------------------------------------
>                 Key: LUCENE-2967
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Trivial
>             Fix For: 4.0
>         Attachments: LUCENE-2967.patch
> I recently had an interesting discussion with Sebastiano Vigna (fastutil), who suggested
that linear probing, given a hash mixing function with good avalanche properties, is a way
better method of constructing lookups in associative arrays compared to quadratic probing.
Indeed, with linear probing you can implement removals from a hash map without removed slot
markers and linear probing has nice properties with respect to modern CPUs (caches). I've
reimplemented HPPC's hash maps to use linear probing and we observed a nice speedup (the same
applies for fastutils of course).
> This patch changes NodeHash's implementation to use linear probing. The code is a bit
simpler (I think :). I also moved the load factor to a constant -- 0.5 seems like a generous
load factor, especially if we allow large FSTs to be built. I don't see any significant speedup
in constructing large automata, but there is no slowdown either (I checked on one machine
only for now, but will verify on other machines too).

This message is automatically generated by JIRA.
For more information on JIRA, see:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message