ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Griggs (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-4828) Improve the distribution of keys within partitions
Date Wed, 15 Mar 2017 16:14:41 GMT
Michael Griggs created IGNITE-4828:
--------------------------------------

             Summary: Improve the distribution of keys within partitions
                 Key: IGNITE-4828
                 URL: https://issues.apache.org/jira/browse/IGNITE-4828
             Project: Ignite
          Issue Type: Improvement
    Affects Versions: 1.9
            Reporter: Michael Griggs
             Fix For: 2.0


An issue has been found when inserting several million string keys in to a cache.  Each string
key was approximately 22-characters in length.  When I exported the partition counts (via
GG Visor) I was able to see an unusual periodicity in the number of keys allocated to partitions.
 I charted this in Excel (1).

After further investigation, it appears that there is a relationship
between the number of keys being inserted, the number of partitions
assigned to the cache and amount of apparent periodicity: a small number ofpartitions will
cause periodicity to appear with a lower number of keys.

The {{RendezvousAffinityFunction#partition}} function performs a simple
calculation of key hashcode modulo partition-count:

{{U.safeAbs(key.hashCode() % parts)}}

Digging further I was led to the fact that this is how the Java HashMap
*used* to behave (2), but was upgraded around Java 1.4 to perform the
following:

{{key.hashCode() & (parts - 1)}}

which performs more efficiently.  It was then updated further to do the
following:

{{(h = key.hashCode()) ^ (h >>> 16);}}

with the bit-shift performed to
bq. incorporate impact of the highest bits that would otherwise never be used in index calculations
because of table bounds

When using this function, rather than our
{{RendezvousAffinityFunction#partition}} implementation, I also saw a
significant decrease in the periodicity and a better distribution of keys
amongst partitions (3). 

(1):  https://i.imgur.com/0FtCZ2A.png
(2):  https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
(3):  https://i.imgur.com/8ZuCSA3.png



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message