ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Griggs (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (IGNITE-4828) Improve the distribution of keys within partitions
Date Thu, 16 Mar 2017 11:48:41 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Michael Griggs reassigned IGNITE-4828:

    Assignee: Michael Griggs

> Improve the distribution of keys within partitions
> --------------------------------------------------
>                 Key: IGNITE-4828
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4828
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 1.9
>            Reporter: Michael Griggs
>            Assignee: Michael Griggs
>             Fix For: 2.0
> An issue has been found when inserting several million string keys in to a cache.  Each
string key was approximately 22-characters in length.  When I exported the partition counts
(via GG Visor) I was able to see an unusual periodicity in the number of keys allocated to
partitions.  I charted this in Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number ofpartitions
will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently.  It was then updated further to do the
> following:
> {{(h = key.hashCode()) ^ (h >>> 16);}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used in index
calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3). 
> (1):  https://i.imgur.com/0FtCZ2A.png
> (2):  https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3):  https://i.imgur.com/8ZuCSA3.png

This message was sent by Atlassian JIRA

View raw message