ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Valentin Kulichenko <valentin.kuliche...@gmail.com>
Subject Re: Distribution of keys to partitions
Date Wed, 15 Mar 2017 15:48:29 GMT
In 99% of cases number of partition is a power of two, because it's the
default value. Almost no one changes it. If this change actually provides
better distribution, it absolutely makes sense to do it.

Michael, can you create a Jira ticket and put you findings there?

-Val

On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <agura@apache.org> wrote:

> Michael,
>
> it makes sense only for cases when partitions count is power of two.
> Affinity function doesn't have this limitation.
>
> Bu, of course, we can check, that partitions count is power of two and
> use optimized hash code calculation.
>
>
> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
> <michael.griggs@gridgain.com> wrote:
> > Hi Igniters,
> >
> > Last week I was working with a group of Ignite users.  They are inserting
> > several million string keys in to a cache.  Each string key was
> > approximately 22-characters in length.  When I exported the partition
> > counts (via GG Visor) I was able to see an unusual periodicity in the
> > number of keys allocated to partitions.  I charted this in Excel [1].
> > After further investigation, it appears that there is a relationship
> > between the number of keys being inserted, the number of partitions
> > assigned to the cache and amount of apparent periodicity: a small number
> of
> > partitions will cause periodicity to appear with lower numbers of keys.
> >
> > The RendezvousAffinityFunction#partition function performs a simple
> > calculation of key hashcode modulo partition-count:
> >
> > U.safeAbs(key.hashCode() % parts)
> >
> >
> > Digging further I was led to the fact that this is how the Java HashMap
> > *used* to behave [2], but was upgraded around Java 1.4 to perform the
> > following:
> >
> > key.hashCode() & (parts - 1)
> >
> > which performs more efficiently.  It was then updated further to do the
> > following:
> >
> > (h = key.hashCode()) ^ (h >>> 16);
> >
> > with the bit-shift performed to
> >
> > incorporate impact of the highest bits that would otherwise
> > never be used in index calculations because of table bounds
> >
> >
> > When using this function, rather than our
> > RendezvousAffinityFunction#partition implementation, I also saw a
> > significant decrease in the periodicity and a better distribution of keys
> > amongst partitions [3].
> >
> > I would like to suggest that we adopt this modified hash function inside
> > RendezvousAffinityFunction.
> >
> > Regards
> > Mike
> >
> >
> > [1]: https://i.imgur.com/0FtCZ2A.png
> > [2]:
> > https://www.quora.com/Why-does-Java-use-a-mediocre-
> hashCode-implementation-for-strings
> > [3]: https://i.imgur.com/8ZuCSA3.png
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message