ignite-user mailing list archives

From: Eduard Shangareev <eduard.shangar...@gmail.com>
Subject: Re: Rebalance is skipped
Date: Thu, 18 Aug 2016 13:10:17 GMT
Hi, Sergii.

I can't reproduce this issue. I have written a test:

https://github.com/EdShangGG/ignite/commit/8a9462c3a55c6c0317fb8bdc9ef26e12fdbcfb9a

It uses the same configuration as you mentioned.

Could you take a look? Maybe I am missing something.


On Tue, Aug 16, 2016 at 12:54 PM, Sergii Tyshlek <styshlek@llnw.com> wrote:

> Hello there!
>
> Some time ago we started moving from the old GridGain to the current Apache
> Ignite (1.6, now 1.7).
>
> Here are some cache config properties we use:
>
> --------------------------------------------
> cacheMode=PARTITIONED
> atomicityMode=ATOMIC
> atomicWriteOrderMode=PRIMARY
> writeSynchronizationMode=PRIMARY_SYNC
>
> rebalanceMode=ASYNC
> rebalanceBatchesPrefetchCount=2
> rebalanceDelay=30000
> rebalanceTimeout=10000
>
> backups=1
> affinity=FairAffinityFunction
> affinity.partitions=1024
> --------------------------------------------
>
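> For reference, the programmatic equivalent of these settings would look roughly
> like the sketch below (simplified, not our actual configuration code; the cache
> name is taken from the logs further down):
>
> --------------------------------------------
> import org.apache.ignite.cache.CacheAtomicWriteOrderMode;
> import org.apache.ignite.cache.CacheAtomicityMode;
> import org.apache.ignite.cache.CacheMode;
> import org.apache.ignite.cache.CacheRebalanceMode;
> import org.apache.ignite.cache.CacheWriteSynchronizationMode;
> import org.apache.ignite.cache.affinity.fair.FairAffinityFunction;
> import org.apache.ignite.configuration.CacheConfiguration;
>
> public class QueryResultsCacheConfig {
>     public static CacheConfiguration<Object, Object> create() {
>         CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("p_queryResults");
>
>         ccfg.setCacheMode(CacheMode.PARTITIONED);
>         ccfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
>         ccfg.setAtomicWriteOrderMode(CacheAtomicWriteOrderMode.PRIMARY);
>         ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);
>
>         // Asynchronous rebalancing, delayed by 30 s after a topology change.
>         ccfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
>         ccfg.setRebalanceBatchesPrefetchCount(2);
>         ccfg.setRebalanceDelay(30000);
>         ccfg.setRebalanceTimeout(10000);
>
>         // One backup copy, fair affinity with 1024 partitions.
>         ccfg.setBackups(1);
>         ccfg.setAffinity(new FairAffinityFunction(1024));
>
>         return ccfg;
>     }
> }
> --------------------------------------------
>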
> At first, everything looked OK, but then I noticed that our data is not
> distributed evenly between the nodes (despite the fact that we use
> FairAffinityFunction, the coordinator node hoards most of the cache entries).
> Later I discovered (by wrapping a custom class around FairAffinityFunction)
> that the affinity function works as expected, but the rebalancing does not.
>
> Shortly after starting 6 nodes (right after the last one joins the topology),
> the following debug logs appear:
>
> ---------------------------------------------------------
> 2016-08-15 07:36:43,070 DEBUG [exchange-worker-#318%EQGrid%]
> [preloader.GridDhtPreloader] - <p_queryResults> Skipping partition
> assignment (state is not MOVING): GridDhtLocalPartition [id=0,
> map=org.apache.ignite.internal.processors.cache.GridCacheCon
> currentMapImpl@5c35248d, rmvQueue=GridCircularBuffer [sizeMask=255,
> idxGen=0], cntr=0, state=OWNING, reservations=0, empty=true,
> createTime=08/15/2016 07:36:33]
> ...
> // repeats a total of 1024 times, with id=0..1023, one for every partition
> // then it is followed by
> 2016-08-15 07:36:43,177 DEBUG [exchange-worker-#318%EQGrid%]
> [preloader.GridDhtPartitionDemander] - <p_queryResults> Adding partition
> assignments: GridDhtPreloaderAssignments [topVer=AffinityTopologyVersion
> [topVer=6, minorTopVer=0], cancelled=false, exchId=GridDhtPartitionExchangeId
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> nodeId=383991fb, evt=NODE_JOINED], super={}]
> 2016-08-15 07:36:43,177 DEBUG [exchange-worker-#318%EQGrid%]
> [preloader.GridDhtPartitionDemander] - <p_queryResults> Rebalancing is
> not required [cache=p_queryResults, topology=AffinityTopologyVersion
> [topVer=6, minorTopVer=0]]
> 2016-08-15 07:36:43,178 DEBUG [exchange-worker-#318%EQGrid%]
> [preloader.GridDhtPartitionDemander] - <p_queryResults> Completed
> rebalance future: RebalanceFuture [sndStoppedEvnt=false,
> topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=6]
> 2016-08-15 07:36:43,179 INFO [exchange-worker-#318%EQGrid%]
> [cache.GridCachePartitionExchangeManager] - Skipping rebalancing (nothing
> scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> evt=NODE_JOINED, node=383991fb-5453-4893-9040-1baa1291881a]
> ---------------------------------------------------------
>
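> A quick way to see whether rebalancing ever actually starts on a node is to
> listen for the rebalance events; a sketch is below (the event types first have
> to be enabled via IgniteConfiguration.setIncludeEventTypes):
>
> --------------------------------------------
> import org.apache.ignite.Ignite;
> import org.apache.ignite.events.Event;
> import org.apache.ignite.events.EventType;
> import org.apache.ignite.lang.IgnitePredicate;
>
> public class RebalanceWatcher {
>     /** Prints local rebalance lifecycle events as they happen. */
>     public static void watch(Ignite ignite) {
>         ignite.events().localListen(new IgnitePredicate<Event>() {
>             @Override public boolean apply(Event evt) {
>                 System.out.println("Rebalance event: " + evt);
>
>                 return true; // Keep listening.
>             }
>         },
>         EventType.EVT_CACHE_REBALANCE_STARTED,
>         EventType.EVT_CACHE_REBALANCE_STOPPED,
>         EventType.EVT_CACHE_REBALANCE_PART_LOADED);
>     }
> }
> --------------------------------------------
>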
> So I started digging. Using GridDhtPartitionTopology, I got the partition
> map, which (aggregated by state) looked like this:
> ---------------------------------------------------------
> Node: 38ae4165-474d-4ed4-a292-cca78b8df5c3, partitions: {MOVING=340}
> Node: 8ac8d327-dc59-473f-a3e1-c5861f63f0e6, partitions: {MOVING=341}
> Node: c7047158-9e7b-494f-bceb-3a5774853a6c, partitions: {MOVING=342}
> Node: c9cc1a1f-f037-43c8-8855-0f1ccb8f0ec5, partitions: {MOVING=342}
> Node: dce874ff-cc1e-41c8-9e82-abfb3dfa535e, partitions: {OWNING=1024}
> Node: de783f6d-dc48-46b8-a387-91dd3d181150, partitions: {MOVING=342}
> ---------------------------------------------------------
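>
> Roughly, the aggregation can be done as in the sketch below. It uses Ignite's
> internal API, so the class and method names are taken from the 1.x source tree
> and may differ between versions -- treat them as an assumption, not a supported
> interface:
>
> --------------------------------------------
> import java.util.Map;
> import java.util.TreeMap;
>
> import org.apache.ignite.Ignite;
> import org.apache.ignite.internal.IgniteKernal;
> import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition;
> import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopology;
> import org.apache.ignite.lang.IgniteRunnable;
> import org.apache.ignite.resources.IgniteInstanceResource;
>
> /** Prints the local partition states of one cache, aggregated by state. */
> public class PartitionStateDump implements IgniteRunnable {
>     @IgniteInstanceResource
>     private transient Ignite ignite;
>
>     @Override public void run() {
>         // Internal API: not guaranteed to stay the same between versions.
>         GridDhtPartitionTopology top =
>             ((IgniteKernal)ignite).internalCache("p_queryResults").context().topology();
>
>         Map<String, Integer> byState = new TreeMap<>();
>
>         for (GridDhtLocalPartition part : top.localPartitions()) {
>             Integer cnt = byState.get(part.state().name());
>
>             byState.put(part.state().name(), cnt == null ? 1 : cnt + 1);
>         }
>
>         System.out.println("Node: " + ignite.cluster().localNode().id() + ", partitions: " + byState);
>     }
> }
>
> // Run on every node: ignite.compute().broadcast(new PartitionStateDump());
> --------------------------------------------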
>
> The important point is that this distribution never changes, neither right
> after the grid starts nor after a few hours. Ingesting (or not ingesting) data
> doesn't seem to affect it either. Changing rebalanceDelay and commenting out
> the affinityMapper also made no difference.
> From what I'm seeing, the affinity function distributes partitions evenly (6
> nodes x ~341 partitions each = ~2048, i.e. 1024 partitions plus one backup of
> each), but the coordinator node just never releases 1024-341=683 of its
> partitions and stays an owner of every partition in the grid.
>
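> For what it's worth, the ideal assignment can be double-checked with the public
> Affinity API; a minimal sketch:
>
> --------------------------------------------
> import org.apache.ignite.Ignite;
> import org.apache.ignite.cache.affinity.Affinity;
> import org.apache.ignite.cluster.ClusterNode;
>
> /** Prints how many partitions the affinity function assigns to each node. */
> public class AffinityCheck {
>     public static void print(Ignite ignite) {
>         Affinity<Object> aff = ignite.affinity("p_queryResults");
>
>         // With 1024 partitions, 6 nodes and 1 backup this should be ~341 in total per node.
>         for (ClusterNode node : ignite.cluster().nodes())
>             System.out.println("Node: " + node.id()
>                 + ", primary=" + aff.primaryPartitions(node).length
>                 + ", backup=" + aff.backupPartitions(node).length
>                 + ", total=" + aff.allPartitions(node).length);
>     }
> }
> --------------------------------------------
>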
> Please help me understand what might be causing this behavior. I have included
> the logs and properties that seemed relevant to the issue, but I'll provide
> more if needed.
>
> - regards, Sergii
>
>
>
