From: Sergii Tyshlek
Date: Tue, 16 Aug 2016 12:54:10 +0300
Subject: Rebalance is skipped
To: user@ignite.apache.org

Hello there!

Some time ago we started moving from the old GridGain to the current Apache Ignite (1.6, now 1.7).

Here are some cache config properties we use:

--------------------------------------------
cacheMode=PARTITIONED
atomicityMode=ATOMIC
atomicWriteOrderMode=PRIMARY
writeSynchronizationMode=PRIMARY_SYNC
rebalanceMode=ASYNC
rebalanceBatchesPrefetchCount=2
rebalanceDelay=30000
rebalanceTimeout=10000
backups=1
affinity=FairAffinityFunction
affinity.partitions=1024
--------------------------------------------

At first everything looked OK, but then I noticed that our data is not distributed evenly between the nodes: despite the fact that we use FairAffinityFunction, the coordinator node hoards most of the cache entries. Later I discovered (by wrapping a custom class around FairAffinityFunction) that the affinity function works as expected, but the rebalancing does not.

Shortly after starting 6 nodes (right after the last one joins the topology), the following debug logs appear:

---------------------------------------------------------
2016-08-15 07:36:43,070 DEBUG [exchange-worker-#318%EQGrid%] [preloader.GridDhtPreloader] - <p_queryResults> Skipping partition assignment (state is not MOVING): GridDhtLocalPartition [id=0, map=org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl@5c35248d, rmvQueue=GridCircularBuffer [sizeMask=255, idxGen=0], cntr=0, state=OWNING, reservations=0, empty=true, createTime=08/15/2016 07:36:33]
...
// repeats a total of 1024 times, where id=0..1023, one line for every partition
// then it's followed by
2016-08-15 07:36:43,177 DEBUG [exchange-worker-#318%EQGrid%] [preloader.GridDhtPartitionDemander] - <p_queryResults> Adding partition assignments: GridDhtPreloaderAssignments [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], cancelled=false, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], nodeId=383991fb, evt=NODE_JOINED], super={}]
2016-08-15 07:36:43,177 DEBUG [exchange-worker-#318%EQGrid%] [preloader.GridDhtPartitionDemander] - <p_queryResults> Rebalancing is not required [cache=p_queryResults, topology=AffinityTopologyVersion [topVer=6, minorTopVer=0]]
2016-08-15 07:36:43,178 DEBUG [exchange-worker-#318%EQGrid%] [preloader.GridDhtPartitionDemander] - <p_queryResults> Completed rebalance future: RebalanceFuture [sndStoppedEvnt=false, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=6]
2016-08-15 07:36:43,179 INFO [exchange-worker-#318%EQGrid%] [cache.GridCachePartitionExchangeManager] - Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, node=383991fb-5453-4893-9040-1baa1291881a]
---------------------------------------------------------

So I started digging.
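(Side note: the per-node assignment computed by the affinity function can also be sanity-checked through the public Affinity API, roughly like the sketch below. This is only an illustration, not my exact code; the cache name is the one from the logs, and the per-partition MOVING/OWNING states further down come from the internal GridDhtPartitionTopology instead.)

---------------------------------------------------------
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class AffinityCheck {
    /** Prints how many partitions the affinity assignment gives to each server node. */
    public static void print(Ignite ignite) {
        Affinity<Object> aff = ignite.affinity("p_queryResults");

        for (ClusterNode node : ignite.cluster().forServers().nodes()) {
            // allPartitions() returns the primary and backup partitions assigned
            // to the node by the affinity function (~341 per node with 6 nodes,
            // 1024 partitions and 1 backup).
            System.out.println("Node: " + node.id() +
                ", assigned partitions: " + aff.allPartitions(node).length);
        }
    }
}
---------------------------------------------------------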
Using GridDhtPartitionTopology, I got the partitions map, which (aggregated) looked like this:

---------------------------------------------------------
Node: 38ae4165-474d-4ed4-a292-cca78b8df5c3, partitions: {MOVING=340}
Node: 8ac8d327-dc59-473f-a3e1-c5861f63f0e6, partitions: {MOVING=341}
Node: c7047158-9e7b-494f-bceb-3a5774853a6c, partitions: {MOVING=342}
Node: c9cc1a1f-f037-43c8-8855-0f1ccb8f0ec5, partitions: {MOVING=342}
Node: dce874ff-cc1e-41c8-9e82-abfb3dfa535e, partitions: {OWNING=1024}
Node: de783f6d-dc48-46b8-a387-91dd3d181150, partitions: {MOVING=342}
---------------------------------------------------------

The important point is that this distribution never changes, neither right after grid start nor after a few hours. Ingesting (or not ingesting) data doesn't seem to affect it either. Changing rebalanceDelay and commenting out the affinityMapper also made no difference.

From what I'm seeing, the affinity function distributes partitions evenly (6 nodes at ~341 partitions each = 2048 partition copies, i.e. 1024 partitions plus one backup for each), but the coordinator node just never releases the 1024-341=683 extra partitions and remains an owner of every partition in the grid.

Please help me understand what might cause such behavior. I included the logs and properties that seemed relevant to the issue, but I'll provide more if needed.

- regards, Sergii
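P.S. The FairAffinityFunction "wrapper" mentioned above is nothing special: essentially a delegating AffinityFunction that logs how many partitions each node gets in the computed assignment. A rough sketch (class name and logging are illustrative, not our exact code):

---------------------------------------------------------
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import org.apache.ignite.cache.affinity.AffinityFunction;
import org.apache.ignite.cache.affinity.AffinityFunctionContext;
import org.apache.ignite.cache.affinity.fair.FairAffinityFunction;
import org.apache.ignite.cluster.ClusterNode;

public class LoggingAffinityFunction implements AffinityFunction {
    /** The real affinity function; every call is delegated to it. */
    private final FairAffinityFunction delegate = new FairAffinityFunction(1024);

    @Override public void reset() {
        delegate.reset();
    }

    @Override public int partitions() {
        return delegate.partitions();
    }

    @Override public int partition(Object key) {
        return delegate.partition(key);
    }

    @Override public List<List<ClusterNode>> assignPartitions(AffinityFunctionContext affCtx) {
        List<List<ClusterNode>> assignment = delegate.assignPartitions(affCtx);

        // Count how many partitions (primary or backup) each node is assigned.
        Map<UUID, Integer> perNode = new HashMap<>();

        for (List<ClusterNode> owners : assignment)
            for (ClusterNode node : owners)
                perNode.merge(node.id(), 1, Integer::sum);

        System.out.println("Computed partition assignment per node: " + perNode);

        return assignment;
    }

    @Override public void removeNode(UUID nodeId) {
        delegate.removeNode(nodeId);
    }
}
---------------------------------------------------------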
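P.P.S. In case plain Java is easier to read than the property list at the top, the cache settings are roughly equivalent to the following sketch (only the listed properties are set; the cache name is the one from the logs, and the affinityMapper I mentioned is not shown):

---------------------------------------------------------
import org.apache.ignite.cache.CacheAtomicWriteOrderMode;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cache.affinity.fair.FairAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueryResultsCacheConfig {
    /** Builds the cache configuration with the properties listed at the top of this mail. */
    public static CacheConfiguration<Object, Object> create() {
        CacheConfiguration<Object, Object> cfg = new CacheConfiguration<>("p_queryResults");

        cfg.setCacheMode(CacheMode.PARTITIONED);
        cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
        cfg.setAtomicWriteOrderMode(CacheAtomicWriteOrderMode.PRIMARY);
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);
        cfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
        cfg.setRebalanceBatchesPrefetchCount(2);
        cfg.setRebalanceDelay(30000);
        cfg.setRebalanceTimeout(10000);
        cfg.setBackups(1);
        cfg.setAffinity(new FairAffinityFunction(1024));

        return cfg;
    }
}
---------------------------------------------------------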