ignite-issues mailing list archives

From "Maxim Muzafarov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
Date Thu, 12 Jul 2018 11:28:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541489#comment-16541489 ]

Maxim Muzafarov edited comment on IGNITE-7165 at 7/12/18 11:27 AM:
-------------------------------------------------------------------

I've checked the following tests (their failures are not related to the current change):
* IgnitePdsDynamicCacheTest.testRestartAndCreate (fail rate 0.0%)
* IgnitePdsCheckpointSimulationWithRealCpDisabledTest.testCheckpointSimulationMultiThreaded (fail rate 0.0%)
* GridCachePartitionedDataStructuresFailoverSelfTest.testFairReentrantLockFailsWhenServersLeft (fail rate 0.0%)
* CacheStopAndDestroySelfTest.testClientClose (fail rate 0.0%)
* CacheStopAndDestroySelfTest.testLocalClose (fail rate 0.0%)
* GridCacheLocalMultithreadedSelfTest.testBasicLocks (fail rate 0.0%)
* IgniteClientReconnectFailoverTest.testReconnectStreamerApi (fail rate 0.0%)

 


> Re-balancing is cancelled if client node joins
> ----------------------------------------------
>
>                 Key: IGNITE-7165
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7165
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Cherkasov
>            Assignee: Maxim Muzafarov
>            Priority: Critical
>              Labels: rebalance
>             Fix For: 2.7
>
>
> Re-balancing is cancelled if a client node joins. Re-balancing can take hours, and each time a client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, /172.31.16.213:0], discPort=0, order=36, intOrder=24, lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=36, minorTopVer=0], evt=NODE_JOINED, node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] Rebalancing started [top=null, evt=NODE_JOINED, node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> So in clusters with a large amount of data and frequent client leave/join events, a new server may never receive its partitions.
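
For anyone trying to reproduce this, here is a minimal sketch (my own illustration, not from the ticket; the instance names, cache settings and data volume are assumptions, only the cache name "statementp" and mode=ASYNC come from the logs above) of a scenario that should trigger the pre-2.7 behaviour, where a client join cancels in-progress rebalancing:

{code:java}
// Hypothetical reproduction sketch for IGNITE-7165 (pre-2.7 behaviour).
// Assumed: "srv*"/"client" instance names, backups=1, ~1 GB of payload.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RebalanceCancelledOnClientJoin {
    public static void main(String[] args) {
        // Two initial server nodes.
        Ignite srv1 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("srv1"));
        Ignition.start(new IgniteConfiguration().setIgniteInstanceName("srv2"));

        // Cache with ASYNC rebalancing, matching mode=ASYNC in the logs above.
        CacheConfiguration<Integer, byte[]> ccfg = new CacheConfiguration<Integer, byte[]>("statementp")
            .setBackups(1)
            .setRebalanceMode(CacheRebalanceMode.ASYNC);

        srv1.getOrCreateCache(ccfg);

        // Load enough data that rebalancing to a new node takes a while.
        try (IgniteDataStreamer<Integer, byte[]> streamer = srv1.dataStreamer("statementp")) {
            for (int i = 0; i < 1_000_000; i++)
                streamer.addData(i, new byte[1024]); // ~1 GB total payload.
        }

        // A third server joins and starts receiving its partitions.
        Ignition.start(new IgniteConfiguration().setIgniteInstanceName("srv3"));

        // While srv3 is still rebalancing, a client joins. Before the fix the
        // server logs show "Cancelled rebalancing from all nodes" followed by
        // "Starting rebalancing" from scratch, as in the excerpt above.
        Ignition.start(new IgniteConfiguration()
            .setIgniteInstanceName("client")
            .setClientMode(true));
    }
}
{code}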



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
