ignite-issues mailing list archives

From "Alexey Goncharuk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-8657) Simultaneous start of bunch of client nodes may lead to some clients hangs
Date Thu, 07 Jun 2018 17:02:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504922#comment-16504922 ]

Alexey Goncharuk commented on IGNITE-8657:
------------------------------------------

[~sergey-chugunov], I think I've found an issue in the tests. Take a look at the latest run of Binary Objects (Simple Mapper Basic): https://ci.ignite.apache.org/viewLog.html?buildId=1367214&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperBasic&tab=buildResultsDiv

I see the following assertion error in the log:
{code}
[16:30:59]W:		 [org.apache.ignite:ignite-core] java.lang.AssertionError: TcpDiscoveryNode
[id=d089379e-11db-453f-99a0-a270bc200002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502],
discPort=47502, order=341, intOrder=172, lastExchangeTime=1528378258963, loc=false, ver=2.6.0#20180607-sha1:8f8efe4f,
isClient=false]
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.IgniteNeedReconnectException.<init>(IgniteNeedReconnectException.java:38)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.forceClientReconnect(GridDhtPartitionsExchangeFuture.java:2051)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1569)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:138)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:345)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:325)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2837)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2816)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[16:30:59]W:		 [org.apache.ignite:ignite-core] 	at java.lang.Thread.run(Thread.java:745)
{code}

It looks like the exception may be deserialized on a non-client node, so the assertion should
be removed and the non-client case handled properly on receive.
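
To illustrate the suggestion, here is a minimal sketch of moving the client check from construction time to receive time. It is not the actual Ignite code: `NodeInfo`, `NeedReconnectSignal` and `ReconnectHandling` are hypothetical, simplified stand-ins, and the real fix may look quite different.

```java
/** Hypothetical, simplified stand-in for a cluster node descriptor (not an Ignite class). */
final class NodeInfo {
    final String id;
    final boolean client;

    NodeInfo(String id, boolean client) {
        this.id = id;
        this.client = client;
    }
}

/** Hypothetical reconnect signal. The constructor deliberately does no role check. */
final class NeedReconnectSignal extends RuntimeException {
    NeedReconnectSignal(NodeInfo node) {
        // Safe to construct (or deserialize) on any node, client or server.
        super("Node needs to reconnect [nodeId=" + node.id + ']');
    }
}

/** Hypothetical receive-side handling: the node role is checked here instead. */
final class ReconnectHandling {
    /** Returns true if a reconnect should be started on the local node. */
    static boolean onReconnectSignal(NodeInfo locNode, NeedReconnectSignal sig) {
        if (!locNode.client) {
            // A server node must not reconnect; ignore the signal instead of failing an assertion.
            return false;
        }

        // A real implementation would trigger the client reconnect procedure here.
        return true;
    }

    public static void main(String[] args) {
        NodeInfo server = new NodeInfo("srv-1", false);
        NodeInfo client = new NodeInfo("cli-1", true);

        NeedReconnectSignal sig = new NeedReconnectSignal(server);

        System.out.println("server reconnects: " + onReconnectSignal(server, sig));
        System.out.println("client reconnects: " + onReconnectSignal(client, sig));
    }
}
```

With this shape, creating the signal on a server node no longer fails, and only a client node acts on it.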

> Simultaneous start of bunch of client nodes may lead to some clients hangs
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-8657
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8657
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.6
>
>
> h3. Description
> PartitionExchangeManager uses the system property *IGNITE_EXCHANGE_HISTORY_SIZE* to limit the maximum number of exchange objects it keeps and so reduce memory consumption.
> The default value of the property is 1000, but in scenarios with many caches and partitions it is reasonable to set the exchange history size to a smaller value, around a few dozen.
> If a user then starts more client nodes at once than the history size, some clients may hang because their exchange information was evicted from the history and is no longer available.
> h3. Workarounds
> Two workarounds are possible: 
> * Do not start at once more clients than history size.
> * Restart hanging client node.
> h3. Solution
> Forcing a client node to reconnect when a server detects that it has lost the client's exchange information prevents client nodes from hanging.
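
The failure mode in the description can be modelled with a small size-bounded history. This is only a sketch under assumptions: `BoundedExchangeHistory` is a hypothetical class, exchanges are represented by plain version numbers, and Ignite's real exchange history is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a size-bounded exchange history (not Ignite's implementation). */
final class BoundedExchangeHistory {
    private final int maxSize;

    /** Retained exchange versions, oldest first. */
    private final Deque<Long> versions = new ArrayDeque<>();

    BoundedExchangeHistory(int maxSize) {
        if (maxSize <= 0)
            throw new IllegalArgumentException("maxSize must be positive: " + maxSize);

        this.maxSize = maxSize;
    }

    /** Records a new exchange, evicting the oldest one if the history is full. */
    void add(long ver) {
        if (versions.size() == maxSize)
            versions.removeFirst();

        versions.addLast(ver);
    }

    /** Returns true if the exchange with the given version is still retained. */
    boolean contains(long ver) {
        return versions.contains(ver);
    }

    public static void main(String[] args) {
        // History of 3 entries, 5 clients joining at once (one exchange per join).
        BoundedExchangeHistory hist = new BoundedExchangeHistory(3);

        for (long ver = 1; ver <= 5; ver++)
            hist.add(ver);

        for (long ver = 1; ver <= 5; ver++) {
            System.out.println("exchange " + ver + ": " +
                (hist.contains(ver) ? "retained" : "evicted, client should be forced to reconnect"));
        }
    }
}
```

In this model, the clients whose exchanges were evicted are the ones that would hang without the forced reconnect.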



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
