From: "Alexey Goncharuk (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Date: Thu, 7 Jun 2018 17:02:00 +0000 (UTC)
Subject: [jira] [Commented] (IGNITE-8657) Simultaneous start of bunch of client nodes may lead to some clients hangs

    [ https://issues.apache.org/jira/browse/IGNITE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504922#comment-16504922 ]

Alexey Goncharuk commented on IGNITE-8657:
------------------------------------------

[~sergey-chugunov], I think I've found an issue in the tests.

Take a look at the latest run of Binary Objects (Simple Mapper Basic):
https://ci.ignite.apache.org/viewLog.html?buildId=1367214&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperBasic&tab=buildResultsDiv

I see the following assertion in the log:
{code}
[16:30:59]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: TcpDiscoveryNode [id=d089379e-11db-453f-99a0-a270bc200002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=341, intOrder=172, lastExchangeTime=1528378258963, loc=false, ver=2.6.0#20180607-sha1:8f8efe4f, isClient=false]
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.IgniteNeedReconnectException.<init>(IgniteNeedReconnectException.java:38)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.forceClientReconnect(GridDhtPartitionsExchangeFuture.java:2051)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1569)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:138)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:345)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:325)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2837)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2816)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.lang.Thread.run(Thread.java:745)
{code}

It looks like the exception may be deserialized on a non-client node, so the assertion should be removed and the case handled properly on receive.

> Simultaneous start of bunch of client nodes may lead to some clients hangs
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-8657
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8657
>             Project: Ignite
>          Issue Type: Bug
>   Affects Versions: 2.5
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.6
>
>
> h3. Description
> PartitionExchangeManager uses the system property *IGNITE_EXCHANGE_HISTORY_SIZE* to cap the number of exchange objects it keeps and so limit memory consumption.
> The default value of the property is 1000, but in scenarios with many caches and partitions it is reasonable to set the exchange history size to a smaller value, around a few dozen.
> If a user then starts more client nodes at once than the history size, some clients may hang because their exchange information was evicted from the history and is no longer available.
> h3. Workarounds
> Two workarounds are possible:
> * Do not start more clients at once than the history size.
> * Restart the hanging client node.
> h3. Solution
> Forcing a client node to reconnect when the server detects that it has lost the client's exchange information prevents client nodes from hanging.
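
To illustrate the failure mode from the quoted description, here is a minimal sketch of a size-bounded history. It is a toy model, not Ignite code: the class ExchangeHistorySketch, the BoundedHistory type, and the evictedJoins helper are all hypothetical names, and only the property name IGNITE_EXCHANGE_HISTORY_SIZE comes from the issue. The sketch assumes each client join creates one history entry and that the oldest entry is dropped once the limit is exceeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a size-bounded exchange history; not Ignite code. */
public class ExchangeHistorySketch {
    /** Hypothetical stand-in for the server-side history, keyed by topology version. */
    static final class BoundedHistory extends LinkedHashMap<Long, String> {
        private final int maxSize;

        BoundedHistory(int maxSize) {
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
            // Drop the oldest exchange entry once the limit is exceeded.
            return size() > maxSize;
        }
    }

    /** Returns the join versions whose exchange entry is gone after all clients joined. */
    static List<Long> evictedJoins(int historySize, int clients) {
        BoundedHistory hist = new BoundedHistory(historySize);

        // Assumption: every client join adds exactly one entry.
        for (long topVer = 1; topVer <= clients; topVer++)
            hist.put(topVer, "client-" + topVer);

        List<Long> evicted = new ArrayList<>();

        for (long topVer = 1; topVer <= clients; topVer++) {
            if (!hist.containsKey(topVer))
                evicted.add(topVer);
        }

        return evicted;
    }

    public static void main(String[] args) {
        // Property name is from the issue; the fallback of 20 and the
        // client count of 25 are arbitrary example values.
        int historySize = Integer.getInteger("IGNITE_EXCHANGE_HISTORY_SIZE", 20);
        int clients = 25;

        System.out.println("Joins with evicted exchange info: "
            + evictedJoins(historySize, clients));
    }
}
```

With a history size of 20 and 25 simultaneous joins, the five earliest joins no longer have an entry. In this model those are the clients that would wait forever without the proposed fix, and the ones the server would force to reconnect with it.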