From: "Pavel Kovalenko (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Date: Wed, 13 Jun 2018 16:38:00 +0000 (UTC)
Subject: [jira] [Created] (IGNITE-8785) Node may hang indefinitely in CONNECTING state during cluster segmentation

Pavel Kovalenko created IGNITE-8785:
---------------------------------------

             Summary: Node may hang indefinitely in CONNECTING state during cluster segmentation
                 Key: IGNITE-8785
                 URL: https://issues.apache.org/jira/browse/IGNITE-8785
             Project: Ignite
          Issue Type: Bug
          Components: cache
    Affects Versions: 2.5
            Reporter: Pavel Kovalenko
             Fix For: 2.6

Affected test: org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest#testTopologyValidatorWithCacheGroup

The node hangs with the following stack trace:

{noformat}
"grid-starter-testTopologyValidatorWithCacheGroup-22" #117619 prio=5 os_prio=0 tid=0x00007f17dd19b800 nid=0x304a in Object.wait() [0x00007f16b19df000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:931)
        - locked <0x0000000705ee4a60> (a java.lang.Object)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:373)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1948)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:915)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1739)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1046)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
        - locked <0x0000000705995ec0> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:882)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:845)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:833)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:799)
        at org.apache.ignite.testframework.junits.GridAbstractTest$3.call(GridAbstractTest.java:742)
        at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{noformat}

It seems that the node never receives an acknowledgment from the coordinator.

A failure was logged before the hang:

{noformat}
[org.apache.ignite:ignite-core] [2018-06-10 04:59:18,876][WARN ][grid-starter-testTopologyValidatorWithCacheGroup-22][IgniteCacheTopologySplitAbstractTest$SplitTcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
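
For illustration only: the stack trace shows the starting thread in a timed Object.wait() inside joinTopology, and the warning shows the join being retried after networkTimeout=5000. A retry loop of that shape never ends on its own if the acknowledgment never arrives. The sketch below is a hypothetical, stdlib-only model of such a wait with an overall deadline added. It is not Ignite's ServerImpl code; the class name BoundedJoinWait, the State enum, and both method names are assumptions made for this example. Ignite's TcpDiscoverySpi does have a joinTimeout property, but the report does not establish whether setting it avoids this particular hang.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a join wait with an overall deadline; not Ignite code. */
public class BoundedJoinWait {
    /** Simplified join states, assumed for this sketch. */
    public enum State { CONNECTING, CONNECTED, FAILED }

    private final Object mux = new Object();
    private State state = State.CONNECTING;

    /** Would be called by a message-handling thread once the coordinator acknowledges the join. */
    public void onJoinAcknowledged() {
        synchronized (mux) {
            if (state == State.CONNECTING) {
                state = State.CONNECTED;
                mux.notifyAll();
            }
        }
    }

    /**
     * Waits in slices of networkTimeoutMs (each expired slice stands for one
     * "repeat join process" round) and gives up once joinTimeoutMs has elapsed.
     */
    public State awaitJoin(long networkTimeoutMs, long joinTimeoutMs) {
        if (networkTimeoutMs <= 0 || joinTimeoutMs <= 0)
            throw new IllegalArgumentException("timeouts must be positive");

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(joinTimeoutMs);

        synchronized (mux) {
            while (state == State.CONNECTING) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());

                if (remainingMs <= 0) {
                    // Overall deadline reached: fail instead of retrying forever.
                    state = State.FAILED;
                    break;
                }

                try {
                    // Never pass 0 here: Object.wait(0) blocks without a timeout.
                    mux.wait(Math.min(networkTimeoutMs, remainingMs));
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    state = State.FAILED;
                    break;
                }
            }

            return state;
        }
    }
}
```

With no acknowledgment, awaitJoin(10, 50) returns FAILED after roughly 50 ms rather than blocking; if onJoinAcknowledged() has already been called, it returns CONNECTED immediately.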