From: Roman Shtykh
To: Dev
Delivered-To: mailing list dev@ignite.apache.org
Date: Mon, 25 Mar 2019 05:35:18 +0000 (UTC)
Message-ID: <620169296.10811442.1553492118403@mail.yahoo.com>
Subject: GridDhtInvalidPartitionException takes the cluster down

Igniters,

Restarting a node while data is being injected and is expiring results in a GridDhtInvalidPartitionException, which terminates nodes with SYSTEM_WORKER_TERMINATION one by one and takes the whole cluster down. This is really bad, and I haven't found a way to keep the cluster from going down.

I created a JIRA issue with a test case: https://issues.apache.org/jira/browse/IGNITE-11620

Any clues on how to fix this inconsistency during rebalancing?

--
Roman