From: "Ignite TC Bot (Jira)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Date: Wed, 2 Jun 2021 08:57:00 +0000 (UTC)
Subject: [jira] [Commented] (IGNITE-14474) Improve error message in case rebalance fails

[ 
https://issues.apache.org/jira/browse/IGNITE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355582#comment-17355582 ]

Ignite TC Bot commented on IGNITE-14474:
----------------------------------------

{panel:title=Branch: [pull/9004/head] Base: [master] : Possible Blockers (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=6032385]]
{panel}
{panel:title=Branch: [pull/9004/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6025784&buildTypeId=IgniteTests24Java8_RunAll]

> Improve error message in case rebalance fails
> ---------------------------------------------
>
>                 Key: IGNITE-14474
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14474
>             Project: Ignite
>          Issue Type: Improvement
>   Affects Versions: 2.5
>            Reporter: Denis Chudov
>            Assignee: Rodion Smolnikov
>            Priority: Major
>             Fix For: 2.9.2
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, when rebalance fails with an exception, we can get messages like these (the examples are from Ignite 2.5; the log messages were changed in newer versions, but the problem is still present):
> {code:java}
> 2019-11-27 13:41:14,504[WARN ][utility-#79%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
> 2019-11-27 13:41:14,504[INFO ][utility-#79%xxx%][GridDhtPartitionDemander] Cancelled rebalancing [grp=ignite-sys-cache, supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], time=88 ms]
> 2019-11-27 13:41:14,508[WARN ][utility-#76%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=dfa5ee06-48c9-4458-ae55-48cc6ceda998, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
> {code}
> In the case above, a marshalling exception leads to a rebalance failure that will never be resolved, i.e. the cluster enters an erroneous state.
> We should report issues like this as ERROR. The message should explain that the rebalance has failed, that data for the cache was not fully copied to the node, that the backup factor has not been recovered, and that the cluster may not work correctly.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
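
For illustration only, the message proposed in the issue description could be assembled roughly as follows. This is a minimal sketch and not the actual Ignite change: the class RebalanceFailureMessage and its methods build and report are hypothetical, the exact wording is an assumption, and java.util.logging stands in for Ignite's own logger (Level.SEVERE plays the role of ERROR).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper, NOT part of Apache Ignite. It only illustrates the
 * kind of message the issue asks for.
 */
final class RebalanceFailureMessage {
    private static final Logger LOG =
        Logger.getLogger(RebalanceFailureMessage.class.getName());

    private RebalanceFailureMessage() {
        // Static helpers only.
    }

    /** Builds a message that states the consequences of the failure, not just the cause. */
    static String build(String cacheGroup, String supplierId, Throwable cause) {
        return "Rebalancing failed [grp=" + cacheGroup + ", supplier=" + supplierId + "]. "
            + "Data for this cache group was not fully copied to the local node, "
            + "the backup factor has not been recovered, "
            + "and the cluster may not work correctly. "
            + "Cause: " + cause;
    }

    /** Logs the message at the highest severity instead of WARN/INFO. */
    static void report(String cacheGroup, String supplierId, Throwable cause) {
        LOG.log(Level.SEVERE, build(cacheGroup, supplierId, cause), cause);
    }

    public static void main(String[] args) {
        // Values taken from the log excerpt in the issue description.
        report(
            "ignite-sys-cache",
            "f014f30a-77f2-4459-aa5b-6c12907a7449",
            new RuntimeException("Failed to unmarshal object with optimized marshaller"));
    }
}
```

Keeping the message construction separate from the logging call makes the wording easy to check in a unit test without capturing log output.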