Reply-To: dev@ignite.apache.org
From: Anton Vinogradov
Date: Tue, 4 Sep 2018 12:59:19 +0300
Subject: Re: Unknown known issue on cache rebalancing delayed
To:
 dev@ignite.apache.org
Cc: rshtykh@yahoo.com

Maxim,
20 is not 1k :)
Also, you forgot to check GridCacheRebalancingAsyncSelfTest.
I'm not sure we should have exactly 1k runs, but 20 is definitely not enough.

Roman,
I propose using the IDEA "run until failure" feature and running the test locally (on your PC) while you're not using it.

Tue, 4 Sep 2018 at 12:51, Maxim Muzafarov:

> Roman, Anton,
>
> I've already created an additional PR [2] and run it on TC [1].
> Please, follow up with the results.
>
> [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Cache8&tab=buildTypeStatusDiv&branch_IgniteTests24Java8=pull%2F4676%2Fhead
> [2] https://github.com/apache/ignite/pull/4676/files
>
> On Tue, 4 Sep 2018 at 12:46 Roman Shtykh wrote:
>
> > Anton,
> > Thank you. I would like to recheck it. How can this (1_000 runs) be done in TC?
> >
> > On Tuesday, September 4, 2018, 5:42:01 p.m. GMT+9, Anton Vinogradov <av@apache.org> wrote:
> >
> > Roman,
> >
> > I see you uncommented this line.
> > I do not remember the deadlock details, but I remember it was an extremely rare case.
> > I found and "fixed" it some days before the merge, when I had a 24x7 sanity check week :)
> >
> > So, I propose to have at least 1_000 runs of this test before keeping this uncommented.
> >
> > Tue, 21 Aug 2018 at 11:08, Maxim Muzafarov:
> >
> > > Roman,
> > >
> > > I worked recently on rebalance improvements and haven't found any problems with delayed cache rebalance.
> > > Agree with you - let's uncomment this and remove the scary comment. Will you create a ticket for it?
> > >
> > > In case of any problems we can easily detect a deadlock with the newly configured `FailureHandler`.
> > >
> > > On Tue, 21 Aug 2018 at 03:49 Roman Shtykh wrote:
> > >
> > > > Hi Maxim,
> > > >
> > > > I have some issues with a cluster with rebalance delay enabled, but need to check more -- if I find it's related I'll share.
> > > > Just wanted to make sure, from someone working on rebalancing, that it's not an issue anymore. We should remove that comment then, it looks scary :)
> > > >
> > > > --
> > > > Roman Shtykh
> > > >
> > > > On Tuesday, August 21, 2018, 12:49:00 a.m. GMT+9, Maxim Muzafarov <maxmuzaf@gmail.com> wrote:
> > > >
> > > > Hello Roman,
> > > >
> > > > Did you face a real issue with delayed rebalance, or is this just for your personal interest?
> > > > If so, please share the details and we will try to help you.
> > > >
> > > > As for this comment, I don't think it is still relevant. That change was made in 2015.
> > > > Much has changed within the rebalance process since that time. I've uncommented it and rechecked with that cache configuration and haven't seen any failed tests or issues.
> > > >
> > > > Probably, that problem was that a cache in SYNC mode does not start until it loads all data from other nodes.
> > > > But currently delayed rebalance works the same way as IgniteCache#rebalance(), so you can `setRebalanceDelay` to `-1` and call it manually to check.
> > > >
> > > > On Mon, 20 Aug 2018 at 11:19 Roman Shtykh wrote:
> > > >
> > > > Igniters,
> > > > I have found the "Known issue, possible deadlock in case of low priority cache rebalancing delayed" comment in GridCacheRebalancingSyncSelfTest#getConfiguration.
> > > > Can you please explain when using rebalance delay can be an issue and why?
> > > >
> > > > -- Roman
> > > >
> > > > --
> > > > --
> > > > Maxim Muzafarov
> > > >
> > > --
> > > --
> > > Maxim Muzafarov
> > >
> --
> --
> Maxim Muzafarov
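
For reference, the "run it many times until it fails" idea from this thread can be sketched in plain Java. This is only an illustration under stated assumptions: `RunUntilFailure` and `firstFailure` are hypothetical names, the lambda in `main` is a stand-in for a real test body, and the loop is not how the IDEA "run until failure" feature or TC are implemented.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Ignite, IDEA or TeamCity.
class RunUntilFailure {
    /**
     * Runs the attempt for iterations 1..maxRuns and stops at the first failure.
     * An attempt fails if it returns false or throws.
     * Returns the 1-based number of the first failing run, or -1 if all runs passed.
     */
    public static int firstFailure(int maxRuns, IntPredicate attempt) {
        for (int run = 1; run <= maxRuns; run++) {
            boolean passed;
            try {
                passed = attempt.test(run);
            } catch (RuntimeException | AssertionError e) {
                passed = false; // a thrown error counts as a failed run
            }
            if (!passed) {
                return run;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a rarely failing test: it fails only on run 337.
        int failedAt = firstFailure(1_000, run -> run != 337);
        System.out.println(failedAt == -1
            ? "all runs passed"
            : "first failure at run " + failedAt);
    }
}
```

A rare deadlock would show up here as a run that never returns rather than one that fails, so a real harness would also need a per-run timeout.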