Date: Tue, 28 May 2019 22:26:00 +0000 (UTC)
From: "Cameron McKenzie (JIRA)"
To: dev@curator.apache.org
Subject: [jira] [Commented] (CURATOR-525) There is a race condition in Curator which might lead to fake SUSPENDED event and ruin CuratorFrameworkImpl inner state

    [ https://issues.apache.org/jira/browse/CURATOR-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850200#comment-16850200 ]

Cameron McKenzie commented on
CURATOR-525:
------------------------------------------

[~mikhailvaliev], do you happen to have a self-contained test that can reproduce this issue?

> There is a race condition in Curator which might lead to fake SUSPENDED event and ruin CuratorFrameworkImpl inner state
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-525
>                 URL: https://issues.apache.org/jira/browse/CURATOR-525
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0
>            Reporter: Mikhail Valiev
>            Priority: Critical
>         Attachments: CuratorFrameworkTest.java, background-thread-infinite-loop.png, curator-race-condition.png, event-watcher-thread.png
>
>
> This was originally found in the 2.11.1 version of Curator, but I tested the latest release as well, and the issue is still there.
> The issue is tied to guaranteed deletes and how they loop infinitely if called when there is no connection:
> client.delete().guaranteed().forPath(ourPath);
> [https://curator.apache.org/apidocs/org/apache/curator/framework/api/GuaranteeableDeletable.html]
> This schedules a background operation that attempts to remove the node in an infinite loop. Each time the background operation fails due to connection loss, it performs a check (the validateConnection() function) to see whether the main thread is already aware of the connection loss and, if it is not, raises the connection loss event. The problem is that this piece of code is also executed by the event watcher thread when connection events happen, which leads to a race condition. So when the connection is restored, it is quite possible for the main thread to raise a RECONNECTED event and, after that, for the background thread to raise a SUSPENDED event.
> We might get unlucky and get a "phantom" SUSPENDED event. It breaks Curator's internal connection state and leads to Curator behaving unpredictably.
> I attached some illustrations and a unit test to reproduce the issue (put a breakpoint in validateConnection()).
> *Possible solution*: in the CuratorFrameworkImpl class, adjust the processEvent() function and add the following:
> if (event.getType() == CuratorEventType.SYNC) {
>     connectionStateManager.addStateChange(ConnectionState.RECONNECTED);
> }
> If the state is the same as before, the change will be ignored; if the background operation succeeded but we are in the SUSPENDED state, this would repair the Curator state and raise a RECONNECTED event.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
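The race and the possible solution described in the issue can be illustrated with a small, dependency-free Java model. This is a sketch under stated assumptions, not Curator code: `PhantomSuspendedDemo`, `StateTracker` and `replayRace` are hypothetical names, the background and event watcher threads are replaced by a fixed, deterministic replay of the unlucky interleaving, and the only behavior borrowed from the issue is that a change to the current state is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of the race described in CURATOR-525.
// These classes are NOT Curator APIs; they only mimic the shape of a state
// manager whose addStateChange() ignores changes to the current state.
public class PhantomSuspendedDemo {

    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    static final class StateTracker {
        private State current = State.CONNECTED;
        private final List<State> published = new ArrayList<>();

        // Returns false (and publishes nothing) when next equals the current state.
        synchronized boolean addStateChange(State next) {
            if (next == current) {
                return false;
            }
            current = next;
            published.add(next);
            return true;
        }

        synchronized State current() {
            return current;
        }

        synchronized List<State> published() {
            return new ArrayList<>(published);
        }
    }

    // Replays one unlucky interleaving deterministically instead of racing real threads.
    static StateTracker replayRace(boolean applyProposedFix) {
        StateTracker tracker = new StateTracker();

        // Background thread: a guaranteed delete fails with connection loss and
        // checks whether the loss is already known. The event thread has not
        // processed the disconnect yet, so the answer is "no" (the "check").
        boolean lossAlreadyKnown = tracker.current() == State.SUSPENDED;

        // Event watcher thread: handles the disconnect, then the reconnect.
        tracker.addStateChange(State.SUSPENDED);
        tracker.addStateChange(State.RECONNECTED);

        // Background thread: acts on its stale check (the "act"),
        // publishing a phantom SUSPENDED after RECONNECTED.
        if (!lossAlreadyKnown) {
            tracker.addStateChange(State.SUSPENDED);
        }

        if (applyProposedFix) {
            // A later background operation succeeds; the proposal publishes
            // RECONNECTED, which moves the tracker out of the phantom SUSPENDED.
            tracker.addStateChange(State.RECONNECTED);
        }
        return tracker;
    }

    public static void main(String[] args) {
        StateTracker broken = replayRace(false);
        System.out.println("without fix: " + broken.published() + " -> " + broken.current());

        StateTracker repaired = replayRace(true);
        System.out.println("with fix: " + repaired.published() + " -> " + repaired.current());
    }
}
```

Without the fix, the model publishes [SUSPENDED, RECONNECTED, SUSPENDED] and ends in SUSPENDED although the connection is up; with the fix, a fourth RECONNECTED restores the state, and any further RECONNECTED is ignored as a same-state change. Note that in this model, posting RECONNECTED while the tracker is still CONNECTED would record a transition, so the sketch covers only the scenario the issue describes.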