From: "subramanian raghunathan (JIRA)"
To: commits@helix.incubator.apache.org
Date: Thu, 26 Jan 2017 20:04:24 +0000 (UTC)
Subject: [jira] [Commented] (HELIX-652) Double assignment, when participant is not able to establish connection with zookeeper quorum

[ https://issues.apache.org/jira/browse/HELIX-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840342#comment-15840342 ]

subramanian raghunathan commented on HELIX-652:
-----------------------------------------------

Thoughts/inputs from Kishore:

Helix can handle this and probably should. A couple of challenges here:

1. How to generalize this across all use cases.
   This is a trade-off between availability and ensuring there is only one leader per partition.

2. There is a pathological case where all ZooKeeper nodes get partitioned/crash/GC. In this case, we will make all participants disconnect and assume they don't own the partition. But when the ZooKeeper servers come out of GC, the ensemble can continue as if nothing happened, i.e. it does not account for the time it was down. I can't think of a good solution for this scenario. Moreover, we cannot differentiate between a participant GC'ing/being partitioned and the ZK ensemble crashing/partitioning/GC'ing. This is typically avoided by ensuring ZK servers are deployed on different racks.

Having said that, I think implementing a config-based solution is worth it. (A rough sketch of what such a participant-side fence could look like is appended after the quoted report below.)


> Double assignment, when participant is not able to establish connection with zookeeper quorum
> ----------------------------------------------------------------------------------------------
>
>                 Key: HELIX-652
>                 URL: https://issues.apache.org/jira/browse/HELIX-652
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.7.1, 0.6.4
>            Reporter: subramanian raghunathan
>
> Double assignment, when participant is not able to establish connection with the ZooKeeper quorum.
>
> Following is the setup:
>
> Version(s): Helix 0.7.1, ZooKeeper 3.3.4
>
> - State model: OnlineOffline
> - Controller (leader elected from one of the cluster nodes)
> - Single resource with partitions
> - Full-auto rebalancer
> - ZooKeeper quorum (3 nodes)
>
> When one participant loses the ZooKeeper connection (it is not able to connect to any of the ZooKeeper servers; a typical occurrence we faced was a switch failure on that rack or a network switch failure on a node), the following sequence unfolds:
>
> - The partition (P1) for which this participant (say node N1) is ONLINE is still maintained on N1.
> - Meanwhile, since N1 loses its ephemeral node in ZooKeeper, the rebalancer gets triggered and reallocates the partition (P1) to another participant node (say node N2), which becomes ONLINE at time T1.
> - After this, both N1 and N2 are acting as ONLINE for the same partition (P1).
> - As soon as the participant on node N1 is able to re-establish the ZooKeeper connection at time T2, reset gets called on the partition on N1.
>
> Double assignment: the question here is whether it is expected behavior that both nodes N1 and N2 can be ONLINE for the same partition (P1) between times T1 and T2.
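
For concreteness, here is a minimal, hypothetical sketch of the self-fencing behavior such a config flag could enable. It uses the plain ZooKeeper client API, not actual Helix internals, and the FencingWatcher/PartitionHost names are illustrative only: if a participant stays disconnected past its session timeout, it assumes its ephemeral node has expired and the rebalancer has moved its partitions, so it drops its own replicas to OFFLINE instead of continuing to serve.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    public class FencingWatcher implements Watcher {

      /** Hypothetical callback into the participant; stands in for the real state-transition handling. */
      public interface PartitionHost {
        void dropAllPartitionsToOffline();   // stop serving as ONLINE immediately
      }

      private final PartitionHost host;
      private final long sessionTimeoutMs;
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();
      private ScheduledFuture<?> pendingFence;

      public FencingWatcher(PartitionHost host, long sessionTimeoutMs) {
        this.host = host;
        this.sessionTimeoutMs = sessionTimeoutMs;
      }

      @Override
      public synchronized void process(WatchedEvent event) {
        switch (event.getState()) {
          case Disconnected:
            // Start a timer for the session timeout. If we have not reconnected
            // by then, the ensemble may have expired our session and the
            // rebalancer may have moved our partitions elsewhere: fence
            // ourselves rather than risk the T1..T2 double-assignment window
            // described in this ticket.
            pendingFence = scheduler.schedule(
                host::dropAllPartitionsToOffline, sessionTimeoutMs, TimeUnit.MILLISECONDS);
            break;
          case SyncConnected:
            // Reconnected within the timeout: the ephemeral node survived,
            // so cancel the pending fence.
            if (pendingFence != null) {
              pendingFence.cancel(false);
              pendingFence = null;
            }
            break;
          case Expired:
            // Session is definitely gone; fence immediately.
            host.dropAllPartitionsToOffline();
            break;
          default:
            break;
        }
      }
    }

Note that this deliberately trades availability for safety, which is exactly challenge 1 above, and it does not address challenge 2: if the whole ensemble pauses, every participant fences itself even though the sessions may still be alive when ZK resumes.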