From: "subramanian raghunathan (JIRA)"
To: commits@helix.incubator.apache.org
Date: Thu, 26 Jan 2017 20:04:24 +0000 (UTC)
Subject: [jira] [Commented] (HELIX-652) Double assignment, when participant is not able to establish connection with zookeeper quorum

[ https://issues.apache.org/jira/browse/HELIX-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840342#comment-15840342 ]

subramanian raghunathan commented on HELIX-652:
-----------------------------------------------

Thoughts/inputs from Kishore:

Helix can handle this and probably should. A couple of challenges here:

1. How to generalize this across all use cases.
   This is a trade-off between availability and ensuring there is only one leader per partition.

2. There is a pathological case where all ZooKeeper nodes get partitioned/crash/GC. In this case, we will make all participants disconnect and assume they don't own the partition. But when the ZooKeeper servers come out of GC, the ensemble can continue as if nothing happened, i.e. it does not account for the time it was down. I can't think of a good solution for this scenario. Moreover, we cannot differentiate between a participant GC'ing/being partitioned and the ZK ensemble crashing/partitioning/GC'ing. This is typically avoided by ensuring ZK servers are deployed on different racks.

Having said that, I think implementing a config-based solution is worth it. (A rough sketch of what such a participant-side fence could look like is appended after the quoted report below.)


> Double assignment, when participant is not able to establish connection with zookeeper quorum
> ----------------------------------------------------------------------------------------------
>
>                 Key: HELIX-652
>                 URL: https://issues.apache.org/jira/browse/HELIX-652
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.7.1, 0.6.4
>            Reporter: subramanian raghunathan
>
> Double assignment, when participant is not able to establish connection with the ZooKeeper quorum.
>
> Following is the setup:
>
> Version(s): Helix 0.7.1, ZooKeeper 3.3.4
>
> - State model: OnlineOffline
> - Controller (leader elected from one of the cluster nodes)
> - Single resource with partitions
> - Full-auto rebalancer
> - ZooKeeper quorum (3 nodes)
>
> When one participant loses the ZooKeeper connection (it is not able to connect to any of the ZooKeeper servers; a typical occurrence we faced was a switch failure on that rack or a network switch failure on a node), the following sequence unfolds:
>
> - The partition (P1) for which this participant (say node N1) is ONLINE is still maintained on N1.
> - Meanwhile, since N1 loses its ephemeral node in ZooKeeper, the rebalancer gets triggered and reallocates the partition (P1) to another participant node (say node N2), which becomes ONLINE at time T1.
> - After this, both N1 and N2 are acting as ONLINE for the same partition (P1).
> - As soon as the participant on node N1 is able to re-establish the ZooKeeper connection at time T2, reset gets called on the partition on N1.
>
> Double assignment: the question here is whether it is expected behavior that both nodes N1 and N2 can be ONLINE for the same partition (P1) between times T1 and T2.
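
For concreteness, here is a minimal, hypothetical sketch of the self-fencing behavior such a config flag could enable. It uses the plain ZooKeeper client API, not actual Helix internals, and the FencingWatcher/PartitionHost names are illustrative only: if a participant stays disconnected past its session timeout, it assumes its ephemeral node has expired and the rebalancer has moved its partitions, so it drops its own replicas to OFFLINE instead of continuing to serve.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    public class FencingWatcher implements Watcher {

      /** Hypothetical callback into the participant; stands in for the real state-transition handling. */
      public interface PartitionHost {
        void dropAllPartitionsToOffline();   // stop serving as ONLINE immediately
      }

      private final PartitionHost host;
      private final long sessionTimeoutMs;
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();
      private ScheduledFuture<?> pendingFence;

      public FencingWatcher(PartitionHost host, long sessionTimeoutMs) {
        this.host = host;
        this.sessionTimeoutMs = sessionTimeoutMs;
      }

      @Override
      public synchronized void process(WatchedEvent event) {
        switch (event.getState()) {
          case Disconnected:
            // Start a timer for the session timeout. If we have not reconnected
            // by then, the ensemble may have expired our session and the
            // rebalancer may have moved our partitions elsewhere: fence
            // ourselves rather than risk the T1..T2 double-assignment window
            // described in this ticket.
            pendingFence = scheduler.schedule(
                host::dropAllPartitionsToOffline, sessionTimeoutMs, TimeUnit.MILLISECONDS);
            break;
          case SyncConnected:
            // Reconnected within the timeout: the ephemeral node survived,
            // so cancel the pending fence.
            if (pendingFence != null) {
              pendingFence.cancel(false);
              pendingFence = null;
            }
            break;
          case Expired:
            // Session is definitely gone; fence immediately.
            host.dropAllPartitionsToOffline();
            break;
          default:
            break;
        }
      }
    }

Note that this deliberately trades availability for safety, which is exactly challenge 1 above, and it does not address challenge 2: if the whole ensemble pauses, every participant fences itself even though the sessions may still be alive when ZK resumes.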