helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Double assignment , when participant is not able to establish connection with zookeeper quorum
Date Thu, 26 Jan 2017 00:45:22 GMT
After a few seconds, participant N1 gets a disconnect event from ZK. At
this point, schedule a timer task for (30 - X) seconds, where 30 is the
session timeout and X can vary from 0 to 30 depending on how long you can
tolerate P1 being down.

If N1 is still disconnected from the cluster when the timer task fires,
N1 should assume that it is no longer the owner of P1 and stop serving it.

After 30 seconds, Helix will notice that N1 is network partitioned and will
assign P1 to N2.
This will ensure that there is no overlap.
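
The timer idea above can be sketched roughly as follows. This is only an
illustration in plain Java: OwnershipGuard, its constructor parameters and
the relinquish callback are hypothetical names and are not part of Helix or
ZooKeeper. It assumes the application can call onDisconnected() and
onReconnected() from whatever connection-state notifications it already
receives (for example a ZooKeeper Watcher seeing KeeperState.Disconnected
and KeeperState.SyncConnected).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Helix or ZooKeeper class). */
final class OwnershipGuard {
    // Delay before giving up ownership: (session timeout - X).
    private final long delayMillis;
    // Application-supplied action that stops serving the owned partitions.
    // It should be idempotent, since it may run more than once.
    private final Runnable relinquish;
    // Single daemon thread so the timer never keeps the JVM alive.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ownership-guard");
            t.setDaemon(true);
            return t;
        });
    // Currently scheduled release task, if any; guarded by "this".
    private ScheduledFuture<?> pending;

    OwnershipGuard(long sessionTimeoutMillis, long marginMillis, Runnable relinquish) {
        if (marginMillis < 0 || marginMillis > sessionTimeoutMillis) {
            throw new IllegalArgumentException("margin must be within [0, session timeout]");
        }
        this.delayMillis = sessionTimeoutMillis - marginMillis;
        this.relinquish = relinquish;
    }

    /** Call when the participant sees a disconnect event. */
    synchronized void onDisconnected() {
        // Schedule only once per disconnect; repeated events do not extend the delay.
        if (pending == null || pending.isDone()) {
            pending = timer.schedule(relinquish, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Call when the connection comes back before the timer fires. */
    synchronized void onReconnected() {
        if (pending != null) {
            pending.cancel(false); // still connected in time, keep ownership
            pending = null;
        }
    }
}
```

With a 30 second session timeout and X = 5 seconds, this would be created as
new OwnershipGuard(30_000, 5_000, stopServingP1), where stopServingP1 is
whatever your application does to take P1 offline locally. N1 would then
give up P1 25 seconds after the disconnect, about 5 seconds before Helix
can assign it to N2.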

Will that work for you?


On Wed, Jan 25, 2017 at 4:17 PM, Subramanian Raghunathan <
subramanian.raghunathan@integral.com> wrote:

> Hi,
>
> Double assignment, when participant is not able to establish connection
> with zookeeper quorum
>
> Following is the setup.
>
> Version(s):
>   Helix: 0.7.1
>   Zookeeper: 3.3.4
>
> - State model: OnlineOffline
> - Controller (leader elected from one of the cluster nodes)
> - Single resource with partitions
> - Full auto rebalancer
> - Zookeeper quorum (3 nodes)
>
> When one participant loses the zookeeper connection (it is not able to
> connect to any of the zookeepers; a typical occurrence we faced was a
> switch failure on that rack):
>
>   ---->  The partition (P1) for which this participant (say node N1) is
>          online is still maintained.
>
> Meanwhile, since N1 loses its ephemeral node in zookeeper, the rebalancer
> gets triggered and reallocates the partition (P1) to another participant
> (say node N2), which becomes online at time T1.
>
>   ---->  *After this, both N1 and N2 are acting as online for the same
>          partition (P1).*
>
> But as soon as the participant (node N1) is able to re-establish the
> zookeeper connection at time T2:
>
>   ---->  Reset gets called on the partition in the participant (node N1).
>
> Double assignment:
>
> The question here: is it expected behavior that both nodes N1 and N2
> could be online for the same partition (P1) between times T1 and T2? Any
> responses would be appreciated.
>
>
>
> Thanks & Regards,
>
> Subramanian.
