helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Double assignment , when participant is not able to establish connection with zookeeper quorum
Date Thu, 26 Jan 2017 18:43:05 GMT
Can you please file a ticket? We can probably add this in the next
release.

Minor typo in my email.


   1. There is a pathological case where all zookeeper nodes get
   partitioned/crash/GC. In this case, we will make all participants
   disconnect and assume they don't own the partition. But when the
   zookeepers come out of GC, they continue as if nothing happened, i.e. the
   time spent down is not accounted for. I can't think of a good solution for
   this scenario. Moreover, we *cannot* differentiate between a participant
   GC'ing/partitioned vs. a ZK ensemble crash/partition/GC. This is typically
   avoided by ensuring ZK servers are deployed on different racks.


On Thu, Jan 26, 2017 at 10:34 AM, Subramanian Raghunathan <
subramanian.raghunathan@integral.com> wrote:

> Totally concur with your thought; a config-based approach would be better. It
> could be tuned based on the acceptable tolerance and consistency.
>
>
>
> Thanks,
>
> Subramanian.
>
>
> *From:* kishore g [mailto:g.kishore@gmail.com]
> *Sent:* Wednesday, January 25, 2017 7:12 PM
>
> *To:* user@helix.apache.org
> *Cc:* dev@helix.incubator.apache.org
> *Subject:* Re: Double assignment , when participant is not able to
> establish connection with zookeeper quorum
>
>
>
> Helix can handle this and probably should. A couple of challenges here are:
>
>    1. How to generalize this across all use cases. This is a
>    trade-off between availability and ensuring there is only one leader per
>    partition.
>    2. There is a pathological case where all zookeeper nodes get
>    partitioned/crash/GC. In this case, we will make all participants
>    disconnect and assume they don't own the partition. But when zookeepers
>    come out of GC, it can continue as if nothing happened i.e it does not
>    account for the time when its down. I can't think of a good solution for
>    this scenario. Moreover, we can differentiate between a participant
>    GC'ing/partitioned v/s ZK ensemble crash/partition/GC. This is typically
>    avoided by ensuring ZK servers are deployed on different racks.
>
> Having said that, I think implementing a config-based solution is worth
> it.
>
>
> On Wed, Jan 25, 2017 at 4:57 PM, Subramanian Raghunathan <
> subramanian.raghunathan@integral.com> wrote:
>
> Hi Kishore,
>
>
>
>                 Thank you for the confirmation. Yes, we had solved it along
> similar lines and it did work for us (listening on the disconnect event
> from ZK).
>
>
>
>                 From the double-assignment point of view, is this expected
> behavior from Helix that users are expected to handle themselves? Are there
> any plans to fix this in a future release?
>
>
>
> Because from what I have observed, when the network is flapping, Helix does
> handle it by calling reset() for the partition(s) from disconnect(), so why
> not in this case?
>
>
>
> void org.apache.helix.manager.zk.ZkHelixConnection.handleStateChanged(KeeperState state) throws Exception
>
>   if (isFlapping()) {
>     LOG.error("helix-connection: " + this + ", sessionId: " + _sessionId
>         + " is flapping. diconnect it. " + " maxDisconnectThreshold: "
>         + _maxDisconnectThreshold + " disconnects in " + _flappingTimeWindowMs + "ms");
>     disconnect();
>   }
>
>
>
>
>
> Thanks & Regards,
>
> Subramanian.
>
>
> *From:* kishore g [mailto:g.kishore@gmail.com]
> *Sent:* Wednesday, January 25, 2017 4:45 PM
> *To:* user@helix.apache.org
> *Cc:* dev@helix.incubator.apache.org
> *Subject:* Re: Double assignment , when participant is not able to
> establish connection with zookeeper quorum
>
>
>
> After a few seconds, the participant N1 gets a disconnect event from ZK. At
> this point, schedule a timer task for (30 - X) seconds, where 30 is the
> session timeout and X can vary from 0 to 30 depending on how long you are
> OK with P1 being unavailable.
>
>
>
> When the timer task kicks in and N1 is still disconnected from the
> cluster, assume that this N1 is no longer the owner of P1.
>
>
>
> After 30 seconds, Helix will notice that N1 is network partitioned and
> will assign P1 to N2.
>
> This will ensure that there is no overlap.
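>
> Something along these lines (a rough sketch using the plain ZooKeeper
> Watcher API rather than any Helix API; SESSION_TIMEOUT_MS, GRACE_MS and
> relinquishPartitions() are made-up names for illustration):
>
> import java.util.concurrent.Executors;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.ScheduledFuture;
> import java.util.concurrent.TimeUnit;
>
> import org.apache.zookeeper.WatchedEvent;
> import org.apache.zookeeper.Watcher;
>
> public class DisconnectGuard implements Watcher {
>   private static final long SESSION_TIMEOUT_MS = 30_000; // ZK session timeout (the "30")
>   private static final long GRACE_MS = 10_000;           // the "X" in (30 - X)
>
>   private final ScheduledExecutorService timer =
>       Executors.newSingleThreadScheduledExecutor();
>   private ScheduledFuture<?> pending;
>
>   @Override
>   public synchronized void process(WatchedEvent event) {
>     switch (event.getState()) {
>       case Disconnected:
>         // Still inside the session timeout window; the ephemeral node is not gone yet.
>         // If we are still disconnected when this fires, stop serving our partitions.
>         pending = timer.schedule(this::relinquishPartitions,
>             SESSION_TIMEOUT_MS - GRACE_MS, TimeUnit.MILLISECONDS);
>         break;
>       case SyncConnected:
>         // Reconnected before the deadline: keep serving and cancel the pending task.
>         if (pending != null) {
>           pending.cancel(false);
>           pending = null;
>         }
>         break;
>       default:
>         break;
>     }
>   }
>
>   private synchronized void relinquishPartitions() {
>     // Application hook (hypothetical): stop acting as ONLINE for the partitions
>     // this node owns, so that by the time Helix reassigns them (after the 30s
>     // session timeout) nobody is serving them twice.
>   }
> }
>
> That way N1 stops serving P1 strictly before the session expires and Helix
> brings P1 online on N2.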
>
>
>
> Will that work for you?
>
>
>
>
>
> On Wed, Jan 25, 2017 at 4:17 PM, Subramanian Raghunathan <
> subramanian.raghunathan@integral.com> wrote:
>
> Hi,
>
>
>
> Double assignment , when participant is not able to establish connection
> with zookeeper quorum
>
>
>
> Following is the setup.
>
>
>
> Version(s):
>
>                                 Helix: 0.7.1
>
>                                 Zookeeper: 3.3.4
>
>
>
> - State Model: OnlineOffline
>
> - Controller (leader elected from one of the cluster nodes)
>
> - Single resource with partitions.
>
> - Full auto rebalancer
>
>
>
> - Zookeeper quorum (3 nodes)
>
>
>
> When one participant loses the zookeeper connection (it’s not able to
> connect to any of the zookeepers; a typical occurrence we faced was a
> switch failure in that rack):
>
>
>
>   ---- >  The partition (P1) for which this participant (say Node N1) is
> online is still maintained
>
>
>
> Meanwhile, since it loses the ephemeral node in zookeeper, the rebalancer
> gets triggered and reallocates the partition (P1) to another participant
> node (say Node N2), which becomes online @ time T1
>
>
>
>                 ---- >  *After this, both N1 and N2 are acting as online
> for the same partition (P1)*
>
>
>
> But as soon as the participant (say Node N1) is able to re-establish the
> zookeeper connection @ time T2
>
>                 ---- >  Reset gets called on the partition in the
> participant (say Node N1)
>
>
>
> Double assignment:
>
> The question here is: is it expected behavior that both nodes N1 and N2
> could be online for the same partition (P1) between times T1 and T2? Any
> responses would be appreciated.
>
>
>
> Thanks & Regards,
>
> Subramanian.
>
