helix-user mailing list archives

From Subramanian Raghunathan <subramanian.raghunat...@integral.com>
Subject RE: Double assignment , when participant is not able to establish connection with zookeeper quorum
Date Thu, 26 Jan 2017 18:34:30 GMT
I totally concur with your thought; a config-based approach would be better. It could be tuned
based on the acceptable tolerance and consistency.

Thanks,
Subramanian.

Tel: +1 (650) 424 4655

3400 Hillview Ave, Building 4
Palo Alto, CA 94304
www.integral.com<http://www.integral.com/>
[Logo_signature_block]<http://www.integral.com/fxcloud_features/risk_management.html#ym>

NOTICE: This e-mail message and any attachments, which may contain confidential information,
are to be viewed solely by the intended recipient of Integral Development Corp. For further
information, please visit http://www.integral.com/about/disclaimer.html.



From: kishore g [mailto:g.kishore@gmail.com]
Sent: Wednesday, January 25, 2017 7:12 PM
To: user@helix.apache.org
Cc: dev@helix.incubator.apache.org
Subject: Re: Double assignment , when participant is not able to establish connection with
zookeeper quorum

Helix can handle this and probably should. A couple of challenges here are:

  1.  How to generalize this across all use cases. This is a trade-off between availability
and ensuring there is only one leader per partition.
  2.  There is a pathological case where all ZooKeeper nodes get partitioned/crash/GC. In
this case, we will make all participants disconnect and assume they don't own the partition.
But when the ZooKeeper nodes come out of GC, they can continue as if nothing happened, i.e. they
do not account for the time they were down. I can't think of a good solution for this scenario. Moreover,
we can't differentiate between a participant GC'ing/being partitioned vs. a ZK ensemble crash/partition/GC.
This is typically avoided by ensuring ZK servers are deployed on different racks.
Having said that, I think implementing a config based solution is worth it.





On Wed, Jan 25, 2017 at 4:57 PM, Subramanian Raghunathan <subramanian.raghunathan@integral.com>
wrote:
Hi Kishore,

Thank you for the confirmation. Yes, we had solved it along similar lines and it did work
for us (listening for the disconnect event from ZK).

From the double-assignment point of view, is this expected behavior from Helix, with users
expected to handle it themselves? Are there any plans to fix it in the future?

I ask because I observed that when the network is flapping, Helix does handle it by calling
reset() for the partition(s) from disconnect(). Why not in this case?

void org.apache.helix.manager.zk.ZkHelixConnection.handleStateChanged(KeeperState state) throws Exception

if (isFlapping()) {
    LOG.error("helix-connection: " + this + ", sessionId: " + _sessionId
        + " is flapping. diconnect it. " + " maxDisconnectThreshold: "
        + _maxDisconnectThreshold + " disconnects in " + _flappingTimeWindowMs + "ms");
    disconnect();
}



Thanks & Regards,
Subramanian.




From: kishore g [mailto:g.kishore@gmail.com]
Sent: Wednesday, January 25, 2017 4:45 PM
To: user@helix.apache.org
Cc: dev@helix.incubator.apache.org
Subject: Re: Double assignment , when participant is not able to establish connection with
zookeeper quorum

After a few seconds, participant N1 gets a disconnect event from ZK. At this time, schedule
a timer task for (30 - X) seconds, where 30 is the session timeout in seconds and X can vary
from 0 to 30 depending on how long you are OK with P1 being down.

When the timer task kicks in and N1 is still disconnected from the cluster, assume that this
N1 is no longer the owner of P1.

After 30 seconds, Helix will notice that N1 is network partitioned and will assign P1 to N2.
This will ensure that there is no overlap.

Will that work for you?
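
The timer approach described above could be sketched roughly as follows. This is only an
illustration under stated assumptions: OwnershipGuard, its method names, and the
releaseOwnership callback are hypothetical and not part of the Helix or ZooKeeper APIs. The
application is assumed to call onDisconnected() and onReconnected() from whatever ZK
connection-state listener it already has, and sessionTimeoutMs / safetyMarginMs correspond to
"30" and "X" in the message above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not a Helix class): gives up partition ownership if a
 * ZooKeeper disconnect lasts longer than (sessionTimeout - safetyMargin).
 */
class OwnershipGuard {
    private final long graceMs;               // (30 - X) seconds in the example above
    private final Runnable releaseOwnership;  // application callback, e.g. stop serving P1
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ownership-guard");
            t.setDaemon(true);                // do not keep the JVM alive for the timer
            return t;
        });
    private boolean connected = true;
    private ScheduledFuture<?> pending;

    OwnershipGuard(long sessionTimeoutMs, long safetyMarginMs, Runnable releaseOwnership) {
        if (safetyMarginMs < 0 || safetyMarginMs > sessionTimeoutMs) {
            throw new IllegalArgumentException("safety margin must be in [0, sessionTimeout]");
        }
        this.graceMs = sessionTimeoutMs - safetyMarginMs;
        this.releaseOwnership = releaseOwnership;
    }

    /** Call when the ZK client reports a disconnect. */
    synchronized void onDisconnected() {
        connected = false;
        if (pending == null) {                // keep the first deadline on repeated events
            pending = timer.schedule(this::onGraceElapsed, graceMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Call when the ZK client reports that the connection is back. */
    synchronized void onReconnected() {
        connected = true;
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    /** Runs when the timer fires; releases ownership only if still disconnected. */
    synchronized void onGraceElapsed() {
        pending = null;
        if (!connected) {
            releaseOwnership.run();
        }
    }

    long graceMs() { return graceMs; }

    void shutdown() { timer.shutdownNow(); }
}
```

With a 30-second session timeout and X = 5 seconds, new OwnershipGuard(30_000, 5_000, callback)
would invoke the callback 25 seconds after the disconnect unless the connection comes back first.
A larger X gives up P1 earlier, trading availability for a wider safety margin before Helix
reassigns the partition.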


On Wed, Jan 25, 2017 at 4:17 PM, Subramanian Raghunathan <subramanian.raghunathan@integral.com>
wrote:
Hi,

Double assignment, when participant is not able to establish connection with zookeeper quorum

Following is the setup.

Version(s):
- Helix: 0.7.1
- Zookeeper: 3.3.4

- State Model: OnlineOffline
- Controller (leader elected from one of the cluster nodes)
- Single resource with partitions
- Full auto rebalancer
- Zookeeper quorum (3 nodes)

When one participant loses the zookeeper connection (it is not able to connect to any of
the zookeepers; a typical occurrence we faced was a switch failure on that rack):

  ----> The partition (P1) for which this participant (say node N1) is online is still
maintained on N1.

Meanwhile, since N1 loses its ephemeral node in zookeeper, the rebalancer gets triggered
and reallocates the partition (P1) to another participant (say node N2), which becomes
online at time T1.

  ----> After this, both N1 and N2 are acting as online for the same partition (P1).

But as soon as N1 is able to re-establish the zookeeper connection at time T2:

  ----> Reset gets called on the partition in N1.

Double assignment:
The question is: is it expected behavior that both nodes N1 and N2 could be online for
the same partition (P1) between T1 and T2? Any responses would be appreciated.

Thanks & Regards,
Subramanian.





