zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: Possible issue with cluster availability following new Leader Election - ZK 3.4
Date Thu, 17 May 2012 02:50:28 GMT
This pretty much matches what I expect. It would be great if you
wanted to try your hand at creating a patch and submitting it to the
ticket that was created for this problem, but if not, please post this
analysis to issue 1465 and we'll look at it ASAP.

C

On Wed, May 16, 2012 at 2:55 PM, Vinayak Khot <vinayak@nutanix.com> wrote:
> We have also encountered a problem where the newly elected leader sends
> an entire snapshot to a follower even though the follower is in sync with
> the leader.
>
> A closer look at the code shows a problem in the logic that decides when
> to send a snapshot.
> The following scenario explains the problem in detail.
> Start a 3-node ZooKeeper ensemble in which every quorum member has seen
> the same changes, up to zxid 0x400000004.
>
> 1. When a newly elected leader starts, it bumps up its zxid to the new
> epoch.
>
> Code snippet from Leader.java
>
> long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch());
> zk.setZxid(ZxidUtils.makeZxid(epoch, 0));
> synchronized(this){
>     lastProposed = zk.getZxid();  // 0x500000000
> }
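
The zxid arithmetic in step 1 can be sketched as below. This is a minimal illustration, assuming the usual layout in which the high 32 bits of a zxid hold the epoch and the low 32 bits hold a per-epoch counter. The class name `ZxidSketch` and its helpers are hypothetical stand-ins, not the project's `ZxidUtils`, and deriving the new epoch as "old epoch + 1" is an assumption that matches the numbers in this scenario only.

```java
// Hypothetical sketch: how a 64-bit zxid packs an epoch and a counter.
final class ZxidSketch {
    // High 32 bits carry the epoch, low 32 bits carry the counter.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long followerZxid = 0x400000004L;            // epoch 4, counter 4
        long newEpoch = epochOf(followerZxid) + 1;   // assumed: epoch 5
        long leaderZxid = makeZxid(newEpoch, 0);     // counter resets to 0

        System.out.println(Long.toHexString(leaderZxid)); // prints 500000000
        System.out.println(followerZxid == leaderZxid);   // prints false
    }
}
```

The point of the sketch is that bumping the epoch alone changes the leader's zxid, even though no new transaction has been processed.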
>
> 2. Now a follower tries to join the leader with its peerLastZxid =
> 0x400000004.
>
> Note that the leader now has an in-memory committedLog list with
> maxCommittedLog = 0x400000004.
>
> As committedLog doesn't contain any transactions with zxid >
> peerLastZxid, we check whether the leader and follower are in sync.
>
> Code snippet from LearnerHandler.java
> leaderLastZxid = leader.startForwarding(this, updates);
> if (peerLastZxid == leaderLastZxid) {   // 0x400000004 == 0x500000000 is false
>   // We are in sync so we'll do an empty diff
>   packetToSend = Leader.DIFF;
>   zxidToSend = leaderLastZxid;
> }
>
> Note that the function leader.startForwarding() returns the lastProposed
> zxid, which the leader has already set to 0x500000000.
> So in this scenario we never send an empty diff even though the leader and
> follower are in sync, and we end up sending an entire snapshot in the code
> that follows the above check.
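
The effect of that comparison can be reproduced with a deliberately simplified model. This is a sketch under stated assumptions: `SyncDecisionSketch`, its `Packet` enum, and `decide` are hypothetical names, and the model keeps only the equality check quoted above, leaving out the other branches of the real handler.

```java
// Hypothetical model of the quoted check: an empty DIFF is chosen only when
// the follower's last zxid equals the leader's lastProposed zxid.
final class SyncDecisionSketch {
    enum Packet { DIFF, SNAP }

    static Packet decide(long peerLastZxid, long leaderLastProposed) {
        if (peerLastZxid == leaderLastProposed) {
            return Packet.DIFF; // treated as in sync: empty diff
        }
        return Packet.SNAP;     // simplified fallback: full snapshot
    }

    public static void main(String[] args) {
        long peerLastZxid = 0x400000004L;       // follower is fully caught up
        long leaderLastProposed = 0x500000000L; // leader already bumped its epoch

        System.out.println(decide(peerLastZxid, leaderLastProposed)); // prints SNAP
    }
}
```

With the values from the scenario the equality can never hold right after an election, so the model always falls through to the snapshot path.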
>
> A possible fix would be to keep a lastProcessedZxid in the leader that is
> updated only when the leader processes a transaction. While syncing with a
> follower, if the peerLastZxid sent by the follower equals the leader's
> lastProcessedZxid, we can send an empty diff to the follower.
> This would avoid unnecessarily sending an entire snapshot when the leader
> and follower are already in sync.
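
The proposed fix might look roughly like the following. This is only a sketch of the idea described in the message, not a patch: `LeaderStateSketch` and its methods are hypothetical, and the SNAP fallback stands in for every case the real handler would treat separately.

```java
// Hypothetical sketch of the proposal: track lastProcessedZxid separately
// from lastProposed and compare the follower's zxid against the former.
final class LeaderStateSketch {
    enum Packet { DIFF, SNAP }

    private long lastProposed;      // bumped when a new epoch starts
    private long lastProcessedZxid; // advances only when a transaction is processed

    LeaderStateSketch(long lastProcessedZxid) {
        this.lastProposed = lastProcessedZxid;
        this.lastProcessedZxid = lastProcessedZxid;
    }

    // Starting a new epoch moves lastProposed but leaves lastProcessedZxid alone.
    void startNewEpoch(long epoch) {
        lastProposed = epoch << 32;
    }

    void processTransaction(long zxid) {
        lastProposed = Math.max(lastProposed, zxid);
        lastProcessedZxid = zxid;
    }

    Packet decide(long peerLastZxid) {
        return peerLastZxid == lastProcessedZxid ? Packet.DIFF : Packet.SNAP;
    }

    long getLastProposed() {
        return lastProposed;
    }

    public static void main(String[] args) {
        LeaderStateSketch leader = new LeaderStateSketch(0x400000004L);
        leader.startNewEpoch(5); // lastProposed becomes 0x500000000

        System.out.println(leader.decide(0x400000004L)); // prints DIFF
    }
}
```

Because the epoch bump no longer affects the value used in the comparison, a caught-up follower gets the empty diff in this model.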
>
> ZooKeeper developers, please share your views on the above-mentioned issue.
>
> - Vinayak
>
> On Mon, May 14, 2012 at 8:30 AM, Camille Fournier <camille@apache.org> wrote:
>
>> Thanks.
>> I just ran a couple of tests to start the debugging. Mark, I don't see
>> a long cluster settle with a mostly empty data set, so I think this
>> might be two different problems. I do see a lot of snapshots being
>> sent though, so we are probably too aggressive in deciding when to
>> send snapshots, and that logic should be reviewed.
>> Adding the dev mailing list, as I may need ben or flavio to take a
>> look as well.
>>
>> C
>>
>> On Thu, May 10, 2012 at 10:48 AM,  <Alexandar.Gvozdenovic@ubs.com> wrote:
>> > Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Camille Fournier [mailto:camille@apache.org]
>> > Sent: 10 May 2012 14:58
>> > To: user@zookeeper.apache.org
>> > Subject: Re: Possible issue with cluster availability following new Leader Election - ZK 3.4
>> >
>> > I will take a look at this soon, have you created a Jira for it? If not
>> > please do so.
>> >
>> > Thanks,
>> > C
>> >
>> > On Thu, May 10, 2012 at 7:20 AM,  <Alexandar.Gvozdenovic@ubs.com> wrote:
>> >> I think there may be a problem here with the 3.4 branch. I dropped the
>> >> cluster back to 3.3.5 and the behaviour was much better.
>> >>
>> >> To summarize:
>> >>
>> >> 650mb of data
>> >> 20k nodes of varied size
>> >> 3 node cluster
>> >>
>> >> On 3.4.x (using latest branch build)
>> >> ---------
>> >> Takes 3-4 minutes to bring up a cluster from cold
>> >> Takes 40-50 secs to recover from a leader failure
>> >> Takes 10 secs for a new follower to join the cluster
>> >>
>> >> On 3.3.5
>> >> --------
>> >> Takes 10-20 secs to bring up a cluster from cold
>> >> Takes 10 secs to recover from a leader failure
>> >> Takes 10 secs for a new follower to join the cluster
>> >>
>> >> Any views on this from the ZK devs? The differences in behaviour only
>> >> start becoming apparent as the dataset gets bigger.
>> >> I was hoping to use 3.4 for the transactional features it offered via
>> >> the 'multi-update' operations, but this issue seems pretty serious...
>> >>
>> >>
>> >>
>> >> Visit our website at http://www.ubs.com
>> >>
>> >> This message contains confidential information and is intended only
>> >> for the individual named. If you are not the named addressee you
>> >> should not disseminate, distribute or copy this e-mail. Please notify
>> >> the sender immediately by e-mail if you have received this e-mail by
>> >> mistake and delete this e-mail from your system.
>> >>
>> >> E-mails are not encrypted and cannot be guaranteed to be secure or
>> >> error-free as information could be intercepted, corrupted, lost,
>> >> destroyed, arrive late or incomplete, or contain viruses. The sender
>> >> therefore does not accept liability for any errors or omissions in the
>> >> contents of this message which arise as a result of e-mail transmission.
>> >> If verification is required please request a hard-copy version. This
>> >> message is provided for informational purposes and should not be
>> >> construed as a solicitation or offer to buy or sell any securities or
>> >> related financial instruments.
>> >>
>> >> UBS Limited is a company limited by shares incorporated in the United
>> >> Kingdom registered in England and Wales with number 2035362.
>> >> Registered office: 1 Finsbury Avenue, London EC2M 2PP.  UBS Limited is
>> >> authorised and regulated by the Financial Services Authority.
>> >>
>> >> UBS AG is a public company incorporated with limited liability in
>> >> Switzerland domiciled in the Canton of Basel-City and the Canton of
>> >> Zurich respectively registered at the Commercial Registry offices in
>> >> those Cantons with Identification No: CH-270.3.004.646-4 and having
>> >> respective head offices at Aeschenvorstadt 1, 4051 Basel and
>> >> Bahnhofstrasse 45, 8001 Zurich, Switzerland.  Registered in the United
>> >> Kingdom as a foreign company with No: FC021146 and having a UK
>> >> Establishment registered at Companies House, Cardiff, with No:
>> >> BR 004507.  The principal office of UK Establishment: 1 Finsbury
>> >> Avenue, London EC2M 2PP.  In the United Kingdom, UBS AG is authorised
>> >> and regulated by the Financial Services Authority.
>> >>
>> >> UBS reserves the right to retain all messages. Messages are protected
>> >> and accessed only in legally justified cases.
>>
