zookeeper-dev mailing list archives

From "Vishal K (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-962) leader/follower coherence issue when follower is receiving a DIFF
Date Mon, 03 Jan 2011 12:26:45 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976713#action_12976713 ]

Vishal K commented on ZOOKEEPER-962:
------------------------------------

Hi,

May I please request an expected date for this fix? We are working on releasing our product
in a month, and I think this bug is serious enough that it might block our release. It would
be very helpful to know when the fix is coming.

Thanks for your help.

> leader/follower coherence issue when follower is receiving a DIFF
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-962
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-962
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>            Reporter: Camille Fournier
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>
> From mailing list:
> It seems like we rely on the LearnerHandler thread startup to capture all of the missing
> committed transactions in the SNAP or DIFF, but I don't see anything (especially in the
> DIFF case) that is preventing us from committing more transactions before we actually
> start forwarding updates to the new follower.
> Let me explain using my example from ZOOKEEPER-919. Assume we already have a quorum, so
> the leader can be processing transactions while my follower is starting up.
> I'm a follower at zxid N-5, and the leader is at N. I send my FOLLOWERINFO packet to the
> leader with that information. The leader gets the proposals from its committed log (time
> T1), then syncs on the proposal list (LearnerHandler line 267. Why? It's a copy of the
> underlying proposal list... this might be part of our problem). I check to see if the
> peerLastZxid is within my max and min committed log, and it is, so I'm going to send a
> diff. I set the zxidToSend to be the maxCommittedLog at time T3 (we already know this is
> sketchy), and forward the proposals from my copied proposal list, starting at
> peerLastZxid+1 up to the last proposal transaction (as seen at time T1).
> After I have queued up all those diffs to send, I tell the leader to startForwarding
> updates to this follower (line 308).
> So, let's say that at time T2 the leader actually switches over to the thread that is
> handling the various request processors, and I see that I got enough votes to commit zxid
> N+1. I commit N+1, and so my maxCommittedLog at T3 is N+1, but this proposal is not in the
> list of proposals that I got back at time T1, so I don't forward this diff to the
> follower. Additionally, I processed the commit and removed it from my leader's
> toBeApplied list. So when I call startForwarding for this new follower, I don't see this
> transaction as a transaction to be forwarded.
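
The DIFF scenario above can be replayed deterministically in a small Java model. This is a hypothetical sketch, not ZooKeeper code: the class, field, and method names are made up, the committed log is a plain list of zxids with N = 100, and, following the description above, committing a zxid also removes it from toBeApplied in one step. It runs the steps in the order T1 (copy), T2 (commit N+1), T3 (read maxCommittedLog), T4 (send diffs), T5 (startForwarding).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded replay of the first interleaving.
// Names are illustrative only; they are not ZooKeeper's real API.
public class DiffRaceDemo {
    static final List<Long> committedLog = new ArrayList<>(); // committed zxids
    static final List<Long> toBeApplied = new ArrayList<>();  // not yet removed
    static long zxidToSend;

    // Simplification taken from the email: commit appends to the committed
    // log and removes the zxid from toBeApplied in a single step.
    static void commit(long zxid) {
        committedLog.add(zxid);
        toBeApplied.remove(Long.valueOf(zxid));
    }

    // Returns the zxids the follower would receive during the sync.
    static List<Long> syncWithCommitAtT2(long n, long peerLastZxid) {
        committedLog.clear();
        toBeApplied.clear();
        for (long z = n - 10; z <= n; z++) committedLog.add(z);
        toBeApplied.add(n + 1);                          // N+1 is about to commit

        List<Long> copy = new ArrayList<>(committedLog); // T1 (LH): copy proposals
        commit(n + 1);                                   // T2 (RPT): commit N+1
        long minCommittedLog = committedLog.get(0);
        zxidToSend = committedLog.get(committedLog.size() - 1); // T3 (LH)

        List<Long> sent = new ArrayList<>();
        if (minCommittedLog <= peerLastZxid && peerLastZxid <= zxidToSend) {
            for (long z : copy) {                        // T4 (LH): DIFF from T1 view
                if (z > peerLastZxid) sent.add(z);
            }
        }
        sent.addAll(toBeApplied);                        // T5 (LH): startForwarding
        return sent;
    }

    public static void main(String[] args) {
        long n = 100;
        List<Long> sent = syncWithCommitAtT2(n, n - 5);
        System.out.println("zxidToSend = " + zxidToSend);
        System.out.println("follower receives " + sent);
        System.out.println("N+1 delivered? " + sent.contains(n + 1));
    }
}
```

Running main prints zxidToSend = 101 and a follower list of [96, 97, 98, 99, 100]: in this model the follower is told to expect N+1 but never receives it, because the commit landed after the copy and before startForwarding.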

> There's another problem. Let's also imagine, however, that I commit N+1 at time T4, after
> reading maxCommittedLog at T3. Then the maxCommittedLog value is consistent with the max
> of the diff packets I am going to send the follower. But I still committed N+1 and
> removed it from the toBeApplied list before calling startForwarding with this follower.
> How does the follower get this transaction? Does it?
> To put it another way, here is the thread interaction, hopefully formatted so you can
> read it...
>           LearnerHandlerThread                RequestProcessorThread
> T1(LH):   get list of proposals (COPY)
> T2(RPT):                                      commit N+1, remove from toBeApplied
> T3(LH):   get maxCommittedLog
> T4(LH):   send diffs from view at T1
> T5(LH):   startForwarding
> Or
> T1(LH):   get list of proposals (COPY)
> T2(LH):   get maxCommittedLog
> T3(RPT):                                      commit N+1, remove from toBeApplied
> T4(LH):   send diffs from view at T1
> T5(LH):   startForwarding
> I'm trying to figure out what, if anything, keeps the requests from being committed,
> removed, and never seen by the follower before it fully starts up.
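
To check the question above against every possible ordering, not just the two drawn in the diagram, the following hypothetical Java model slides the RequestProcessorThread's commit step into each gap between the LearnerHandler's steps. It is not ZooKeeper code: names are illustrative, N+1 is taken to be 101, and it assumes (as the email does) that committing removes N+1 from toBeApplied, plus the extra assumption that once startForwarding has run, later commits reach the new follower.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model, not ZooKeeper code: place the RequestProcessorThread's
// "commit N+1, remove from toBeApplied" step at every possible point relative
// to the LearnerHandler's steps and check whether the follower sees N+1.
public class InterleavingCheck {
    static final String[] LH_STEPS = {
        "get list of proposals (COPY)",
        "get maxCommittedLog",
        "send diffs from view at T1",
        "startForwarding"
    };
    static final long N_PLUS_1 = 101;

    // commitAt = k: the commit runs just before LearnerHandler step k;
    // k == LH_STEPS.length means it runs after startForwarding.
    static boolean followerGetsCommit(int commitAt) {
        List<Long> committedLog = new ArrayList<>();
        List<Long> toBeApplied = new ArrayList<>();
        toBeApplied.add(N_PLUS_1);
        List<Long> copy = new ArrayList<>();
        TreeSet<Long> received = new TreeSet<>();
        boolean forwarding = false;

        for (int step = 0; step <= LH_STEPS.length; step++) {
            if (step == commitAt) {
                committedLog.add(N_PLUS_1);
                toBeApplied.remove(Long.valueOf(N_PLUS_1));
                // Assumption: a registered follower receives later commits.
                if (forwarding) received.add(N_PLUS_1);
            }
            if (step == 0) {
                copy = new ArrayList<>(committedLog);      // T1 view
            } else if (step == 2) {
                received.addAll(copy);                     // DIFF from T1 view
            } else if (step == 3) {
                received.addAll(toBeApplied);              // startForwarding
                forwarding = true;
            }
            // Step 1 only fixes zxidToSend, so it sends nothing in this model.
        }
        return received.contains(N_PLUS_1);
    }

    public static void main(String[] args) {
        for (int k = 0; k <= LH_STEPS.length; k++) {
            String where = k < LH_STEPS.length
                ? "before \"" + LH_STEPS[k] + "\""
                : "after \"startForwarding\"";
            System.out.printf("commit %-40s -> follower gets N+1: %b%n",
                where, followerGetsCommit(k));
        }
    }
}
```

In this model the commit reaches the follower only when it lands before the proposal copy or after startForwarding; positions 1 and 2 are the two diagrammed interleavings, and position 3 (just before startForwarding) loses N+1 as well. One possible remedy, offered only as an assumption and not as the patch actually applied for this issue, would be to keep the commit path out of the whole copy-to-startForwarding window, for example with a lock that both threads must hold.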

-- 
This message is automatically generated by JIRA.

