hadoop-zookeeper-dev mailing list archives

From Vishal K <vishalm...@gmail.com>
Subject Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
Date Fri, 18 Jun 2010 22:20:42 GMT
I might be wrong here, but let me try to chip in my few cents.

I think the problem is in LearnerHandler.java, on the leader side for this
follower.

            /* see what other packets from the proposal
             * and tobeapplied queues need to be sent
             * and then decide if we can just send a DIFF
             * or we actually need to send the whole snapshot
             */
            long leaderLastZxid = leader.startForwarding(this, updates);
            // ---> the leaderLastZxid returned here is probably incorrect.
            // a special case when both the ids are the same
            if (peerLastZxid == leaderLastZxid) {
                packetToSend = Leader.DIFF;
                zxidToSend = leaderLastZxid;
            }

            QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
                    leaderLastZxid, null, null);
            oa.writeRecord(newLeaderQP, "packet");
            bufferedOutput.flush();
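
For reference when reading the "Got zxid 0x300000001 expected 0x1" warning in the quoted message below, here is a minimal sketch of how those two values split into <epoch,counter> pairs. It assumes the usual zxid layout, with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits. The class and method names (ZxidParts, epochOf, counterOf, describe) are made up for illustration and are not taken from the ZooKeeper source tree.

```java
public class ZxidParts {
    // Assumption: the high 32 bits of a zxid hold the leader epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Assumption: the low 32 bits of a zxid hold the counter within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Formats a zxid in the <epoch,counter> notation used in the JIRA comment.
    static String describe(long zxid) {
        return "<" + epochOf(zxid) + "," + counterOf(zxid) + ">";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x300000001L)); // the zxid the follower got
        System.out.println(describe(0x1L));         // the zxid the follower expected
    }
}
```

Under that assumption the follower received a proposal from epoch 3 while it was still expecting the first transaction of epoch 0, which is the mismatch the warning reports.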


On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) <
jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880320#action_12880320]
>
> Flavio Paiva Junqueira commented on ZOOKEEPER-335:
> --------------------------------------------------
>
> Guys, I don't see enough information in these logs to determine what's
> going on. Let me tell you what I'm seeing so that perhaps other folks can
> help me out here.
>
> One part of the log that is suspicious is this one:
>
> {noformat}
> =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x300000001 expected 0x1
> =6693 [QuorumPeer:/0.0.0.0:2181] WARN org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x300000001 expected 0x1
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]
>
> ************* NODE RESTARTED HERE **********************
> {noformat}
>
> Before being restarted, the bad node receives a proposal with zxid <3,1>
> and it expects <0,1>. Next in the logs, after the restart, I can see that
> it is complaining that it has epoch 4 and the leader has epoch 3. Something
> strange apparently happened during the restart. It also seems that the node
> was able to talk to the others (first entries in the log before the excerpt
> above).
>
> Do you guys see anything I'm overlooking?
>
> > zookeeper servers should commit the new leader txn to their logs.
> > -----------------------------------------------------------------
> >
> >                 Key: ZOOKEEPER-335
> >                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
> >             Project: Zookeeper
> >          Issue Type: Bug
> >          Components: server
> >    Affects Versions: 3.1.0
> >            Reporter: Mahadev konar
> >            Assignee: Mahadev konar
> >            Priority: Blocker
> >             Fix For: 3.4.0
> >
> >         Attachments: zk.log.gz, zklogs.tar.gz
> >
> >
> > Currently the ZooKeeper followers do not commit the new leader election.
> > This will cause problems in failure scenarios, with a follower acking the
> > same leader txn id twice to what might be two different intermittent
> > leaders, allowing them to propose two different txns with the same zxid.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
