hadoop-zookeeper-dev mailing list archives

From Vishal K <vishalm...@gmail.com>
Subject Re: [jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
Date Fri, 18 Jun 2010 22:33:05 GMT
Never mind, I was on the wrong track. Flavio's earlier mail clarified that
the follower received the epoch before the restart.

On Fri, Jun 18, 2010 at 6:20 PM, Vishal K <vishalmlst@gmail.com> wrote:

> I might be wrong here, but let me chip in my two cents.
>
> I think the problem is in LearnerHandler.java, at the leader of this
> follower.
>
>             /* see what other packets from the proposal
>              * and tobeapplied queues need to be sent
>              * and then decide if we can just send a DIFF
>              * or we actually need to send the whole snapshot
>              */
>             long leaderLastZxid = leader.startForwarding(this, updates);
>             // ---> the leaderLastZxid returned here is probably incorrect.
>             // a special case when both the ids are the same
>             if (peerLastZxid == leaderLastZxid) {
>                 packetToSend = Leader.DIFF;
>                 zxidToSend = leaderLastZxid;
>             }
>
>             QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
>                     leaderLastZxid, null, null);
>             oa.writeRecord(newLeaderQP, "packet");
>             bufferedOutput.flush();
>
>
>
> On Fri, Jun 18, 2010 at 4:49 PM, Flavio Paiva Junqueira (JIRA) <
> jira@apache.org> wrote:
>
>>
>>    [
>> https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880320#action_12880320]
>>
>> Flavio Paiva Junqueira commented on ZOOKEEPER-335:
>> --------------------------------------------------
>>
>> Guys, I don't see enough information in these logs to determine what's
>> going on. Let me tell you what I'm seeing so that perhaps other folks can
>> help me out here.
>>
>> One part of the log that is suspicious is this one:
>>
>> {noformat}
>> =6693 [QuorumPeer:/0.0.0.0:2181] WARN
>>  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x300000001 expected
>> 0x1
>> =6693 [QuorumPeer:/0.0.0.0:2181] WARN
>>  org.apache.zookeeper.server.quorum.Learner  - Got zxid 0x300000001 expected
>> 0x1
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor30]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor22]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor23]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor18]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor20]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor19]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor31]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor21]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor26]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor25]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor29]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor28]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor24]
>> [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor32]
>>
>> ************* NODE RESTARTED HERE **********************
>> {noformat}
>>
>> Before being restarted, the bad node receives a proposal with zxid <3,1>
>> while it expects <0,1>. Next in the logs, after the restart, I can see it
>> complaining that it has epoch 4 while the leader has epoch 3. Something
>> strange apparently happened during the restart. It also seems that the
>> node was able to talk to the others (see the first entries in the log,
>> before the excerpt above).
>>
>> Do you guys see anything I'm overlooking?
>>
>> > zookeeper servers should commit the new leader txn to their logs.
>> > -----------------------------------------------------------------
>> >
>> >                 Key: ZOOKEEPER-335
>> >                 URL:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-335
>> >             Project: Zookeeper
>> >          Issue Type: Bug
>> >          Components: server
>> >    Affects Versions: 3.1.0
>> >            Reporter: Mahadev konar
>> >            Assignee: Mahadev konar
>> >            Priority: Blocker
>> >             Fix For: 3.4.0
>> >
>> >         Attachments: zk.log.gz, zklogs.tar.gz
>> >
>> >
>> > Currently the ZooKeeper followers do not commit the new leader election.
>> > This will cause problems in failure scenarios where a follower acks the
>> > same leader txn id twice, possibly to two different intermittent
>> > leaders, allowing them to propose two different txns with the same zxid.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>
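
For readers following the zxid notation in the quoted log: the warning
"Got zxid 0x300000001 expected 0x1" and the <3,1> versus <0,1> pairs in
Flavio's comment are the same values written two ways. A zxid is a 64-bit
number whose high 32 bits hold the epoch and whose low 32 bits hold a
per-epoch counter. The sketch below only illustrates that split; the class
and method names (ZxidParts, epochOf, counterOf, makeZxid, describe) are
hypothetical and are not taken from the ZooKeeper source.

```java
// Illustrative only: hypothetical helpers, not ZooKeeper's own API.
final class ZxidParts {

    // High 32 bits of a zxid: the leader epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the counter within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Recombine an <epoch,counter> pair into a single 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Render a zxid in the <epoch,counter> form used in the thread.
    static String describe(long zxid) {
        return String.format("0x%x -> <%d,%d>", zxid, epochOf(zxid), counterOf(zxid));
    }

    public static void main(String[] args) {
        long got = 0x300000001L;   // zxid the follower received, per the log
        long expected = 0x1L;      // zxid the follower expected, per the log
        System.out.println(describe(got));
        System.out.println(describe(expected));
    }
}
```

Read this way, the warning says that the follower was still expecting the
first transaction of epoch 0 when it received the first transaction of
epoch 3.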
