hadoop-zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: zookeeper crash
Date Wed, 02 Jun 2010 18:20:12 GMT
This looks a bit like a small bobble we had when upgrading a bit ago.

I THINK that the answer here is to mind-wipe the misbehaving node and have
it resynch from scratch from the other nodes.
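
For what it's worth, the zxids in the quoted log can be pulled apart to see the mismatch. Below is a minimal sketch, assuming the usual ZooKeeper zxid layout (high 32 bits are the epoch, low 32 bits are a counter). The helper name split_zxid is made up for illustration; the numbers are copied from the log.

```python
# Sketch only: decode the zxids from the quoted log.
# Assumption: a zxid is a 64-bit value, epoch in the high 32 bits,
# counter in the low 32 bits. split_zxid is a hypothetical helper,
# not part of any ZooKeeper API.

def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF

# Values taken from the log: node one's proposed zxid, the zxid carried
# in the votes from nodes two and three, and the hex suffix of the
# snapshot file that node one loaded.
samples = [
    ("node 1 proposed zxid", 47244640287),
    ("vote from nodes 2/3", 38654707048),
    ("snapshot.a00000045", int("a00000045", 16)),
]

for label, zxid in samples:
    epoch, counter = split_zxid(zxid)
    print(f"{label}: epoch=0x{epoch:x} counter=0x{counter:x}")
```

Node one's proposal decodes to epoch 0xb, which matches the "our epoch b" in the FATAL line, while the zxid in the other nodes' votes decodes to epoch 0x9. So node one seems to be holding state from an epoch that the current quorum is behind, which is why I suspect its local data.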

Wait for confirmation from somebody real.

On Wed, Jun 2, 2010 at 11:11 AM, Charity Majors <charity@shopkick.com> wrote:

> I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an
> attempt to get away from a client bug that was crashing my backend services.
>
> Unfortunately, this morning I had a server crash, and it brought down my
> entire cluster.  I don't have the logs leading up to the crash, because --
> argghffbuggle -- log4j wasn't set up correctly.  But I restarted all three
> nodes, and nodes two and three came back up and formed a quorum.
>
> Node one, meanwhile, does this:
>
> 2010-06-02 17:04:56,446 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING
> 2010-06-02 17:04:56,446 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot
> /services/zookeeper/data/zookeeper/version-2/snapshot.a00000045
> 2010-06-02 17:04:56,476 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election.
> My id =  1, Proposed zxid = 47244640287
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification:
> 1, 47244640287, 4, 1, LOOKING, LOOKING, 1
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@799] - Notification:
> 3, 38654707048, 3, 1, LOOKING, LEADING, 3
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@799] - Notification:
> 3, 38654707048, 3, 1, LOOKING, FOLLOWING, 2
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@642] - FOLLOWING
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@151] - Created server
> with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
> /services/zookeeper/data/zookeeper/version-2 snapdir
> /services/zookeeper/data/zookeeper/version-2
> 2010-06-02 17:04:56,486 - FATAL
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@71] - Leader epoch a is less
> than our epoch b
> 2010-06-02 17:04:56,486 - WARN
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following
> the leader
> java.io.IOException: Error: Epoch of leader is lower
>       at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
>       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
> 2010-06-02 17:04:56,486 - INFO
>  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>       at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
>
>
>
> All I can find is this,
> http://www.mail-archive.com/zookeeper-commits@hadoop.apache.org/msg00449.html,
> which implies that this state should never happen.
>
> Any suggestions?  If it happens again, I'll just have to roll everything
> back to 3.2.1 and live with the client crashes.
