hadoop-zookeeper-user mailing list archives

From Charity Majors <char...@shopkick.com>
Subject zookeeper crash
Date Wed, 02 Jun 2010 18:11:51 GMT
I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an attempt to get away from
a client bug that was crashing my backend services.

Unfortunately, this morning I had a server crash, and it brought down my entire cluster. 
I don't have the logs leading up to the crash, because -- argghffbuggle -- log4j wasn't set
up correctly.  But I restarted all three nodes, and nodes two and three came back up and formed
a quorum.  

Node one, meanwhile, does this:

2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING
2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /services/zookeeper/data/zookeeper/version-2/snapshot.a00000045
2010-06-02 17:04:56,476 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id =  1, Proposed zxid = 47244640287
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 1, 47244640287, 4, 1, LOOKING, LOOKING, 1
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@799] - Notification: 3, 38654707048, 3, 1, LOOKING, LEADING, 3
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@799] - Notification: 3, 38654707048, 3, 1, LOOKING, FOLLOWING, 2
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@642] - FOLLOWING
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@151] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /services/zookeeper/data/zookeeper/version-2 snapdir /services/zookeeper/data/zookeeper/version-2
2010-06-02 17:04:56,486 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@71] - Leader epoch a is less than our epoch b
2010-06-02 17:04:56,486 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
java.io.IOException: Error: Epoch of leader is lower
       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
       at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
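
For anyone reading the numbers in that log: a zxid is a 64-bit value with the leader epoch in the high 32 bits and a per-epoch counter in the low 32 bits, and the suffix of a snapshot file name is a zxid in hex. A minimal Python sketch for decoding the values above follows. The helper name split_zxid is made up for illustration and is not part of ZooKeeper.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    Hypothetical helper: epoch is the high 32 bits, counter the low 32 bits.
    """
    epoch = zxid >> 32
    counter = zxid & 0xFFFFFFFF
    return epoch, counter


# Values copied from the log above; the labels are only descriptive.
samples = [
    ("node 1 proposed zxid", 47244640287),
    ("zxid in votes for node 3", 38654707048),
    # Snapshot file suffixes are hex zxids, so parse this one as base 16.
    ("snapshot.a00000045", int("a00000045", 16)),
]

for label, zxid in samples:
    epoch, counter = split_zxid(zxid)
    print(f"{label}: zxid={zxid:#x} epoch={epoch:#x} counter={counter:#x}")
```

Decoded this way, node one's proposed zxid falls in epoch 0xb, the zxid in the votes for node three falls in epoch 0x9, and the snapshot node one loaded is from epoch 0xa. That looks consistent with the "Leader epoch a is less than our epoch b" message, presumably because a newly elected leader starts an epoch one past the epoch of its last zxid.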



All I can find is this, http://www.mail-archive.com/zookeeper-commits@hadoop.apache.org/msg00449.html,
which implies that this state should never happen.

Any suggestions?  If it happens again, I'll just have to roll everything back to 3.2.1 and
live with the client crashes.




