hadoop-zookeeper-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other
Date Wed, 27 Aug 2008 13:06:55 GMT
After some further analysis I think I have found a bug.

In QuorumCnxManager.toSend there is a call to create a connection as follows:
    channel = SocketChannel.open(new InetSocketAddress(addr, port));

Unfortunately, "addr" is the IP address of a remote server, while "port" is the electionPort of *this* server.
As an example, given this configuration (taken from my zoo.cfg):
  server.1=10.20.9.254:2881
  server.2=10.20.9.9:2882
  server.3=10.20.9.254:2883
Server 3 was observed trying to connect to host 10.20.9.9 (server 2) on port 2883 (its own election port) and, unsurprisingly, failing.

In tests where all servers use the same electionPort, this bug would not show up.
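A minimal Java sketch of the difference, assuming a hypothetical class PeerTargetSketch that holds the server.N entries as a sid → address map (the class and method names are illustrative, not ZooKeeper's): the observed behaviour pairs the remote host with the local election port, while the fix takes both host and port from the remote server's own entry.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not the real QuorumCnxManager): contrasts the observed
// behaviour with the intended one when picking the address to dial for a peer.
public class PeerTargetSketch {
    private final long mySid;
    // server.N entries: sid -> host:port of that server's election endpoint
    private final Map<Long, InetSocketAddress> peers;

    PeerTargetSketch(long mySid, Map<Long, InetSocketAddress> peers) {
        this.mySid = mySid;
        this.peers = peers;
    }

    private InetSocketAddress entry(long sid) {
        InetSocketAddress a = peers.get(sid);
        if (a == null) {
            throw new IllegalArgumentException("no server." + sid + " entry");
        }
        return a;
    }

    // Observed: remote host, but *this* server's election port.
    InetSocketAddress buggyTarget(long remoteSid) {
        return InetSocketAddress.createUnresolved(
                entry(remoteSid).getHostString(), entry(mySid).getPort());
    }

    // Intended: the remote server's own host *and* port.
    InetSocketAddress fixedTarget(long remoteSid) {
        return entry(remoteSid);
    }

    static String fmt(InetSocketAddress a) {
        return a.getHostString() + ":" + a.getPort();
    }

    public static void main(String[] args) {
        // The zoo.cfg entries quoted above (createUnresolved avoids any lookup)
        Map<Long, InetSocketAddress> peers = new TreeMap<>();
        peers.put(1L, InetSocketAddress.createUnresolved("10.20.9.254", 2881));
        peers.put(2L, InetSocketAddress.createUnresolved("10.20.9.9", 2882));
        peers.put(3L, InetSocketAddress.createUnresolved("10.20.9.254", 2883));

        PeerTargetSketch server3 = new PeerTargetSketch(3L, peers);
        System.out.println("buggy -> " + fmt(server3.buggyTarget(2L))); // prints buggy -> 10.20.9.9:2883
        System.out.println("fixed -> " + fmt(server3.fixedTarget(2L))); // prints fixed -> 10.20.9.9:2882
    }
}
```

The "buggy" line reproduces the connection attempt seen from server 3; when every server shares one election port, the two methods return the same address, which is why identical-port tests hide the problem.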

Cheers,
Mark







----- Original Message ----
From: mark harwood <markharw00d@yahoo.co.uk>
To: zookeeper-user@hadoop.apache.org
Sent: Wednesday, 27 August, 2008 12:11:58
Subject: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

First, a quick thanks for releasing this project; it's very useful.

I've had success working with the SourceForge version (2.2.1) and have just tried moving to the Apache SVN trunk version, only to find that the servers fail to find each other.

My test environment has 3 ZooKeeper servers, all running on the same machine and started from the command line in different directories.
I changed my startup batch files to run QuorumPeerMain in place of QuorumPeer, wiped the data directories (keeping the "myid" files) and reused the previous zoo.cfg files (examples below).

#########  Server 1 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2181
electionPort=2881
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

#########  Server 2 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2182
electionPort=2882
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

#########  Server 3 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2183
electionPort=2883
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883
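For reference, a hypothetical helper (ZooCfgCheck is an illustrative name, not ZooKeeper's own config parser) that loads a zoo.cfg like the ones above with java.util.Properties, collects the server.N entries, and checks that electionPort agrees with the port in the server's own server.N entry. It assumes myid is passed in rather than read from the data directory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: not how ZooKeeper itself parses zoo.cfg.
public class ZooCfgCheck {

    // Collects every server.N=host:port line into sid -> "host:port".
    static SortedMap<Long, String> servers(Properties p) {
        SortedMap<Long, String> out = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                long sid = Long.parseLong(key.substring("server.".length()));
                out.put(sid, p.getProperty(key).trim());
            }
        }
        return out;
    }

    // True if electionPort equals the port in this server's own server.<myid> entry.
    static boolean electionPortMatches(Properties p, long myid) {
        String self = servers(p).get(myid);
        String electionPort = p.getProperty("electionPort");
        if (self == null || electionPort == null) {
            return false;
        }
        int ownPort = Integer.parseInt(self.substring(self.lastIndexOf(':') + 1));
        return ownPort == Integer.parseInt(electionPort.trim());
    }

    public static void main(String[] args) throws IOException {
        // The "Server 2" config from above
        String server2Cfg = String.join("\n",
                "tickTime=2000", "initLimit=10", "syncLimit=5", "dataDir=data",
                "clientPort=2182", "electionPort=2882",
                "server.1=localhost:2881", "server.2=localhost:2882", "server.3=localhost:2883");
        Properties p = new Properties();
        p.load(new StringReader(server2Cfg));
        System.out.println(servers(p));                 // {1=localhost:2881, 2=localhost:2882, 3=localhost:2883}
        System.out.println(electionPortMatches(p, 2L)); // true
    }
}
```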

When I fire up each server, they all hang with the following output:

D:\tmp\Zookeeper3Servers\server2>java -cp lib\zookeeper-dev.jar;lib\log4j-1.2.15.jar;conf org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg
INFO  - [QuorumPeer:QuorumPeer@379] - LOOKING
WARN  - [QuorumPeer:FastLeaderElection@493] - New election: 0

I tried starting one of the servers from Eclipse in debug mode, and it appeared to loop in FastLeaderElection.lookForLeader().

While poking around in the debugger, I also noticed that this test in QuorumCnxManager.toSend failed:
    if (addr.equals(localIP)) 
...because addr held "localhost/127.0.0.1" while localIP held my 10.20.x.x address on the local network.
I tried changing the zoo.cfg files to use the 10.20.x.x address. This made the above "if" statement evaluate to true, but the end result was the same: the servers still failed to connect.
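One possible way around the failing addr.equals(localIP) comparison, sketched under the assumption that the question being asked is simply "is this address one of mine?": treat loopback and wildcard addresses as local, and otherwise ask java.net.NetworkInterface whether the address is bound to one of this machine's interfaces. LocalAddressCheck is a hypothetical name; this is not the QuorumCnxManager code.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical sketch: decide whether an address refers to this machine,
// instead of testing equality against a single chosen local IP.
public class LocalAddressCheck {

    static boolean isLocal(InetAddress addr) {
        // Loopback (127.x.x.x, ::1) and wildcard (0.0.0.0, ::) are treated as this host
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
            return true;
        }
        try {
            // Any address bound to one of this host's interfaces (e.g. a 10.20.x.x NIC)
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // Depending on host configuration, getLocalHost() may itself return a loopback address
        InetAddress host = InetAddress.getLocalHost();
        System.out.println(loopback + " local? " + isLocal(loopback));
        System.out.println(host + " local? " + isLocal(host));
    }
}
```

With a check like this, "localhost/127.0.0.1" and the machine's 10.20.x.x address would both count as local, so the outcome no longer depends on which form appears in zoo.cfg.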

If it helps, here is the log output from my SourceForge 2.2.1 run of the above config, which works fine:

D:\servers\IeIncrementalIndexingTests\ZookeeperServers\server3>java -cp lib\zookeeper-dev.jar;lib\log4j-1.2.15.jar;conf com.yahoo.zookeeper.server.quorum.QuorumPeer conf/zoo.cfg
WARN  - [QuorumPeer:QuorumPeer@388] - LOOKING
WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
WARN  - [QuorumPeer:LeaderElection@95] - 3      -> 1
WARN  - [QuorumPeer:LeaderElection@95] - 1      -> 1
WARN  - [QuorumPeer:LeaderElection@95] - 2      -> 1
WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
WARN  - [QuorumPeer:LeaderElection@95] - 3      -> 1
WARN  - [QuorumPeer:LeaderElection@95] - 2      -> 2
WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
WARN  - [QuorumPeer:Follower@124] - Following localhost/127.0.0.1:2882
WARN  - [QuorumPeer:Follower@171] - Getting a snapshot from leader
WARN  - [NIOServerCxn.Factory:NIOServerCnxn@471] - Connected to /127.0.0.1:2375 lastZxid 0
WARN  - [NIOServerCxn.Factory:NIOServerCnxn@500] - Creating new session 31c03d951fe0000
WARN  - [QuorumPeer:Follower@219] - Got zxid 100000001 expected 1
WARN  - [SyncThread:Profiler@34] - Elapsed 10717 ms: Logfile padding exceeded time threshold
WARN  - [Thread-0:NIOServerCnxn@774] - Finished init of 31c03d951fe0000: true
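Reading each "Election tally" block as proposed-leader id → vote count (an interpretation of the log, not confirmed against the 2.2.1 source), the second tally gives server 2 two of three votes, a strict majority, which is consistent with the "Following localhost/127.0.0.1:2882" line. A hypothetical sketch of that majority check (TallySketch and winner are illustrative names, not the LeaderElection class):

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of majority vote counting: a candidate wins once it
// holds more than half the votes of the whole ensemble.
public class TallySketch {

    // votes: proposed leader sid -> number of servers voting for it
    static Optional<Long> winner(Map<Long, Integer> votes, int ensembleSize) {
        for (Map.Entry<Long, Integer> e : votes.entrySet()) {
            if (e.getValue() > ensembleSize / 2) { // strict majority, e.g. 2 of 3
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty(); // no majority yet: stay LOOKING
    }

    public static void main(String[] args) {
        // The two tallies from the log above
        Map<Long, Integer> first = new TreeMap<>(Map.of(3L, 1, 1L, 1, 2L, 1));
        Map<Long, Integer> second = new TreeMap<>(Map.of(3L, 1, 2L, 2));
        System.out.println(winner(first, 3));  // Optional.empty
        System.out.println(winner(second, 3)); // Optional[2]
    }
}
```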

This looks like it uses a different leader election algorithm.

Any ideas?
Cheers,
Mark


Send instant messages to your online friends http://uk.messenger.yahoo.com


