zookeeper-user mailing list archives

From German Blanco <german.blanco.bla...@gmail.com>
Subject Re: zookeeper works well but log reports "Cannot open channel to 2 at election address ... java.net.ConnectException: Connection refused"
Date Thu, 13 Feb 2014 10:05:53 GMT
I can't explain it with this information.
It could be, for example, that in the first cluster you have a working
ensemble of two of the nodes and "bin/zkServer.sh status" is trying to
connect to the one that doesn't work.
I would start by checking the status of each of the servers (not just one).
Try the four-letter word commands:
https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands
They give a lot of information.
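
As a minimal sketch of checking every server rather than just one, the
Python below sends a four-letter word to a server's client port and reads
the reply. The helper names (four_letter_word, mode_of, check_ensemble) are
made up for illustration, and it assumes the reply to "srvr" contains a
"Mode:" line, as described in the admin guide linked above.

```python
import socket


def four_letter_word(host, port, word, timeout=5.0):
    """Send a four-letter word (e.g. 'srvr', 'stat', 'ruok') and return the reply text."""
    if len(word) != 4:
        raise ValueError("expected a four-letter command, got %r" % (word,))
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection after replying, so read until EOF.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def mode_of(reply):
    """Return the value of the 'Mode:' line in a reply, or None if there is none."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def check_ensemble(servers, word="srvr"):
    """Query every (host, port) pair and map it to its mode or to an error string."""
    results = {}
    for host, port in servers:
        try:
            results[(host, port)] = mode_of(four_letter_word(host, port, word))
        except OSError as exc:  # refused, timed out, unresolvable host, ...
            results[(host, port)] = "unreachable: %s" % exc
    return results
```

For the cluster in this thread one would call something like
check_ensemble([("imon-1", 2181), ("imon-2", 2181), ("imon-3", 2181)]).
The host names come from the quoted log; port 2181 is only confirmed there
for server 1 and is assumed for the other two. A reply without a "Mode:"
line shows up as None.
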
Also, look in the server logs for the last successful leader election (try
grepping for "TOOK") and the last election started (try grepping for
"New election"). If the last of these in the log is a "TOOK", things
should be fine; if it is a "New election", there might be a problem.
Then please post what you find out.
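
The log check above can be sketched in Python as follows. This is only an
illustration: the function names are made up, and the only thing it relies
on is the two search strings mentioned above ("TOOK" and "New election").

```python
def last_election_event(lines):
    """Return (kind, line) for the last line containing 'TOOK' or 'New election'.

    kind is 'TOOK', 'New election', or None if neither string was found.
    """
    last = (None, None)
    for line in lines:
        if "New election" in line:
            last = ("New election", line.rstrip("\n"))
        elif "TOOK" in line:
            last = ("TOOK", line.rstrip("\n"))
    return last


def election_looks_settled(lines):
    """True if the most recent election-related line is a 'TOOK' line."""
    kind, _ = last_election_event(lines)
    return kind == "TOOK"
```

Any iterable of lines works, so an open log file can be passed directly
(the location of the server log depends on your installation). A False
result only means that an election was started and no later "TOOK" was
logged, so it is a hint for where to look and not a diagnosis.
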

Regards,

German.


On Thu, Feb 13, 2014 at 7:18 AM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:

> Hi German,
>
> I installed ZooKeeper in another cluster and it works well: there I can
> query the status through "bin/zkServer.sh status". How can that be explained?
>
>
> 2014-02-11 18:12 GMT+08:00 German Blanco <german.blanco.blanco@gmail.com>:
>
> > Hello,
> >
> > It doesn't matter.
> > The reason is that each ZooKeeper server, upon start-up, tries to
> > establish connections with all the other servers in its configuration.
> > However, there should be only one connection per pair, so half of the
> > connections are dropped.
> > Right after connecting, servers send their ids. If the id of the peer
> > initiating the connection is higher than that of the peer receiving the
> > connection, then everything proceeds; otherwise the connection is rejected.
> > Why, then, do peers with a lower id try to open connections to peers with
> > a higher id?
> > Because in that way they trigger a connection attempt in the other
> > direction. That is, say servers 2 and 3 have a working ensemble; they are
> > not going to attempt to connect to server 1 unless something happens.
> > Server 1 wakes up and attempts to connect to servers 2 and 3. These two
> > connection attempts fail, but each triggers a connection attempt from the
> > respective peer that succeeds.
> > This was a bit more than you asked for, but anyway I hope it helps :-).
> > If the answer doesn't work for you, please let me know.
> >
> > Best regards,
> >
> > German.
> >
> >
> > On Tue, Feb 11, 2014 at 10:53 AM, Tao Xiao <xiaotao.cs.nju@gmail.com>
> > wrote:
> >
> > > I installed zookeeper 3.4.5 in a 3-node cluster and started it. I think
> > > zookeeper works well because the HBase cluster, which relies on
> > > zookeeper, indeed works well. But when I tried to query zookeeper's
> > > status, it reported:
> > >
> > > [root@imon-1 zookeeper-3.4.5]# bin/zkServer.sh status
> > > JMX enabled by default
> > > Using config: /usr/local/apache/zookeeper-3.4.5/bin/../conf/zoo.cfg
> > > Error contacting service. It is probably not running.
> > >
> > >
> > > I checked the log and found it reported the following:
> > >
> > > 2014-02-11 15:42:15,623 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/local/apache/zookeeper-3.4.5/bin/../conf/zoo.cfg
> > > 2014-02-11 15:42:15,629 [myid:] - INFO  [main:QuorumPeerConfig@334] - Defaulting to majority quorums
> > > 2014-02-11 15:42:15,652 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> > > 2014-02-11 15:42:15,652 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
> > > 2014-02-11 15:42:15,653 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
> > > 2014-02-11 15:42:15,683 [myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
> > > 2014-02-11 15:42:15,726 [myid:1] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
> > > 2014-02-11 15:42:15,759 [myid:1] - INFO  [main:QuorumPeer@913] - tickTime set to 2000
> > > 2014-02-11 15:42:15,760 [myid:1] - INFO  [main:QuorumPeer@933] - minSessionTimeout set to -1
> > > 2014-02-11 15:42:15,760 [myid:1] - INFO  [main:QuorumPeer@944] - maxSessionTimeout set to -1
> > > 2014-02-11 15:42:15,760 [myid:1] - INFO  [main:QuorumPeer@959] - initLimit set to 10
> > > 2014-02-11 15:42:15,874 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /var/data/zk/dataDir/version-2/snapshot.100072a68
> > > 2014-02-11 15:42:18,211 [myid:1] - INFO  [Thread-1:QuorumCnxManager$Listener@486] - My election bind port: 0.0.0.0/0.0.0.0:3888
> > > 2014-02-11 15:42:18,226 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumPeer@670] - LOOKING
> > > 2014-02-11 15:42:18,232 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@740] - New election. My id =  1, proposed zxid=0x1000735ac
> > > 2014-02-11 15:42:18,234 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x1000735ac (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
> > > 2014-02-11 15:42:18,256 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 2 at election address imon-2/172.16.38.144:3888
> > > java.net.ConnectException: Connection refused
> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> > >         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> > >         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> > >         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> > >         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> > >         at java.net.Socket.connect(Socket.java:529)
> > >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
> > >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
> > >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
> > >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
> > >         at java.lang.Thread.run(Thread.java:662)
> > > 2014-02-11 15:42:18,268 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address imon-3/172.16.38.145:3888
> > > java.net.ConnectException: Connection refused
> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> > >         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> > >         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> > >         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> > >         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> > >         at java.net.Socket.connect(Socket.java:529)
> > >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
> > >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
> > >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
> > >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
> > >         at java.lang.Thread.run(Thread.java:662)
> > > 2014-02-11 15:42:18,436 [myid:1] - WARN  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@368] - Cannot open channel to 2 at election address imon-2/172.16.38.144:3888
> > > java.net.ConnectException: Connection refused
> > >
> > > ... ...
> > >
> > >
> > > Do these warnings matter? What is the reason?
> > >
> >
>
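
The id-based rule described in the quoted reply of 11 Feb (a connection is
kept only when the initiating peer has the higher id, and a rejected
attempt triggers one in the opposite direction) can be illustrated with a
small Python simulation. This is a toy model of the rule as stated in that
message, not ZooKeeper's actual code; the function names are invented for
the example.

```python
def accept_incoming(initiator_id, receiver_id):
    """Rule from the quoted message: proceed only if the initiator's id is higher."""
    return initiator_id > receiver_id


def connect(initiator_id, receiver_id, log):
    """Simulate one connection attempt and return the (initiator, receiver) pair kept.

    A dropped attempt triggers a new attempt in the reverse direction, which
    is then accepted because the roles (and so the id comparison) are swapped.
    """
    if initiator_id == receiver_id:
        raise ValueError("a server does not connect to itself")
    if accept_incoming(initiator_id, receiver_id):
        log.append((initiator_id, receiver_id, "kept"))
        return (initiator_id, receiver_id)
    log.append((initiator_id, receiver_id, "dropped"))
    return connect(receiver_id, initiator_id, log)


if __name__ == "__main__":
    # Server 1 wakes up while servers 2 and 3 already form a working ensemble.
    events = []
    for peer in (2, 3):
        connect(1, peer, events)
    for initiator, receiver, outcome in events:
        print("%d -> %d: %s" % (initiator, receiver, outcome))
```

In this model each attempt from server 1 is dropped and immediately
followed by a kept connection from the higher-id peer, which matches the
sequence described for servers 2 and 3 in the quoted message.
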
