zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michi Mutsuzaki <mi...@cs.stanford.edu>
Subject Re: Cannot open channel to 2 at election address ERROR while starting zookeeper
Date Thu, 12 Feb 2015 08:11:33 GMT
I would try commenting out this line from /etc/hosts:

127.0.1.1       machineA.dc.host.com machineA

I guess nslookup doesn't check /etc/hosts.



On Wed, Feb 11, 2015 at 11:25 PM, Check Peck <comptechgeeky@gmail.com> wrote:
> In /etc/hosts, this is what I found -
>
>     127.0.0.1       localhost
>     127.0.1.1       machineA.dc.host.com machineA
>
>     # The following lines are desirable for IPv6 capable hosts
>     ::1     ip6-localhost ip6-loopback
>     fe00::0 ip6-localnet
>     ff00::0 ip6-mcastprefix
>     ff01::3 ip6-allnodes
>     ff05::2 ip6-allrouters
>
>     10.22.45.11     machineA.dc.host.com machineA
>
> And nslookup on the same machine results back -
>
>     check@machineA:~$ nslookup machineA.dc.host.com
>     Server:           10.171.127.141
>     Address:        10.171.127.141#53
>
>     Name:   machineA.dc.host.com
>     Address: 10.22.45.11
>
>
> Does anything look off?
>
>
> On Wed, Feb 11, 2015 at 11:08 PM, Michi Mutsuzaki <michi@cs.stanford.edu>
> wrote:
>>
>> There might be an entry in /etc/hosts. You can do 'nslookup
>> machineA.host.com' to see what ip address it resolves to. You can also
>> try using ip addresses instead of hostnames in zoo.cfg.
>>
>> On Wed, Feb 11, 2015 at 11:02 PM, Check Peck <comptechgeeky@gmail.com>
>> wrote:
>> > Thanks Michi for the help. How can I verify this thing out?
>> >
>> > Is there any way which I can check in my Ubuntu machines?
>> >
>> > On Wed, Feb 11, 2015 at 10:58 PM, Michi Mutsuzaki
>> > <michi@cs.stanford.edu>
>> > wrote:
>> >>
>> >> Hi Check sorry for the late response.
>> >>
>> >> It looks like the leader election is binding to the loopback instead
>> >> of 0.0.0.0. Maybe the hostname is configured to resolve to 127.0.1.1
>> >> locally?
>> >>
>> >> [Thread-1:QuorumCnxManager$Listener@504] - My election bind port:
>> >> machineA.host.com/127.0.1.1:3888
>> >>
>> >> On Wed, Feb 11, 2015 at 1:49 PM, Check Peck <comptechgeeky@gmail.com>
>> >> wrote:
>> >> > Adding zookeeper dev mailing to this. Has anyone seen this issue
>> >> > before?
>> >> >
>> >> >
>> >> > On Wed, Feb 11, 2015 at 9:56 AM, Check Peck <comptechgeeky@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Can anyone help me on this? Has anyone seen these kind of issues?
>> >> >>
>> >> >> On Tue, Feb 10, 2015 at 4:26 PM, Check Peck
>> >> >> <comptechgeeky@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> I have also verified there is no firewall issue. Does anyone
know
>> >> >>> what
>> >> >>> is
>> >> >>> this error all about and how we can resolve this?
>> >> >>>
>> >> >>> On Tue, Feb 10, 2015 at 9:20 AM, Check Peck
>> >> >>> <comptechgeeky@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> I am trying to setup 5 node zookeeper ensemble manage through
>> >> >>>> Exhibitor.
>> >> >>>> I have 5 machines and on each machine I will be running
exhibitor
>> >> >>>> and
>> >> >>>> zookeeper. Below is my zoo.cfg file which is generated
by
>> >> >>>> exhibitor.
>> >> >>>>
>> >> >>>>     #Auto-generated by Exhibitor - Mon Feb 09 10:18:35
GMT-07:00
>> >> >>>> 2015
>> >> >>>>     #Mon Feb 09 10:18:35 GMT-07:00 2015
>> >> >>>>     server.3=machineC.host.com\:
>> >> >>>> 2888\:3888
>> >> >>>>     server.2=machineB.host.com\:2888\:3888
>> >> >>>>     server.1=machineA.host.com\:2888\:3888
>> >> >>>>     initLimit=10
>> >> >>>>     syncLimit=5
>> >> >>>>     maxClientCnxns=21000
>> >> >>>>     clientPort=2181
>> >> >>>>     tickTime=2000
>> >> >>>>     dataDir=/opt/zookeeper/data
>> >> >>>>     dataLogDir=/opt/zookeeper/data
>> >> >>>>     server.5=machineD.host.com\:2888\:3888
>> >> >>>>     server.4=machineE.host.com\:2888\:3888
>> >> >>>>
>> >> >>>> As soon as I am starting zookeeper through Exhibitor config
>> >> >>>> pannel, I
>> >> >>>> can see all the five machines in my control panel but they
all are
>> >> >>>> yellow
>> >> >>>> which means "ZooKeeper is running, but can't communicate
with the
>> >> >>>> rest of
>> >> >>>> the ensemble" and in my Exhibitor logs, I am seeing these
which
>> >> >>>> has
>> >> >>>> some
>> >> >>>> ERROR in it.
>> >> >>>>
>> >> >>>>     dev
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Exhibitor
>> >> >>>> started [main]
>> >> >>>>     INFO  org.mortbay.log  Logging to
>> >> >>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> >> >>>> org.mortbay.log.Slf4jLog [main]
>> >> >>>>     INFO  org.mortbay.log  jetty-6.1.x [main]
>> >> >>>>     INFO  org.mortbay.log  Started SocketConnector@0.0.0.0:8080
>> >> >>>> [main]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 State:
>> >> >>>> not
>> >> >>>> serving [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> down/not-serving waiting 30004 of 40000 ms before restarting
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Restarting
>> >> >>>> down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> stop instance [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> start/restart ZooKeeper [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Kill
>> >> >>>> attempted result: 0 [ActivityQueue-0]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: JMX enabled by default [pool-2-thread-1]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: -Xmx2048m -Djava.net.preferIPv4Stack=true
>> >> >>>> [pool-2-thread-2]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Process
>> >> >>>> started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Using config:
>> >> >>>> /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
>> >> >>>> [pool-2-thread-1]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Starting zookeeper ... STARTED [pool-2-thread-2]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> down/not-serving waiting 30005 of 40000 ms before restarting
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Restarting
>> >> >>>> down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> stop instance [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> start/restart ZooKeeper [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Kill
>> >> >>>> attempted result: 0 [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Process
>> >> >>>> started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: JMX enabled by default [pool-2-thread-1]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: -Xmx2048m -Djava.net.preferIPv4Stack=true
>> >> >>>> [pool-2-thread-2]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Using config:
>> >> >>>> /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
>> >> >>>> [pool-2-thread-1]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Starting zookeeper ... STARTED [pool-2-thread-2]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> down/not-serving waiting 30004 of 40000 ms before restarting
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Restarting
>> >> >>>> down/not-serving ZooKeeper after 60014 ms pause [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> stop instance [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> start/restart ZooKeeper [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Kill
>> >> >>>> attempted result: 0 [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Process
>> >> >>>> started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: JMX enabled by default [pool-2-thread-3]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: -Xmx2048m -Djava.net.preferIPv4Stack=true
>> >> >>>> [pool-2-thread-2]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Using config:
>> >> >>>> /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
>> >> >>>> [pool-2-thread-3]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Starting zookeeper ... STARTED [pool-2-thread-2]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> down/not-serving waiting 30005 of 40000 ms before restarting
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Restarting
>> >> >>>> down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> stop instance [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> Attempting
>> >> >>>> to
>> >> >>>> start/restart ZooKeeper [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Kill
>> >> >>>> attempted result: 0 [ActivityQueue-0]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
 Process
>> >> >>>> started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh
>> >> >>>> [ActivityQueue-0]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: JMX enabled by default [pool-2-thread-2]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: -Xmx2048m -Djava.net.preferIPv4Stack=true
>> >> >>>> [pool-2-thread-3]
>> >> >>>>     ERROR com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Using config:
>> >> >>>> /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
>> >> >>>> [pool-2-thread-2]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> Server: Starting zookeeper ... STARTED [pool-2-thread-3]
>> >> >>>>     INFO  com.netflix.exhibitor.core.activity.ActivityLog
>> >> >>>> ZooKeeper
>> >> >>>> down/not-serving waiting 30004 of 40000 ms before restarting
>> >> >>>> [ActivityQueue-0]
>> >> >>>>
>> >> >>>> And in my zookeeper logs, I am seeing these -
>> >> >>>>
>> >> >>>>     2015-02-09 00:11:19,355 [myid:] - INFO
>> >> >>>> [main:QuorumPeerConfig@103]
>> >> >>>> - Reading configuration from:
>> >> >>>> /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
>> >> >>>>     2015-02-09 00:11:19,365 [myid:] - INFO
>> >> >>>> [main:QuorumPeerConfig@340]
>> >> >>>> - Defaulting to majority quorums
>> >> >>>>     2015-02-09 00:11:19,368 [myid:1] - INFO
>> >> >>>> [main:DatadirCleanupManager@78] - autopurge.snapRetainCount
set to
>> >> >>>> 3
>> >> >>>>     2015-02-09 00:11:19,368 [myid:1] - INFO
>> >> >>>> [main:DatadirCleanupManager@79] - autopurge.purgeInterval
set to 0
>> >> >>>>     2015-02-09 00:11:19,369 [myid:1] - INFO
>> >> >>>> [main:DatadirCleanupManager@101] - Purge task is not scheduled.
>> >> >>>>     2015-02-09 00:11:19,379 [myid:1] - INFO
>> >> >>>> [main:QuorumPeerMain@127]
>> >> >>>> - Starting quorum peer
>> >> >>>>     2015-02-09 00:11:19,397 [myid:1] - INFO
>> >> >>>> [main:NIOServerCnxnFactory@94] - binding to port
>> >> >>>> 0.0.0.0/0.0.0.0:2181
>> >> >>>>     2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@959]
>> >> >>>> -
>> >> >>>> tickTime set to 2000
>> >> >>>>     2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@979]
>> >> >>>> -
>> >> >>>> minSessionTimeout set to -1
>> >> >>>>     2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@990]
>> >> >>>> -
>> >> >>>> maxSessionTimeout set to -1
>> >> >>>>     2015-02-09 00:11:19,414 [myid:1] - INFO
>> >> >>>> [main:QuorumPeer@1005] -
>> >> >>>> initLimit set to 10
>> >> >>>>     2015-02-09 00:11:19,431 [myid:1] - INFO
>> >> >>>> [Thread-1:QuorumCnxManager$Listener@504] - My election
bind port:
>> >> >>>> machineA.host.com/127.0.1.1:3888
>> >> >>>>     2015-02-09 00:11:19,440 [myid:1] - INFO
>> >> >>>> [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumPeer@714] - LOOKING
>> >> >>>>     2015-02-09 00:11:19,441 [myid:1] - INFO
>> >> >>>> [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@815]
- New
>> >> >>>> election. My id =  1, proposed zxid=0x0
>> >> >>>>     2015-02-09 00:11:19,443 [myid:1] - INFO
>> >> >>>> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification:
1
>> >> >>>> (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1
>> >> >>>> (n.round),
>> >> >>>> LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING
(my state)
>> >> >>>>     2015-02-09 00:11:19,445 [myid:1] - WARN
>> >> >>>> [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open
channel
>> >> >>>> to
>> >> >>>> 2
>> >> >>>> at election address machineB.host.com/10.52.81.211:3888
>> >> >>>>     java.net.ConnectException: Connection refused
>> >> >>>>             at java.net.PlainSocketImpl.socketConnect(Native
>> >> >>>> Method)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
>> >> >>>>             at
>> >> >>>> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
>> >> >>>>             at java.net.Socket.connect(Socket.java:546)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
>> >> >>>>             at java.lang.Thread.run(Thread.java:679)
>> >> >>>>     2015-02-09 00:11:19,449 [myid:1] - WARN
>> >> >>>> [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open
channel
>> >> >>>> to
>> >> >>>> 3
>> >> >>>> at election address machineC.host.com/10.57.78.941:3888
>> >> >>>>     java.net.ConnectException: Connection refused
>> >> >>>>             at java.net.PlainSocketImpl.socketConnect(Native
>> >> >>>> Method)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
>> >> >>>>             at
>> >> >>>> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
>> >> >>>>             at java.net.Socket.connect(Socket.java:546)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
>> >> >>>>             at
>> >> >>>>
>> >> >>>>
>> >> >>>> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
>> >> >>>>             at java.lang.Thread.run(Thread.java:679)
>> >> >>>>     2015-02-09 00:11:19,450 [myid:1] - WARN
>> >> >>>> [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open
channel
>> >> >>>> to
>> >> >>>> 4
>> >> >>>> at election address machineD.host.com/10.59.576.12:3888
>> >> >>>>
>> >> >>>> I am running Exhibitor 1.5.3 and Zookeeper 3.4.6. Is there
>> >> >>>> anything
>> >> >>>> wrong I am doing? I have googled it for this ERROR and
I was not
>> >> >>>> able
>> >> >>>> to
>> >> >>>> find anything concrete. I have also verified that it is
able to
>> >> >>>> generate
>> >> >>>> myid successfully in each machine.
>> >> >>>>
>> >> >>>> Is this known issue? I have seen other people also having
same
>> >> >>>> issue
>> >> >>>> after I search on the google?
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>
>> >
>> >
>
>

Mime
View raw message