Message-ID: <4C27EB88.2020702@apache.org>
Date: Sun, 27 Jun 2010 17:23:36 -0700
From: Patrick Hunt
To: zookeeper-user@hadoop.apache.org
CC: Peeyush Kumar
Subject: Re: Receive timed out error while starting zookeeper server

On 06/26/2010 06:53 AM, Peeyush Kumar
wrote:
> I have a 6 node cluster (5 slaves and 1 master). I am trying to

You typically want an odd number of servers, given that ZK works by
majority (even is fine, but not optimal). So 5 would be great (7 is a
bit of overkill). 3 is fine too, but 5 allows you to take 1 server down
for "scheduled maintenance" and still survive an unexpected failure w/o
impact to service availability.

In your exception I see "DatagramSocket"; this is unusual. What ZK
version are you running? As Lei suggested, please include your config
file so that we can review that as well. (If you are overriding
electionAlg, this might be part of the problem. Current versions of ZK
servers use TCP for connections by default, which is why this is
unusual.)

Most likely there is either a config problem or a firewall that's
blocking communication between the servers. Try verifying
server-to-server connectivity on the ports you've selected.

Patrick

> start the zookeeper server on the cluster. when I issue this command:
> $ java -cp zookeeper.jar:lib/log4j-1.2.15.jar:conf \
>     org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
> I get the following error:
> 2010-06-26 18:09:17,468 - INFO [main:QuorumPeerConfig@80] - Reading configuration from: conf/zoo.cfg
> 2010-06-26 18:09:17,483 - INFO [main:QuorumPeerConfig@232] - Defaulting to majority quorums
> 2010-06-26 18:09:17,545 - INFO [main:QuorumPeerMain@118] - Starting quorum peer
> 2010-06-26 18:09:17,585 - INFO [QuorumPeer:/0.0.0.0:2179:QuorumPeer@514] - LOOKING
> 2010-06-26 18:09:17,589 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@154] - Server address: master.cf.net/192.168.1.1:2180
> 2010-06-26 18:09:17,589 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@154] - Server address: slave01.cf.net/192.168.1.2:2180
> 2010-06-26 18:09:17,792 - WARN [QuorumPeer:/0.0.0.0:2179:LeaderElection@194] - Ignoring exception while looking for leader
> java.net.SocketTimeoutException: Receive timed out
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:725)
>     at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
> 2010-06-26 18:09:17,794 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@154] - Server address: slave02.cf.net/192.168.1.3:2180
> 2010-06-26 18:09:17,995 - WARN [QuorumPeer:/0.0.0.0:2179:LeaderElection@194] - Ignoring exception while looking for leader
> java.net.SocketTimeoutException: Receive timed out
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:725)
>     at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
> 2010-06-26 18:09:17,996 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@154] - Server address: slave03.cf.net/192.168.1.4:2180
> 2010-06-26 18:09:18,197 - WARN [QuorumPeer:/0.0.0.0:2179:LeaderElection@194] - Ignoring exception while looking for leader
> java.net.SocketTimeoutException: Receive timed out
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:725)
>     at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
> 2010-06-26 18:09:18,200 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@154] - Server address: slave04.cf.net/192.168.1.5:2180
> 2010-06-26 18:09:18,401 - WARN [QuorumPeer:/0.0.0.0:2179:LeaderElection@194] - Ignoring exception while looking for leader
> java.net.SocketTimeoutException: Receive timed out
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:725)
>     at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
> 2010-06-26 18:09:18,402 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@154] - Server address: slave05.cf.net/192.168.1.6:2180
> 2010-06-26 18:09:18,604 - WARN [QuorumPeer:/0.0.0.0:2179:LeaderElection@194] - Ignoring exception while looking for leader
> java.net.SocketTimeoutException: Receive timed out
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:725)
>     at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
> 2010-06-26 18:09:18,605 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@102] - Election tally:
> 2010-06-26 18:09:18,606 - INFO [QuorumPeer:/0.0.0.0:2179:LeaderElection@108] - 1 -> 1
>
> .....this error continues indefinitely....
>
> can anyone please help me around this?
> Your help is solicited
>
> Thanks
> Peeyush
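
P.S. A rough sketch of the two points in the reply, in case it helps: the majority arithmetic behind the "odd number of servers" advice, and a basic server-to-server reachability check. The function names (quorum_size, tolerated_failures, tcp_port_open) are made up for illustration and are not part of ZooKeeper. Note that the check below only tests TCP. The DatagramSocket in the trace is UDP, so a successful TCP connect would not prove that the election traffic in that log can get through.

```python
import socket


def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the port being probed is served over TCP. A refused,
    filtered (firewalled), or unresolvable target returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With these definitions, 5 and 6 servers both tolerate 2 failures (quorums of 3 and 4), which is why the sixth server adds nothing for availability. To probe the peers, run something like tcp_port_open("slave01.cf.net", 2180) from each server, substituting the hosts and ports from your own zoo.cfg.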