From: Jean-Daniel Cryans
To: zookeeper-user@hadoop.apache.org
Date: Mon, 25 Jan 2010 10:32:17 -0800
Subject: Re: Killing a zookeeper server

I believe we've just hit the same problem with zk-3.2.1. For some reason, a machine that was part of our quorum of 5 servers crashed.
When we try to restart it, it does this (I replaced the hostname and IP):

2010-01-25 10:25:06,469 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 1 at election address somehost1/someip1:3888
java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect(Native Method)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:356)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:603)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488)

It has been like that for almost 20 minutes now, trying every other server in the quorum on different channels. ruok says imok, but all other commands say that the ZK server isn't running.

I don't believe that 3.2.2 will help unless ZK-547 does more than it seems to.

Anything else I should look at?

Thx!

J-D

On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey wrote:
> So the solution for us was to just nuke zookeeper and restart everywhere.
> We will also be upgrading soon as well.
>
> To answer your question, yes I believe all the servers were running normally
> except for the fact that they were experiencing high CPU usage. As we began
> to see some CPU alerts I started restarting some of the servers.
>
> It was then that we noticed that they were not actually running according to
> 'stat'.
>
> I still have the log from one server with a debug level and the rest with a
> warn level. If you would like to see any of these and analyze them just let
> me know.
>
> Thanks for the help,
> Nick Bailey
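
For anyone wanting to repeat the ruok/stat comparison described in this message across several servers, here is a minimal Python sketch. It assumes the servers listen on the default client port 2181 and that the server closes the connection after answering a four-letter command; the function names are made up for illustration and are not part of any ZooKeeper client library.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send one four-letter command (e.g. 'ruok') and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        # Assumption: the server closes the socket once it has replied,
        # so we read until recv() returns an empty byte string.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def probe(host, port=2181):
    """Return the replies to 'ruok' and 'stat' for a single server."""
    return {cmd: four_letter_word(host, port, cmd) for cmd in ("ruok", "stat")}
```

Calling probe() against each member of the quorum and printing the two replies side by side should make it easy to spot a server that answers ruok while stat reports that it is not serving.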