Message-ID: <4B5E182B.5050807@apache.org>
Date: Mon, 25 Jan 2010 14:16:11 -0800
From: Patrick Hunt
To: zookeeper-user@hadoop.apache.org
CC: Patrick Hunt
Subject: Re: Killing a zookeeper server
References: <1263333167.308920598@192.168.2.230> <1263339847.100820837@192.168.2.230> <4B4D1FFA.5050108@apache.org> <80BB3F00-890B-46E8-89CB-A5706A2A0522@mailtrust.com> <31a243e71001251032v65d09474xf7aafb9a1213cece@mail.gmail.com> <4B5DE7A6.9040204@apache.org>
 <31a243e71001251156l4a1644bbkd120893de920ac66@mail.gmail.com> <4B5DFFE6.8080802@apache.org> <31a243e71001251248t1624e7a8ye1d98d0727b74f24@mail.gmail.com> <4B5E10CC.9030106@apache.org> <31a243e71001251405rab5b1b2jddc86b7bada466b4@mail.gmail.com>
In-Reply-To: <31a243e71001251405rab5b1b2jddc86b7bada466b4@mail.gmail.com>

No worries. Kudos to Mahadev for sniffing out the UDP in the netstat, I
glossed right over it. ;-)

Lots of good fixes in 3.2.2 vs pre-3.2.

Still doesn't explain what Nick was seeing originally though...

Patrick

Jean-Daniel Cryans wrote:
> Oh my god! You are right, we run an old dev version of 3.2.0:
>
> zookeeper-r785019-hbase-1329.jar
>
> This was what we shipped HBase trunk with last summer... This quorum
> has an uptime of more than 6 months! Well I guess that explains it, I
> thought we restarted it since then during our HBase upgrades but it
> seems not, so I'm very sorry about this false alert.
>
> So... all I can say is thank you guys for such reliable software!
> We'll be upgrading to 3.2.2 really soon.
>
> J-D
>
> On Mon, Jan 25, 2010 at 1:44 PM, Patrick Hunt wrote:
>> JD, there's something _very_ unusual in your setup. Are you running
>> "official" released ZooKeeper code or something else?
>>
>> Either there is a misconfiguration on the other servers (the configs for the
>> other servers are exactly the same as 222, right?), or perhaps some patches to
>> the ZK codebase that went awry?
>>
>> See the attached file "zk_ports.txt". This is a summary of the netstat -a
>> that you sent. Notice in particular that UDP sockets are open for port 2888!
>> This should not happen in the default ZK configuration case.
>>
>> By default we only use TCP connections between servers (quorum & election).
>> There is an "electionAlg" option that allows users to turn off the TCP based
>> fast leader election and go with a UDP based one, but I don't see that in the
>> config you provided for 222. (As I said, assuming you are not setting this
>> option on the other servers either, correct?)
>>
>>
>> Mahadev and I do remember that there was a bug in the 3.2 branch prior to
>> 3.2 ever being released that caused us to use non-FLE (so UDP based)
>> election by default, however we fixed that before 3.2.0 ever shipped (it was
>> a bug in our config processing code) and it was never exposed in an official
>> release. Perhaps you have picked up some code prior to that?
>>
>> Patrick
>>
>> Jean-Daniel Cryans wrote:
>>>> According to the log for 222 it can't open a connection to the election
>>>> port (3888) for any of the other servers. This seems very unusual. Can you
>>>> verify that there's connectivity on that port between 222 and all the other
>>>> servers?
>>> jdcryans@sv4borg222:~$ telnet sv4borg224 3888
>>> Trying 10.10.20.224...
>>> telnet: Unable to connect to remote host: Connection refused
>>> jdcryans@sv4borg222:~$ telnet sv4borg224 2888
>>> Trying 10.10.20.224...
>>> Connected to sv4borg224.
>>> Escape character is '^]'.
>>>
>>>> Also, can you re-run the netstat with the -a option? We can see the listen
>>>> sockets that way (omitted by netstat by default). It would be great if you
>>>> could send the netstat for all 5 servers.
>>> I updated the tar.gz with the 5 netstat -anp
>>>
>>> Thx!
>>>
>>> J-D
>>>
>>>> Thanks,
>>>>
>>>> Patrick
>>>>
>>>> Jean-Daniel Cryans wrote:
>>>>> Everything is here
>>>>> http://people.apache.org/~jdcryans/zk_election_bug.tar.gz
>>>>>
>>>>> The server we are trying to start is sv4borg222 (myid is 2) and we
>>>>> started it around 10:03:21
>>>>>
>>>>> Thx!
>>>>>
>>>>> J-D
>>>>>
>> tcp6  0  0  10.10.20.221:34865  10.10.20.224:2888   ESTABLISHED  14682/java
>> udp6  0  0  :::2888             :::*                             14682/java
>>
>>
>> tcp6  0  0  :::3888             :::*                LISTEN       4092/java
>> unix  2  [ ]  STREAM  CONNECTED  721588877  7642/java
>>
>>
>> tcp6  0  0  10.10.20.223:42518  10.10.20.224:2888   ESTABLISHED  2704/java
>> udp6  0  0  :::2888             :::*                             2704/java
>>
>>
>> tcp6  0  0  :::2888             :::*                LISTEN       31052/java
>> tcp6  0  0  10.10.20.224:2888   10.10.20.223:42518  ESTABLISHED  31052/java
>> tcp6  0  0  10.10.20.224:2888   10.10.20.225:51459  ESTABLISHED  31052/java
>> tcp6  0  0  10.10.20.224:2888   10.10.20.221:34865  ESTABLISHED  31052/java
>> udp6  0  0  :::2888             :::*                             31052/java
>>
>>
>> tcp6  0  0  10.10.20.225:51459  10.10.20.224:2888   ESTABLISHED  19545/java
>> udp6  0  0  :::2888             :::*                             19545/java
>>
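
The check done by eye on the netstat summary above (spotting UDP sockets
open on port 2888) could be scripted roughly as follows. This is only a
minimal sketch: the function name, the assumed column layout (proto,
recv-q, send-q, local address, foreign address, optional state,
pid/program) and the hard-coded port are illustrative assumptions taken
from the excerpt in this thread, not anything shipped with ZooKeeper.

```python
# Sketch: flag UDP sockets bound to the quorum port in `netstat -anp` style
# output. The column layout and the port number are assumptions based on the
# excerpt in this thread; adjust both to match your own environment.

QUORUM_PORT = 2888  # port seen in this thread; check your own zoo.cfg


def udp_sockets_on_port(netstat_text, port=QUORUM_PORT):
    """Return (local_address, pid/program) for each UDP socket on `port`."""
    hits = []
    for raw in netstat_text.splitlines():
        # Drop any leading mail quote markers ('>') and spaces.
        fields = raw.lstrip("> ").split()
        # Expected: proto recv-q send-q local foreign [state] pid/program
        if len(fields) < 5 or not fields[0].startswith("udp"):
            continue
        local = fields[3]
        # The port is whatever follows the last ':' (also works for ':::2888').
        if local.rsplit(":", 1)[-1] == str(port):
            hits.append((local, fields[-1]))
    return hits


SAMPLE = """\
tcp6 0 0 10.10.20.221:34865 10.10.20.224:2888 ESTABLISHED 14682/java
udp6 0 0 :::2888 :::* 14682/java
tcp6 0 0 :::3888 :::* LISTEN 4092/java
"""

if __name__ == "__main__":
    for local, owner in udp_sockets_on_port(SAMPLE):
        print(f"unexpected UDP socket on {local} owned by {owner}")
```

With the default TCP based election described above, an empty result is
what one would expect; any hit suggests a UDP based election algorithm is
in use on that server.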