Date: Mon, 22 Feb 2010 18:43:40 -0800
Subject: Re: Bit of help debugging a TIMED OUT session please
From: Mahadev Konar
Reply-To: zookeeper-user@hadoop.apache.org

Hi Stack,
 the other interesting part is with the session 0x26ed968d880001.
Looks like it gets disconnected from one of the servers (TIMEOUT). Do you
see any of these messages: "Attempting connection to server" in the logs
before you see all the consecutive

org.apache.zookeeper.ClientCnxn: Exception closing session 0x26ed968d880001
to sun.nio.ch.SelectionKeyImpl@788ab708
java.io.IOException: Read error rc = -1
java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)

and....

from the client 0x26ed968d880001?

Thanks
mahadev


On 2/22/10 11:42 AM, "Stack" wrote:

> The thing that seems odd to me is that the connectivity complaints are
> out of the zk client, right?, why is it failing getting to member 14
> and why not move to another ensemble member if issue w/ 14?, and if
> there were a general connectivity issue, I'd think that the running
> hbase cluster would be complaining at about the same time (its talking
> to datanodes and masters at this time).
>
> (Thanks for the input lads)
>
> St.Ack
>
>
> On Mon, Feb 22, 2010 at 11:26 AM, Mahadev Konar wrote:
>> I also looked at the logs. Ted might have a point. It does look like that
>> zookeeper server's are doing fine (though as ted mentions the skew is a
>> little concerning, though that might be due to very few packets served by
>> the first server). Other than that the latencies of 300 ms at max should not
>> cause any timeouts.
>> Also, the number of packets received is pretty low - meaning that it wasn't
>> serving huge traffic. Is there anyway we can check if the network connection
>> from the client to the server is not flaky?
>>
>> Thanks
>> mahadev
>>
>>
>> On 2/22/10 10:40 AM, "Ted Dunning" wrote:
>>
>>> Not sure this helps at all, but these times are remarkably asymmetrical.  I
>>> would expect members of a ZK cluster to have very comparable times.
>>>
>>> Additionally, 345 ms is nowhere near large enough to cause a session to
>>> expire.  My take is that ZK doesn't think it caused the timeout.
>>>
>>> On Mon, Feb 22, 2010 at 10:18 AM, Stack wrote:
>>>
>>>>        Latency min/avg/max: 2/125/345
>>>> ...
>>>>        Latency min/avg/max: 0/7/81
>>>> ...
>>>>        Latency min/avg/max: 1/1/1
>>>>
>>>> Thanks for any pointers on how to debug.
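
For the log check asked about at the top of this message (does "Attempting connection to server" show up before the run of "Exception closing session" errors for a given session?), here is a rough Python sketch. Only the two marker strings and the session id come from this thread; the function name summarize_session is made up for illustration, and it assumes the client log is plain text with one event per line, which may not match your logging layout.

```python
# Marker strings quoted in this thread; everything else is an assumption.
ATTEMPT_MARKER = "Attempting connection to server"
ERROR_MARKER = "Exception closing session"


def summarize_session(log_lines, session_id):
    """Scan client log lines in order.

    Returns a tuple (attempts_before, error_count):
      attempts_before: connection-attempt lines seen before the first
                       "Exception closing session" line for session_id
      error_count:     number of such exception lines for session_id
    """
    attempts_before = []
    error_count = 0
    seen_error = False
    for line in log_lines:
        if ERROR_MARKER in line and session_id in line:
            seen_error = True
            error_count += 1
        elif ATTEMPT_MARKER in line and not seen_error:
            # Only attempts logged ahead of the first error are of interest.
            attempts_before.append(line.rstrip("\n"))
    return attempts_before, error_count
```

It can be run over a client log with something like summarize_session(open(path), "0x26ed968d880001"), where path is wherever your client writes its log. An empty first result would mean no connection attempt was logged ahead of the errors, which is the case the question is trying to spot.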