Message-ID: <4C1897AC.9000805@boboco.ie>
Date: Wed, 16 Jun 2010 10:21:48 +0100
From: Eric Bowman
To: zookeeper-user@hadoop.apache.org
Subject: Re: Debugging help for SessionExpiredException
In-Reply-To: <4C18097E.3030806@apache.org>

Setting up a little process to run overnight that appends a timestamp to
a file once per second or so can be a very effective tool for ruling
out, for example, "extra-dimensional" VM effects.

On 06/16/2010 12:15 AM, Patrick Hunt wrote:
> I'm not very experienced personally with running zk on ec2 smalls, Ted
> usually has the ec2 related insight. Given these boxes are not loaded
> or lightly loaded, and you've ruled out gc/swap, the only thing I can
> think of is that something is going on under the covers at the vm
> level that's causing the high latency you're seeing.
>
> You're seeing 15 _minutes_ max latency. I can't think of what would
> cause that inside zk. Any chance that the VM is shutting down or
> "freezing" during that period? I don't know. Are you monitoring that
> system from a second system? Perhaps that might shed some light
> (monitor the cpu/disk activity using some monitoring tool like
> ganglia, nagios, etc...
> or even more primitive, perhaps doing a ping
> to that system and tracking the round trip time/packet loss, dump to a
> file and review the next day, etc...)
>
> Patrick
>
> On 06/15/2010 03:59 PM, Jordan Zimmerman wrote:
>> They're small instances. The thing is that these machines are doing
>> next to no work. We're just running simple little tests. The session
>> expiration has not happened while I've been watching. It tends to
>> happen overnight.
>>
>> -JZ
>>
>> On Jun 15, 2010, at 1:50 PM, Ted Dunning wrote:
>>
>>> As usual, the ZK team provides the best feedback.
>>>
>>> I would be bold enough to ask what kind of ec2 instances you are
>>> running on. Small instances are small chunks of larger machines
>>> and are sometimes subject to competition for resources from the
>>> other tenants.
>>>
>>> On Tue, Jun 15, 2010 at 12:30 PM, Patrick Hunt wrote:
>>> 3) under-provisioned virtual machines (ie vmware)
>>>
>>> ...
>>>
>>> Given that you've ruled out the gc (most common), disk utilization
>>> would be the next thing to check.
>>>
>>
>

-- 
Eric Bowman
Boboco Ltd
ebowman@boboco.ie
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532
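
P.S. For anyone who wants to try the overnight timestamp idea from the top
of this message, here is a rough sketch in Python. The function names
(run_heartbeat, find_gaps), the file name and the 5-second threshold are
just illustrative choices, not anything from ZooKeeper or from this thread.
The idea is only that a VM which freezes cannot write its once-per-second
line, so a large jump between consecutive wall-clock timestamps marks the
window when the box was stalled.

```python
import time
from datetime import datetime, timezone


def run_heartbeat(path, interval=1.0, iterations=None):
    """Append one UTC timestamp per line to `path` every `interval` seconds.

    With iterations=None the loop runs until the process is killed.
    """
    count = 0
    while iterations is None or count < iterations:
        # Reopen the file for each write so every line is flushed and
        # closed before the process sleeps again.
        with open(path, "a") as f:
            f.write(datetime.now(timezone.utc).isoformat() + "\n")
        count += 1
        time.sleep(interval)


def find_gaps(path, threshold=5.0):
    """Return (previous, current, seconds) for each pair of consecutive
    timestamps in `path` that are more than `threshold` seconds apart."""
    gaps = []
    prev = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            cur = datetime.fromisoformat(line)
            if prev is not None:
                delta = (cur - prev).total_seconds()
                if delta > threshold:
                    gaps.append((prev, cur, delta))
            prev = cur
    return gaps
```

Start run_heartbeat("heartbeat.log") on the suspect instance before leaving
for the night, then call find_gaps("heartbeat.log") the next morning and
compare any gaps it reports with the times of the session expirations.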