Subject: Re: Debugging help for SessionExpiredException
From: Jordan Zimmerman
Date: Tue, 15 Jun 2010 10:33:41 -0700
To: zookeeper-user@hadoop.apache.org

More on this... Last night I ran our client with verbose GC logging enabled. I analyzed the GC log in gchisto, and 99% of the GC pauses are 1 or 2 ms. The longest GC pause is 30 ms. On the ZooKeeper server side, the longest GC pause is 130 ms. Both are far below our 1-minute session timeout, so, I submit, GC is not the problem. NOTE: we're running on Amazon EC2.

-JZ

On Jun 9, 2010, at 11:36 AM, Jordan Zimmerman wrote:

> We have a test system using ZooKeeper. There is a single ZooKeeper
> server node and 4 clients. There is very little activity in this system.
> After a day's testing we start to see SessionExpiredException on the
> client. Things I've tried:
>
> * Increasing the session timeout to 1 minute
> * Making sure all JVMs are running in a 100MB partition
>
> Any help debugging this problem would be appreciated. What kind of
> diagnostics can I add? Are there more config parameters that I
> should try?
>
> -JZ
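One diagnostic that may be worth adding, given that the GC logs look clean and the hosts are EC2 instances: check whether the whole JVM process is being stalled for reasons GC logging cannot see (for example, swapping or the VM not being scheduled by the hypervisor). The sketch below is a hypothetical helper, not part of ZooKeeper: PauseDetector, its 100 ms sleep interval, and its 500 ms stall threshold are all illustrative assumptions. A background thread repeatedly sleeps for a fixed interval and logs any wake-up that arrives much later than requested; a stall of that kind can delay a client's heartbeats to the server just as a long GC pause would.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stall detector (not a ZooKeeper API). A thread sleeps for a
 * fixed interval and measures how late it actually wakes up. A large
 * overshoot suggests the whole process was paused (GC, swapping, or the
 * host not scheduling the VM).
 */
public final class PauseDetector implements Runnable {
    private final long intervalMs;   // requested length of each sleep (assumed value)
    private final long thresholdMs;  // overshoot treated as a stall (assumed value)
    private volatile long maxOvershootMs = 0; // written only by the detector thread
    private volatile boolean running = true;

    public PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** How much later than requested the sleep ended, in ms (never negative). */
    static long overshootMs(long startNanos, long endNanos, long intervalMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMs - intervalMs);
    }

    public long maxOvershootMs() { return maxOvershootMs; }

    public void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long overshoot = overshootMs(start, System.nanoTime(), intervalMs);
            if (overshoot > maxOvershootMs) {
                maxOvershootMs = overshoot;
            }
            if (overshoot >= thresholdMs) {
                // Timestamped so it can be lined up with the client's session events.
                System.err.printf("%tT stall detected: woke up %d ms late%n",
                        System.currentTimeMillis(), overshoot);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PauseDetector detector = new PauseDetector(100, 500);
        Thread t = new Thread(detector, "pause-detector");
        t.setDaemon(true); // monitoring alone should not keep the JVM alive
        t.start();
        Thread.sleep(1_000); // a real client would leave this running for its lifetime
        detector.stop();
        System.out.println("max overshoot ms: " + detector.maxOvershootMs());
    }
}
```

Running a detector like this on both the client and the server machines, and comparing any logged stalls with the times SessionExpiredException appears, would help separate host-level pauses from network or server-side causes.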