From: Brian Tarbox
To: user@zookeeper.apache.org
Date: Mon, 25 Feb 2013 13:05:20 -0500
Subject: Re: Server seems not to be sending keep-alives so I lose my session ("have not heard from server....")

The server logs don't say anything.

I do have a theory based on reading the code, specifically the SendThread
class within ClientCnxn.java.

It took me a while to figure out that it's the client that sends the ping,
because the error message is "have not heard from the *server*...".

Once I got past that, the key line in the code is:

    int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend()

This basically means that the client will get at most 2 tries to send the
ping within the timeout interval, no matter what you set the timeout value
to. In a lossy network this may be insufficient, as can be seen from my
client logs, where I can go 30 seconds without sending a ping.

I'm running a test now where I've changed the "2" to a "4". I trade a tiny
increase in network traffic for a much higher chance of getting a
successful ping even in a bad network environment.

Brian

On Mon, Feb 25, 2013 at 11:56 AM, Camille Fournier wrote:

> What do your server logs say during this time?
>
>
> On Mon, Feb 25, 2013 at 11:51 AM, Brian Tarbox wrote:
>
> > I am getting the dreaded message:
> >
> > 10:59:45,871 INFO [org.apache.zookeeper.ClientCnxn] - <Client session timed out, have not heard from server in 31482ms for sessionid 0x13d11dd08160007, closing socket connection and attempting reconnect>
> >
> > and from looking at the logs it certainly seems that the keep-alive
> > messages are sometimes just not being sent.
> >
> > In my case I see a bunch of these:
> > 10:58:00,164 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:13,511 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:26,857 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:40,205 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:59:14,140 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response for sessionid: 0x13d11dd08160007 after 0ms>
> >
> > But then nothing from 10:59:14 until 10:59:45, when my client decides
> > it's been too long and so times out.
> >
> > I'm running 3.4.5 on EC2. Any suggestions welcome.
> >
> > Thanks.
> >
> > Brian Tarbox
> > --
> > http://about.me/BrianTarbox
>

--
http://about.me/BrianTarbox
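
For anyone who wants to check the arithmetic behind the "2" versus "4" change described at the top of this message, here is a minimal sketch. It is not ZooKeeper code: the class PingSchedule, its method names, and the 30000 ms read timeout are hypothetical and chosen only for illustration. The only thing taken from the source is the shape of the expression `readTimeout / 2 - clientCnxnSocket.getIdleSend()`, with the divisor turned into a parameter.

```java
// Hypothetical helper, NOT part of ZooKeeper. It only mirrors the arithmetic
// of the line quoted from SendThread, with the hard-coded 2 made a parameter.
final class PingSchedule {

    /** Milliseconds until the next ping is due; a value <= 0 means "send now". */
    static int timeToNextPing(int readTimeoutMs, int divisor, int idleSendMs) {
        return readTimeoutMs / divisor - idleSendMs;
    }

    /**
     * Upper bound on the number of pings that fall due within one read-timeout
     * window on an otherwise idle connection. Assumes each ping is sent the
     * moment it is due and counts one that lands exactly on the deadline.
     */
    static int pingsPerWindow(int readTimeoutMs, int divisor) {
        int intervalMs = readTimeoutMs / divisor;
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("read timeout too small for divisor");
        }
        return readTimeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        int readTimeoutMs = 30_000; // assumed value, for illustration only
        for (int divisor : new int[] {2, 4}) {
            System.out.printf("divisor=%d: ping every %d ms, at most %d pings per window%n",
                    divisor,
                    timeToNextPing(readTimeoutMs, divisor, 0),
                    pingsPerWindow(readTimeoutMs, divisor));
        }
    }
}
```

Under these assumptions, doubling the divisor halves the idle interval before a ping is sent and doubles the number of chances to get one through before the read timeout expires. Whether that is enough on a lossy link is exactly what the test described above is meant to find out.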