From: Jared Cantwell
Date: Fri, 1 Jul 2011 09:06:23 -0600
Subject: Re: Serious problem processing hearbeat on login stampede
To: user@zookeeper.apache.org
Cc: Chang Song

As a note, I believe we just used this patch to solve a major issue we
were seeing. We were having problems when power to a node was pulled,
which left hung TCP sessions on the servers. With many connections, each
close operation was taking 2 seconds, and that held up the server long
enough for it to start incorrectly closing other sessions. By disabling
linger, these hanging sessions were closed immediately and the problem
went away.

Thanks Chang!

~Jared

On Tue, Apr 19, 2011 at 10:59 AM, Ted Dunning wrote:

> Where is this set?
>
> Why does this cause this problem?
>
> 2011/4/19 Chang Song
>
> > Problem solved.
> > It was the socket linger option, set to a 2 sec timeout.
> >
> > We have verified that the original problem goes away when we turn
> > off the linger option.
> > No longer a mystery ;)
> >
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1049
> >
> > Chang
> >
> > On Apr 19, 2011, at 3:16 AM, Mahadev Konar wrote:
> >
> > > Camille, Ted,
> > > Can we continue the discussion on
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-1049?
> > >
> > > We should track all the suggestions/issues on the jira.
> > >
> > > thanks
> > > mahadev
> > >
> > > On Mon, Apr 18, 2011 at 9:03 AM, Ted Dunning wrote:
> > >> Interesting. It does seem to suggest that session expiration is
> > >> expensive.
> > >>
> > >> There is a concurrent table in Guava that provides very good
> > >> multi-threaded performance. I think that is achieved by using a
> > >> number of locks and then distributing threads across the locks
> > >> according to the hash slot being used. But I would have expected
> > >> any in-memory operation to complete very quickly.
> > >>
> > >> Is it possible that the locks on the session table are held
> > >> longer than they should be?
> > >>
> > >> 2011/4/18 Fournier, Camille F. [Tech]
> > >>
> > >>> Is it possible this is related to this report back in February?
> > >>>
> > >>> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201102.mbox/%3C6642FC1CAF133548AA8FDF497C547F0A23C0C5265B@NYWEXMBX2126.msad.ms.com%3E
> > >>>
> > >>> I theorized that the issue might be due to synchronization on
> > >>> the session table, but never got enough information to finish
> > >>> the investigation.
> > >
> > > --
> > > thanks
> > > mahadev
> > > @mahadevkonar
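
For anyone reading this thread later: the messages above do not show the actual ZooKeeper change, so what follows is only a rough sketch of the socket linger option being discussed, written in Python rather than taken from the ZooKeeper source. With linger enabled and a timeout of N seconds, close() may block for up to N seconds while unsent data is still pending; with linger disabled, close() returns right away and the kernel finishes the teardown in the background. The helper names set_linger and linger_setting are hypothetical, and the code assumes a Linux-style struct linger made of two C ints.

```python
import socket
import struct

# Assumption: struct linger is two C ints (l_onoff, l_linger), as on Linux.
_LINGER_FORMAT = "ii"


def set_linger(sock, enabled, seconds=0):
    """Enable or disable SO_LINGER on sock (hypothetical helper)."""
    value = struct.pack(_LINGER_FORMAT, 1 if enabled else 0, seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def linger_setting(sock):
    """Return (enabled, timeout) as reported by the OS (hypothetical helper)."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(_LINGER_FORMAT)
    )
    onoff, timeout = struct.unpack(_LINGER_FORMAT, raw)
    return bool(onoff), timeout


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The configuration described in the thread: linger on, 2 second timeout.
    set_linger(s, True, 2)
    print("linger enabled:", linger_setting(s)[0])
    # The fix described in the thread: linger off, so close() does not wait.
    set_linger(s, False)
    print("linger enabled:", linger_setting(s)[0])
    s.close()
```

If a server closes many dead connections one after another on a single thread, a per-close wait of a couple of seconds adds up quickly, which matches the stall described above.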