Message-ID: <52300E77.7080308@nicira.com>
Date: Tue, 10 Sep 2013 23:32:23 -0700
From: Jeremy Stribling <strib@nicira.com>
To: user@zookeeper.apache.org
CC: German Blanco
Subject: Re: adding a separate thread to detect network timeouts faster

Hi Germán,

A very quick scan of that JIRA makes me think you're talking about
server->server heartbeats, and not client->server heartbeats (which is
what I'm talking about). I have not tested it explicitly or inspected
that part of the code, but I've hit many cases in testing and production
where client session expirations coincide with long fsync times as
logged by the server.

Jeremy

On 09/10/2013 10:40 PM, German Blanco wrote:
> Hello Jeremy and all,
>
> My idea was that the current implementation of ping handling already
> does not wait on disk IO. I am even working on a JIRA case that is
> related to this:
> https://issues.apache.org/jira/browse/ZOOKEEPER-87
> I have also made some tests that seem to confirm that ping handling is
> done in a different thread than transaction handling. But actually, I
> don't have any confirmation from anyone in this project. Are you sure
> that ping handling waits on IO for anything? Have you tested it?
>
> Regards,
> Germán Blanco.
>
>
> On Tue, Sep 10, 2013 at 11:05 PM, Jeremy Stribling wrote:
>
>> Good suggestion, thanks. At the very least, I think what we have in
>> mind would be off by default, so users would turn it on only if they
>> know they have relatively few clients and slow disks. An adaptive
>> scheme would be even better, obviously.
>>
>>
>> On 09/10/2013 02:04 PM, Ted Dunning wrote:
>>
>>> Perhaps you should be suggesting a design that is adaptive rather
>>> than configured, and that guarantees low overhead at the cost of
>>> notification time in extreme scenarios.
>>>
>>> For instance, the server can send no more than 1000 (or whatever
>>> number) heartbeats per second, and never more than one per second
>>> to any client. This caps the cost nicely.
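
A minimal sketch of the rate cap Ted describes, for concreteness (the
class and method names are hypothetical, not anything in the ZooKeeper
codebase): the server draws on a global budget of at most 1000
heartbeats per second and declines to ping any single session more than
once per second.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical illustration of the capped-heartbeat idea above;
    // not ZooKeeper's actual server code.
    class HeartbeatRateCap {
        private static final int MAX_GLOBAL_PER_SEC = 1000; // "1000 (or whatever number)"
        private static final long MIN_PER_CLIENT_MS = 1000; // at most one per second per client

        private final Map<Long, Long> lastSentMs = new HashMap<>();
        private long windowStartMs = System.currentTimeMillis();
        private int sentInWindow = 0;

        // Returns true if the server may send a heartbeat to this session now.
        synchronized boolean tryAcquire(long sessionId) {
            long now = System.currentTimeMillis();
            if (now - windowStartMs >= 1000) { // start a fresh one-second window
                windowStartMs = now;
                sentInWindow = 0;
            }
            if (sentInWindow >= MAX_GLOBAL_PER_SEC) {
                return false; // global budget for this second is spent
            }
            Long last = lastSentMs.get(sessionId);
            if (last != null && now - last < MIN_PER_CLIENT_MS) {
                return false; // this client was pinged too recently
            }
            lastSentMs.put(sessionId, now);
            sentInWindow++;
            return true;
        }
    }

With 100,000 sessions, the per-client rule alone would still allow
100,000 heartbeats per second, so the global budget is what actually
bounds the load; clients that miss their turn are simply notified
later, which is the adaptive trade-off Ted mentions.
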
>>> On Tue, Sep 10, 2013 at 1:59 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>
>>> Since you are talking about client connection failure detection, no,
>>> I don't think that there is a major barrier other than actually
>>> implementing a reliable check.
>>>
>>> Keep in mind the cost. There are ZK installs with 100,000 clients.
>>> If these are heartbeating every 2 seconds, you have 50,000 packets
>>> per second hitting the quorum, or 10,000 per server if all
>>> connections are well balanced.
>>>
>>> If you only have 10 clients, the network burden is nominal.
>>>
>>>
>>> On Tue, Sep 10, 2013 at 1:34 PM, Jeremy Stribling wrote:
>>>
>>> I mostly agree, but let's assume that a ~5x speedup in detecting
>>> those types of failures is considered significant for some people.
>>> Are there technical reasons that would prevent this idea from
>>> working?
>>>
>>> On 09/10/2013 01:31 PM, Ted Dunning wrote:
>>>
>>> I don't see the strong value here. A few failures would be detected
>>> more quickly, but I am not convinced that this would actually
>>> improve functionality significantly.
>>>
>>>
>>> On Tue, Sep 10, 2013 at 1:01 PM, Jeremy Stribling wrote:
>>>
>>> Hi all,
>>>
>>> Let's assume that you wanted to deploy ZK in a virtualized
>>> environment, despite all of the known drawbacks. Assume we could
>>> deploy it such that the ZK servers were all using independent CPUs
>>> and storage (though not dedicated disks). Obviously, the shared
>>> disks (shared with other, non-ZK VMs on the same hypervisor) will
>>> cause ZK to hit the default session timeout occasionally, so you
>>> would need to raise the existing session timeout to something like
>>> 30 seconds.
>>>
>>> I'm curious whether there would be any technical drawbacks to adding
>>> an additional heartbeat mechanism between the clients and the
>>> servers, with the goal of detecting network-only failures faster
>>> than the existing heartbeat mechanism does. The idea is that there
>>> would be a new thread dedicated to processing these heartbeats,
>>> which would not get blocked on I/O. The clients could then configure
>>> a second, smaller timeout value, and it would be assumed that any
>>> such timeout indicated a real problem. The existing mechanism would
>>> still be in place to catch I/O-related errors.
>>>
>>> I understand the philosophy that there should be some heartbeat
>>> mechanism that takes the disk into account, but I'm having trouble
>>> coming up with technical reasons not to add a second mechanism.
>>> Obviously, the advantage would be that the clients could detect
>>> network failures and system crashes more quickly in an environment
>>> with slow disks, and fail over to other servers more quickly. The
>>> only disadvantages I can come up with are:
>>>
>>> 1) More code complexity, and slightly more heartbeat traffic on the
>>> wire.
>>> 2) I think the servers have to log session expirations to disk, so
>>> if sessions expire at a faster rate than the disk can handle, it
>>> might lead to a large backlog.
>>>
>>> Are there other drawbacks I am missing? Would a patch that added
>>> something like this be considered, or is it dead from the start?
>>> Thanks,
>>>
>>> Jeremy
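
For reference, a minimal sketch of the mechanism Jeremy proposes
(hypothetical names; an illustration under the thread's assumptions,
not an actual patch): a dedicated client-side timer thread exchanges
lightweight network-only pings against a second, smaller timeout,
assuming the server answers these pings from a thread that never
touches the transaction log, so a slow fsync cannot delay them.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical client-side half of the proposal above.
    class NetworkOnlyHeartbeat {
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        private final AtomicLong lastPongMs =
                new AtomicLong(System.currentTimeMillis());
        private final long networkTimeoutMs;  // much smaller than the ~30s session timeout
        private final Runnable onNetworkFailure;

        NetworkOnlyHeartbeat(long networkTimeoutMs, Runnable onNetworkFailure) {
            this.networkTimeoutMs = networkTimeoutMs;
            this.onNetworkFailure = onNetworkFailure;
        }

        void start(long pingIntervalMs) {
            timer.scheduleAtFixedRate(() -> {
                sendPing(); // fire-and-forget ping frame; never blocks on disk
                if (System.currentTimeMillis() - lastPongMs.get() > networkTimeoutMs) {
                    onNetworkFailure.run(); // e.g. fail over to another server
                }
            }, 0, pingIntervalMs, TimeUnit.MILLISECONDS);
        }

        // Called from the I/O layer whenever a ping response arrives.
        void pongReceived() {
            lastPongMs.set(System.currentTimeMillis());
        }

        private void sendPing() {
            // Write a lightweight ping on the client's connection; omitted here.
        }
    }

The regular session timeout would stay in place to catch I/O-related
failures, so a firing of onNetworkFailure only short-circuits the
network case; as Jeremy's point 2 notes, the server still has to log
session expirations to disk, so expiring many sessions faster than the
disk can absorb remains a risk with any shorter timeout.
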