From: Arshad Mohammad
Date: Fri, 14 Oct 2016 00:36:40 +0530
Subject: Re: outstandingChanges queue grows without bound
To: user@zookeeper.apache.org

Hi Mike,

I also faced the same issue. There is a test patch in ZOOKEEPER-2570 that can be used to quickly check the performance gain from each modification. Hope it is useful.

-Arshad

On Thu, Oct 13, 2016 at 1:27 AM, Mike Solomon wrote:

> I've been performance testing 3.5.2 and hit an interesting unavailability
> issue.
>
> When the server is very busy (64k connections, 16k writes per
> second) the leader can get busy enough that connections get throttled.
> Enough throttling causes sessions to expire. As sessions expire, the
> CPU consumption rises and the quorum is effectively unavailable.
> Interestingly, if you shut down all the clients, the quorum won't heal
> for nearly 10 minutes.
>
> The issue is that the outstandingChanges queue has 250k items in it
> and the closeSession code scans this linearly under a lock. Replacing
> the linear scan with a hash table lookup improves this, but likely the
> real solution is some backpressure on clients as a result of an
> oversized outstandingChanges queue.
>
> Here is a sample fix:
> https://github.com/msolo/zookeeper/commit/75da352d506c2e3b0001d28acc058c422b3c8f0c
>
> This results in the quorum healing about 30 seconds after the clients
> disconnect.
>
> Is there a way to prevent runaway growth in this queue? I'm wondering
> if changing the definition of "throttling" to take into account the
> size of this queue might help mitigate this. The end goal is that some
> stable amount of traffic is reached asymptotically without suffering a
> collapse.
>
> Thanks,
> -Mike
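
For readers following the thread, here is a minimal Java sketch of the two ideas Mike describes: a hash table lookup in place of the linear scan, and a size bound on the queue that could serve as a backpressure signal. It is not ZooKeeper's code and not the linked commit. The class PendingChangeQueue, its Change type, the maxPending limit, and the assumption that the scan wants "the pending paths of one session" are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not ZooKeeper's classes. */
class PendingChangeQueue {
    /** A pending change tagged with the session that issued it. */
    static final class Change {
        final long sessionId;
        final String path;

        Change(long sessionId, String path) {
            this.sessionId = sessionId;
            this.path = path;
        }
    }

    // FIFO queue of all pending changes.
    private final ArrayDeque<Change> queue = new ArrayDeque<>();
    // Index: sessionId -> that session's pending paths, in queue order.
    private final Map<Long, ArrayDeque<String>> bySession = new HashMap<>();
    // Assumed cap on queue size; a real limit would need tuning.
    private final int maxPending;

    PendingChangeQueue(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Returns false when the queue is full, so the caller can throttle. */
    synchronized boolean offer(Change c) {
        if (queue.size() >= maxPending) {
            return false;
        }
        queue.addLast(c);
        bySession.computeIfAbsent(c.sessionId, id -> new ArrayDeque<>()).addLast(c.path);
        return true;
    }

    /** Removes the oldest change and keeps the index in sync. */
    synchronized Change poll() {
        Change c = queue.pollFirst();
        if (c == null) {
            return null;
        }
        // The queue is FIFO, so c is also the oldest entry for its session.
        ArrayDeque<String> paths = bySession.get(c.sessionId);
        paths.pollFirst();
        if (paths.isEmpty()) {
            bySession.remove(c.sessionId);
        }
        return c;
    }

    /** Hash lookup instead of scanning every queued change. */
    synchronized List<String> pendingPaths(long sessionId) {
        ArrayDeque<String> paths = bySession.get(sessionId);
        return paths == null ? new ArrayList<>() : new ArrayList<>(paths);
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With an index like this, the cost of closing a session depends on how many changes that session has pending rather than on the total queue length, and a rejected offer gives the server a point at which to slow clients down before the queue grows without bound. Whether either idea fits the real server code paths is something only profiling against the actual code can confirm.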