Date: Thu, 19 Jun 2014 22:09:55 -0400
Subject: Re: High CPU usage on zookeeper clients when cluster is down
From: Vitalii Tymchyshyn
To: user@zookeeper.apache.org

I'd say that adding some randomness here would help, e.g. using 700-1300 ms
instead of the hard-coded one second.

2014-06-19 18:14 GMT-04:00 Luke Stephenson:

> Hello,
>
> I'm running approximately 20 java processes on one host.  Each process
> connects to zookeeper, but places very little load on zookeeper.  The
> zookeeper cluster consists of 9 nodes.
>
> When the zookeeper cluster is healthy, all is well.  However when the
> zookeeper cluster goes down, the clients create significant load on the
> host as they attempt to reconnect to zookeeper.
>
> Each zookeeper client attempts to connect to each of the 9 nodes listed in
> the zookeeper cluster, in succession.  If the connection fails to all hosts
> it will wait 1 second before trying again.  So every second I've got 180
> attempted connections on one host.  I already had a problem with the
> zookeeper cluster being down, now the clients are creating excessive load
> as well compounding the issue.
>
> This is the code which I've narrowed it down to.  Unfortunately the 1
> second delay between attempts is hard coded.
>
> https://github.com/apache/zookeeper/blob/release-3.4.6/src/java/main/org/apache/zookeeper/ClientCnxn.java#L940
>         private void startConnect() throws IOException {
>             state = States.CONNECTING;
>
>             InetSocketAddress addr;
>             if (rwServerAddress != null) {
>                 addr = rwServerAddress;
>                 rwServerAddress = null;
>             } else {
>                 addr = hostProvider.next(1000);
>             }
>
> Is the typical pattern to use a load balancer so that the client only
> specifies one endpoint and as a result only attempts to establish 1
> connection per second?  Any other recommendations?
>
> I would have thought this was a common problem, but my searches failed to
> find existing discussions on it.
>
> Thanks
>
> Luke
>
> PS Apologies if you have received this twice.  I initially published from
> nabble which appears to have failed.
>
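
To make the randomness suggestion at the top of this message concrete, here
is a minimal sketch of a jittered delay. The class JitteredDelay and the
method nextDelayMs are hypothetical names for illustration, not part of
ZooKeeper. Using the result in place of the literal 1000 in the quoted
hostProvider.next(1000) call would mean patching the client, and this
sketch does not claim that is how the project would do it.

```java
import java.util.Random;

/** Hypothetical helper, not part of ZooKeeper: picks a randomized retry delay. */
class JitteredDelay {

    /**
     * Returns a delay drawn uniformly from
     * [baseMs * (1 - jitter), baseMs * (1 + jitter)], both ends inclusive.
     * With baseMs = 1000 and jitter = 0.3 that is the 700-1300 ms range.
     */
    public static long nextDelayMs(long baseMs, double jitter, Random rnd) {
        long min = Math.round(baseMs * (1.0 - jitter));
        long max = Math.round(baseMs * (1.0 + jitter));
        // nextInt(bound) excludes bound, so add 1 to make max reachable.
        return min + rnd.nextInt((int) (max - min + 1));
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) {
            // Each printed value lies between 700 and 1300.
            System.out.println(nextDelayMs(1000, 0.3, rnd));
        }
    }
}
```

With about 20 clients on one host, different delays per client would keep
the reconnect attempts from all landing in the same instant. The total
number of attempts per second stays roughly the same, so this smooths the
load spikes rather than reducing the load.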