Date: Fri, 8 Mar 2013 10:44:30 -0500
Subject: Connections issues with ZooKeeper
From: Eric Robert
To: user@zookeeper.apache.org

Hello,

I am experiencing connection issues when many processes try to connect to ZK at the same time. I quickly found that I needed to increase maxClientCnxns to cover our use case, but I still get timeouts when connecting. Most of the clients run from the same machine.

I've tried different setups with similar results, i.e. standalone on the same machine as the clients, standalone on another machine, or an ensemble of 5 servers. For example, when ZK is local and standalone, I start to get a few timeouts with 100 clients, and the problem gets a lot worse from there: with something like 1000 clients, most of them can't connect.

I think I reproduced the problem with zk-smoketest, using a simple script that starts multiple instances of the test concurrently. Note that with an ensemble of 5 servers, I start to get exceptions with very few connections. Maybe we're missing something in the configuration?
Here it is:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/tmp/zookeeper
    clientPort=2181
    maxClientCnxns=4000
    server.1=69.90.81.244:2888:3888
    server.2=69.90.81.246:2888:3888
    server.3=69.90.81.248:2888:3888
    server.4=69.90.81.250:2888:3888
    server.5=69.90.81.252:2888:3888

Here is my test:

    #!/bin/bash
    for i in {1..10}
    do
        ./zk-latencies.py --root_znode=/zoo-$i --znode_count=10 --servers="ag4.recoset.com:2181" &
        echo "running $i"
        #waiting makes everything good again
        #sleep 1
    done

Here is one of the exceptions I get:

    Traceback (most recent call last):
      File "./zk-latencies.py", line 304, in <module>
        asynchronous_latency_test(s, data)
      File "./zk-latencies.py", line 188, in asynchronous_latency_test
        timer2(func, "get %7d znodes " % (options.znode_count))
      File "./zk-latencies.py", line 85, in timer2
        func()
      File "./zk-latencies.py", line 183, in func
        cb.waitForSuccess()
      File "/home/eric/code/zk-smoketest/zkclient.py", line 181, in waitForSuccess
        (self.handle, self.rc))
    zkclient.ZKClientError: 'asynchronous operation failed on handle 0 with rc -4'

For reference, I seem to get good performance with 1 connection and 10000 nodes:

    Connected in 189 ms, handle is 0
    Testing latencies on server ag4.recoset.com:2181 using asynchronous calls
    created 10000 permanent znodes in 2155 ms (0.215594 ms/op 4638.358658/sec)
    set 10000 znodes in 1027 ms (0.102703 ms/op 9736.823045/sec)
    get 10000 znodes in 1096 ms (0.109671 ms/op 9118.163178/sec)
    deleted 10000 permanent znodes in 1574 ms (0.157465 ms/op 6350.621245/sec)
    created 10000 ephemeral znodes in 1776 ms (0.177664 ms/op 5628.592681/sec)
    watched 10000 znodes in 1282 ms (0.128248 ms/op 7797.367901/sec)
    deleted 10000 ephemeral znodes in 1006 ms (0.100612 ms/op 9939.141978/sec)
    notif 10000 watches in 0 ms (included in prior)
    Latency test complete

Thanks!

Éric
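For anyone who wants to vary the pause between launches more finely than the commented-out "sleep 1" allows, the same loop could be sketched in Python roughly as follows. This is only an illustration of the bash script above, not part of zk-smoketest: the helper names build_command and launch_all and the stagger_seconds parameter are made up for this sketch, and it assumes zk-latencies.py exits with a non-zero status when it dies with a traceback like the one shown.

```python
import subprocess
import sys
import time


def build_command(i, server="ag4.recoset.com:2181", znode_count=10):
    """Command line for the i-th zk-latencies.py instance, mirroring the bash loop.

    Hypothetical helper: the flags are the ones used in the bash script.
    """
    return [
        "./zk-latencies.py",
        "--root_znode=/zoo-%d" % i,
        "--znode_count=%d" % znode_count,
        "--servers=%s" % server,
    ]


def launch_all(commands, stagger_seconds=0.0):
    """Start every command in the background, optionally pausing between starts.

    stagger_seconds=0 corresponds to the failing case (all clients at once);
    stagger_seconds=1 corresponds to enabling "sleep 1" in the bash loop.
    Returns the exit codes in launch order.
    """
    procs = []
    for n, cmd in enumerate(commands):
        if n > 0 and stagger_seconds > 0:
            time.sleep(stagger_seconds)
        procs.append(subprocess.Popen(cmd))
    # Wait for all children only after every one of them has been started.
    return [p.wait() for p in procs]
```

Calling launch_all([build_command(i) for i in range(1, 11)], stagger_seconds=0.1) and counting the non-zero exit codes for a few different delays would show how much spacing between connection attempts is needed before the failures disappear.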