Subject: Re: Unbalanced client connections
From: James Mulcahy
To: user@zookeeper.apache.org, Flavio Junqueira
Date: Fri, 04 Jul 2014 14:36:26 +0100

On 4 Jul 2014, at 14:16, Flavio Junqueira wrote:

> Ok, so a couple of obvious checks

Sure...

> - Are you passing a connection string with all five servers?

Yes, most definitely. Prior to deployment of this I did some extensive testing where I killed off ZK servers randomly to test our clients' ability to reconnect on to another server in the cluster. I know that if they absolutely need to, they can connect elsewhere - but the graphs show they almost always pick the same server.

> - Are you calling zoo_deterministic_conn_order(1) by any chance (you shouldn't if you want shuffling)?

No, I wasn't aware of that function - but in mentioning it you've led me to the code that does the shuffling. Is there anything on the server side to force a client to move elsewhere if the server has a disproportionate number of the clients connected to it? That's the function I thought I had read exists. That said, given a sufficiently random random() function, it looks like the permute should do enough to stop all clients arriving on the same server initially anyway.

Perhaps I'll need to add some instrumentation to dump out the permuted connection list and see how it varies across the clients?

-James

> -Flavio
>
> On Friday, July 4, 2014 2:01 PM, James Mulcahy wrote:
>
>> Hi Flavio,
>>
>> Thanks for the quick response - and apologies for not including these details up front!
>>
>> - C client binding
>> - 99.99% MacOS X Clients (10.9.2), with a couple of Linux Clients (Ubuntu 14.04)
>> - All ZK nodes are Linux (Ubuntu 14.04)
>> - ZooKeeper 3.4.6
>>
>> No Windows involved here...
>>
>> -James
>>
>> On 4 Jul 2014, at 13:57, Flavio Junqueira wrote:
>>
>>> Hi James,
>>>
>>> Are you using the C or the Java client binding? What's the OS? I'm asking because there is an issue with the randomization of the connect string on Windows we found, but I haven't created a jira for it yet.
>>>
>>> -Flavio
>>>
>>> On Friday, July 4, 2014 10:41 AM, James Mulcahy wrote:
>>>
>>>> Hello,
>>>>
>>>> I run a 5-node ZooKeeper ensemble, with ~900 clients connected at a given time. I'm noticing that at any one point in time, all the clients are generally connected to the same ZooKeeper node.
>>>>
>>>> Looking back over the graphs I have which track this, there has only been one brief period where one node didn't have >90% of the clients; and during that period, two nodes shared roughly 50% of the clients each.
>>>>
>>>> Is this expected behaviour? Is there anything I can do to tune this, to encourage the clients to be more balanced?
>>>>
>>>> My expectation was that the clients would self-balance - I thought I'd read that somewhere in the documentation, but I can't find a reference for that now.
>>>>
>>>> Thanks in advance,
>>>>
>>>> -James
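
For the instrumentation idea mentioned in the reply (dumping the permuted connection list per client), here is a minimal sketch of what such a check might look like. It is not the ZooKeeper C client's code: shuffle_hosts, format_hosts, shuffled_order and the zk1..zk5 host names are all hypothetical, and the shuffle is a plain Fisher-Yates pass driven by random(). It only illustrates that a random()-based permutation is fully determined by its seed, so clients that happened to start from the same seed would all log the same server order.

```c
#define _XOPEN_SOURCE 600   /* expose random()/srandom() under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: Fisher-Yates shuffle of a host list using random().
 * The modulo introduces a tiny bias, which is irrelevant for five entries. */
static void shuffle_hosts(const char **hosts, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)random() % i;      /* index in 0..i-1 */
        const char *tmp = hosts[i - 1];
        hosts[i - 1] = hosts[j];
        hosts[j] = tmp;
    }
}

/* Hypothetical helper: render the current order as one comma-separated
 * line, suitable for writing to a client log. */
static void format_hosts(const char **hosts, size_t n, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, hosts[i], len - strlen(buf) - 1);
    }
}

/* Seed the generator, shuffle a fresh copy of a five-server list
 * (placeholder names), and write the resulting order into buf. */
static void shuffled_order(unsigned seed, char *buf, size_t len)
{
    const char *hosts[] = {
        "zk1:2181", "zk2:2181", "zk3:2181", "zk4:2181", "zk5:2181"
    };
    srandom(seed);
    shuffle_hosts(hosts, sizeof hosts / sizeof hosts[0]);
    format_hosts(hosts, sizeof hosts / sizeof hosts[0], buf, len);
}
```

Calling shuffled_order() twice with the same seed yields the same line, while different seeds will usually differ. Logging the equivalent line from each real client at connect time would show whether the orders actually vary across the fleet.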