Message-ID: <4BDF0102.2020401@apache.org>
Date: Mon, 03 May 2010 09:59:46 -0700
From: Patrick Hunt
To: zookeeper-user@hadoop.apache.org, "zookeeper-dev@hadoop.apache.org"
CC: Dave Wright
Subject: Re: Dynamic adding/removing ZK servers on client
In-Reply-To: <4BDEFB1A.5010107@apache.org>

Another benefit of ZOOKEEPER-146 - we could use this for some sort of
load balancing amongst the ensemble members. The first version could
return a static list, however I can see where the HTTPD might be updated
to monitor the load on the servers/ensemble and prioritize the list for
each client request...

Patrick

On 05/03/2010 09:34 AM, Patrick Hunt wrote:
>
> On 05/03/2010 07:03 AM, Dave Wright wrote:
>> I've got a situation where I essentially need dynamic cluster
>> membership, which has been talked about in ZOOKEEPER-107 but doesn't
>> look like it's going to happen any time soon.
>>
>
> Could you provide some insight into why you need this? Just so we have
> addl background, I'm interested to know the use case.
>
>> For now, I'm planning on working around this by having a simple
>> coordinator service on the server nodes that will re-write the configs
>> and bounce the servers when membership changes. Clients may get
>> an error or two and need to reconnect, but that should be handled by
>> the normal error logic.
>>
>
> Are you expecting all of the servers to change each time, or just
> incremental changes (add/remove a single server, vs say move the entire
> cluster from 3 hosts a/b/c to x/y/z)
>
>> On the client side, I'd really like to dynamically update the server
>> list w/o having to re-create the entire Zookeeper object. Looking at
>> the code, it seems like it would be pretty trivial to add
>> "RemoveServer()/AddServer()" functions for Zookeeper that call down
>> to ClientCnxn, where they are just maintained in a list. Of course if
>> the server being removed is the one currently connected, we'd need to
>> disconnect, but a simple call to disconnect() seems like it would
>> resolve that and trigger the automatic re-connection logic.
>>
>
> You would hook this (add/remove) into JMX? That seems like a good option
> to provide.
>
> Any chance you could use DNS for this?
> i.e. change the mapping for the
> hostname from a -> x ip? Since the server a will go down anyway, this
> would cause the client to reconnect to b/c (eventually when dns ttl
> expires the client would also potentially connect to x).
>
> If this is an option be sure to see (a bit of work to do):
> https://issues.apache.org/jira/browse/ZOOKEEPER-328
> https://issues.apache.org/jira/browse/ZOOKEEPER-338
>
> You might also look at this patch, we never committed it but it might be
> interesting to you:
> https://issues.apache.org/jira/browse/ZOOKEEPER-146
>
> The benefit is that you'd only have one place to make the change, esp
> given that clients might be down/unreachable when this change occurs.
> Clients would have to poll this service whenever they get disconnected
> from the ensemble. One drawback of this approach is that the HTTP
> service now becomes a potential SPOF (although I guess you could always
> fall back to something, or potentially have a list of HTTP hosts to do
> the lookup, etc...).
>
>> Does anyone see an issue with that approach?
>> Were I to create the patch, do you think it would be interesting
>> enough to merge? It seems like that functionality will eventually be
>> needed for whatever full dynamic server support is eventually
>> implemented.
>
> It does sound interesting, however once we add something like this it's
> hard to change given that we try very hard to maintain backward
> compatibility. If you did the testing and were able to verify I don't
> see why we couldn't add it - as it's "optional" in the sense that it
> would only be called in the use case you describe. I would feel more
> confident if we had more concrete detail on how we intend to do 107 (a
> basic functional/design doc that at least reviews all the issues), and
> how this would fit in. But I don't see that this should necessarily be a
> blocker (although others might feel differently).
>
> (fyi it's good to discuss this sort of thing on zookeeper-dev, please
> move responses to that list)
>
> Sounds like a useful project, I'm interested to hear what others think
> about it. Regards,
>
> Patrick
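
For concreteness, here is a rough sketch of the client-side bookkeeping
the proposed "RemoveServer()/AddServer()" calls would need. Everything in
it is an assumption made for illustration: DynamicServerList and its
methods are hypothetical names, not part of the ZooKeeper client or
ClientCnxn, and the round-robin choice of the next server is just one
possible policy.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, NOT part of the ZooKeeper client API: a mutable
 * "host:port" list of the kind AddServer()/RemoveServer() might maintain.
 */
class DynamicServerList {
    private final List<String> servers = new ArrayList<>();
    private String current; // server currently in use, or null if none

    DynamicServerList(List<String> initial) {
        servers.addAll(initial);
    }

    /** Adds a server unless it is already listed. */
    synchronized void addServer(String hostPort) {
        if (!servers.contains(hostPort)) {
            servers.add(hostPort);
        }
    }

    /**
     * Removes a server. Returns true when the removed server is the one
     * currently in use, meaning the caller should disconnect so that the
     * normal reconnection logic picks another server.
     */
    synchronized boolean removeServer(String hostPort) {
        servers.remove(hostPort);
        if (hostPort.equals(current)) {
            current = null;
            return true;
        }
        return false;
    }

    /**
     * Picks the next server to try: the one after the current server,
     * or the first in the list when there is no current server.
     */
    synchronized String next() {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no servers configured");
        }
        int idx = (servers.indexOf(current) + 1) % servers.size();
        current = servers.get(idx);
        return current;
    }

    /** Returns the server currently in use, or null if none. */
    synchronized String current() {
        return current;
    }
}
```

A caller would treat a true return from removeServer() as the signal to
drop the connection and call next(); removing any other server needs no
action on the live session.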