From: Patrick Hunt
Date: Mon, 07 Dec 2009 11:23:54 -0800
To: hbase-user@hadoop.apache.org, richard@bengueladev.com
Subject: Re: Zookeeper hostname lookup

Richard
Dorman wrote:
> I'm trying to start up a quorum of Zookeeper servers in a cluster,
> however, Zookeeper is failing to start because it cannot find its
> hostname in the list of Zookeeper quorum servers.

Can you provide the contents of the ZK log for this?

The thing is, we don't, as you say, "lookup our hostname in the list of
zk quorum servers". Rather, we rely on the "myid" file, which resides in
the data directory (you should have created it when you set up the
cluster), to identify who "we" (meaning the server) are during server
start. So:

1) the myid file has the server id
2) the config file on each server has something like

server.1=host1:2888:2889
server.2=host2:2988:2989

where
host1 will have a myid file with "1"
host2 will have a myid file with "2"

> I know this problem is well documented on the wiki, however, my
> situation is a little different. The allocation of a node to become a
> Zookeeper is done dynamically by a management service running elsewhere
> on the cluster. This node then associates its IP with a hostname in the
> Zookeeper quorum list. The hostname is not the default hostname of the
> node. The node may associate its IP with multiple hostnames for each
> service that it is allocated.

We register a server socket as follows:

ss = new ServerSocket(self.getQuorumAddress().getPort());

Note: we only specify the port number, not the host name/addr here. This
should mean that the socket will register on all interfaces (on the
host) for all possible ip addresses (wildcard match).

> This causes a problem when Zookeeper starts. Zookeeper does a
> getdefaulthost which will return the node's default hostname and not the
> associated hostname.

As I mentioned, I'd like to see the log for this error.

> So my questions are:
>
> 1. Is it possible to resolve this some other way? We are not running DNS
> (hostname associations are managed by our own services). We also cannot
> use the node's ip address as the nodes are allocated dynamically.
> Dynamically updating the config files is also not practical.
>
> 2. Why does Zookeeper need to test whether its hostname is in the
> Zookeeper quorum list? Can this safely be disabled?

As far as I can tell, we are not doing this. If you could send your
config file as well, it would be interesting to see in addition to the
log of the error.

Is this EC2 or something else? What version of ZK are you running?

Patrick
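
P.S. To make the two points in this reply concrete, here is a rough Java sketch. It is an illustration only, not ZooKeeper source code: the class name QuorumIdentitySketch and both helper methods are made up for this example, and the config text is just the sample entries quoted in this message. The first helper picks the "server.N" entry whose N matches the contents of the myid file; the second opens a ServerSocket with only a port and reports whether it ended up bound to the wildcard address.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.ServerSocket;
import java.util.Properties;

// Illustrative sketch only; names here are hypothetical, not ZooKeeper APIs.
class QuorumIdentitySketch {

    // Return the "server.<id>" value whose id matches the myid file contents,
    // or null if the config has no such entry. No hostname lookup is involved.
    static String entryForMyId(String configText, String myIdContents) throws IOException {
        Properties cfg = new Properties();
        cfg.load(new StringReader(configText));
        return cfg.getProperty("server." + myIdContents.trim());
    }

    // Bind with a port only (no host/address) and report whether the socket
    // is bound to the wildcard address, i.e. listening on all interfaces.
    static boolean bindsToWildcard(int port) throws IOException {
        try (ServerSocket ss = new ServerSocket(port)) {
            return ss.getInetAddress().isAnyLocalAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample entries copied from the message above.
        String config = "server.1=host1:2888:2889\nserver.2=host2:2988:2989\n";
        // Pretend the myid file on this machine contains "2".
        System.out.println("my entry: " + entryForMyId(config, "2\n"));
        // Port 0 asks the OS for any free port, which keeps the demo portable.
        System.out.println("wildcard bind: " + bindsToWildcard(0));
    }
}
```

If the sketch reflects what your servers do, the hostname a node happens to report for itself should not matter for binding the quorum port; what matters is that the myid contents match one of the server.N entries.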