hadoop-common-dev mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: about zookeeper DNS
Date Wed, 06 Jul 2011 21:50:11 GMT
Hi Warren,

A variant of the same idea has been prototyped and proven to work.  In
https://issues.apache.org/jira/browse/HADOOP-7417, the proposed Hadoop
deployment system uses mDNS to locate ZooKeeper, and the cluster
topology is described as
/clusters/$cluster_name/$hostname/[$action_queue|$status_queue].  Each
agent uses its own hostname to look up its path in that structure and
determine which software installation and configuration procedures to
run.  It is a good working model for coordinating a large number of
machines through staged procedures.  From the prototype, we know that
ZooKeeper and the design work well and can scale to tens of thousands
of machines.
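
As a rough illustration of that lookup, here is a minimal sketch of how
an agent might derive its two znode paths from its own hostname.  The
function name and the literal child names "action_queue" and
"status_queue" are assumptions made for this example; the path pattern
above only shows them as placeholders, and this is not code from the
HADOOP-7417 prototype.

```python
import socket


def agent_queue_paths(cluster_name, hostname=None):
    """Build the per-host znode paths for one agent.

    Follows the pattern /clusters/$cluster_name/$hostname/<queue>.
    The child names "action_queue" and "status_queue" are assumed
    literals chosen for this sketch.
    """
    if hostname is None:
        # An agent would normally use its own hostname.
        hostname = socket.gethostname()
    base = "/clusters/{}/{}".format(cluster_name, hostname)
    return {
        "action_queue": base + "/action_queue",
        "status_queue": base + "/status_queue",
    }
```

An agent would then read work items from the first path and report
results under the second, using whatever ZooKeeper client it has.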


On Wed, Jul 6, 2011 at 1:46 PM, Warren Turkal <wt@penguintechs.org> wrote:
> Hi,
> I was actually the person who suggested the ZooKeeper name service idea to
> Mr. Dunning at Hadoop Summit, which he mentioned in the "Hadoop Master and
> Slave Discovery" thread. My idea was a little more complex than just a DNS
> server. The DNS server would be useful for finding web frontends for
> various Hadoop services (JobTracker, TaskTracker, NameNode, etc.). However,
> the real magic would be creating a ZooKeeper-specific naming service that
> could notify listeners whenever an address:port changes. Keeping in mind
> that I am not very good with ZooKeeper yet, consider the following (with
> next-gen MapReduce names):
>   - /zns/$cluster/resource_manager/elected_master - contains the ip:port of
>   the current resource manager
>   - /zns/$cluster/resource_manager/managers/* - each node is named after
>   one of the members of the resource manager group of machines and contains
>   the ip:port of that resource manager
>   - /zns/$cluster/node_managers/* - each node is named after one of the
>   node managers and contains the ip:port of that node manager
> Then one could use ZooKeeper instead of DNS to discover everything and be
> notified when it changes (thus avoiding the DNS TTL issue). For example,
> when a new node manager comes up, it could create an ephemeral node under
> the node_managers/ hierarchy. The master would then be notified and could
> contact and configure the machine. All that the resource and node managers
> would have to know is where the root of the ZooKeeper node hierarchy is.
> Also, there could be a really nice way to access web services identified by
> those nodes. There could be a DNS server that is authoritative for names in
> the $cluster.zns.example domain. It could answer with SRV, A, or AAAA
> records for names like
> elected_master.resource_manager.$cluster.zns.example. In the case of a SRV
> record, you can include an address and port. In the case of A or AAAA, you
> could respond with an address for a web proxy that serves up the
> appropriate ip:port or a web redirector that redirects to
> http://ip:port/$query_str.
> I could also imagine having /zns/$cluster/jobs/$username/$jobname/$taskid
> (and $taskid.$jobname.$username.jobs.$cluster.zns.example via DNS/web proxy)
> link to a specific task in a job. The /zns/$cluster/jobs/$username/$jobname
> node could contain a list of all tasks for a particular job.
> The /zns/$cluster/jobs/$username node could contain a list of all jobs
> running under a specific user. /zns/$cluster/jobs could contain a list of
> all jobs managed by the resource manager.
> I'm sure not all of what I have said is sound design, but I am hoping it
> conveys my message. Also, I think something like this could be really cool.
> Thanks,
> wt
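
The mapping between znode paths and DNS names in the quoted proposal can
be sketched as a simple reversal of path components.  This is only an
illustration under stated assumptions: the function name, the "/zns"
root default, and the "zns.example" domain default are taken from or
invented around the examples in the quoted text, and no real DNS server
or ZooKeeper client is involved.

```python
def znode_to_dns_name(path, root="/zns", domain="zns.example"):
    """Map a znode path under `root` to a DNS name under `domain`.

    Path components below the root are reversed so that the most
    specific component becomes the leftmost DNS label.
    """
    if path != root and not path.startswith(root + "/"):
        raise ValueError("path is not under {}: {}".format(root, path))
    # Drop the root prefix and ignore empty components.
    labels = [part for part in path[len(root):].split("/") if part]
    return ".".join(list(reversed(labels)) + [domain])


# Example inputs; "prod", "alice", "wordcount" and "task_0001" are
# made-up values standing in for $cluster, $username, $jobname, $taskid.
master = znode_to_dns_name("/zns/prod/resource_manager/elected_master")
task = znode_to_dns_name("/zns/prod/jobs/alice/wordcount/task_0001")
```

With those inputs, master is
"elected_master.resource_manager.prod.zns.example" and task is
"task_0001.wordcount.alice.jobs.prod.zns.example", matching the name
shapes given in the quoted message.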
