hadoop-common-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7359) Pluggable interface for cluster membership
Date Thu, 09 Jun 2011 17:58:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046708#comment-13046708 ]

Aaron T. Myers commented on HADOOP-7359:

bq. Would anyone object to allowing the HostsReader to trigger refreshNodes? That would let
Hadoop scan for or be notified of cluster membership changes and automagically do the Right

In the abstract I think this is a fine change to make.

bq. Introduce a "Refreshable" interface that both FSNamesystem and JobTracker implement, that
only defines a refreshNodes method. HostsReader would have an initialize method that takes
a Refreshable and users could choose to call refreshNodes.

I think the name "Refreshable" isn't the best. Seems a little too generic to me. How about
something like "NodeListRefreshable" ?

Also, the NN and the JT already implement the interfaces {{o.a.h.hdfs.protocol.ClientProtocol}}
and {{o.a.h.mapred.AdminOperationsProtocol}}, respectively. Both interfaces require a
{{refreshNodes()}} method, and the two methods happen to have the same signature. You could just
make these interfaces extend your new interface, and then you'd get the genericity you need
without having to touch the NN or JT classes at all.
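
A rough sketch of that idea, for illustration only. The {{*Sketch}} types are hypothetical stand-ins for the real protocol interfaces and daemons, {{NotifyingHostsReader}} and its methods are made up for this example, and the {{throws IOException}} clause is an assumption about the shared signature:

```java
import java.io.IOException;

// Hypothetical interface, named as suggested in this comment.
interface NodeListRefreshable {
    void refreshNodes() throws IOException;
}

// Stand-ins for o.a.h.hdfs.protocol.ClientProtocol and
// o.a.h.mapred.AdminOperationsProtocol. The real interfaces declare many
// more methods; here they only inherit refreshNodes() from the new interface.
interface ClientProtocolSketch extends NodeListRefreshable { }
interface AdminOperationsProtocolSketch extends NodeListRefreshable { }

// Stand-in for the NN: it already implements the protocol, so it becomes a
// NodeListRefreshable without any change to the class itself.
class NameNodeSketch implements ClientProtocolSketch {
    int refreshCount = 0;
    @Override
    public void refreshNodes() throws IOException {
        refreshCount++; // the real daemon would re-read its include/exclude lists
    }
}

// Stand-in for the JT, same reasoning.
class JobTrackerSketch implements AdminOperationsProtocolSketch {
    int refreshCount = 0;
    @Override
    public void refreshNodes() throws IOException {
        refreshCount++;
    }
}

// Hypothetical reader that only depends on the narrow interface.
class NotifyingHostsReader {
    private NodeListRefreshable target;

    void initialize(NodeListRefreshable target) {
        this.target = target;
    }

    // Called by the reader when it learns that membership has changed.
    void membershipChanged() throws IOException {
        if (target != null) {
            target.refreshNodes();
        }
    }
}
```

With this shape, the reader can be initialized with either daemon and never needs to know which one it is talking to.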

bq. The current file-based cluster membership would continue to work exactly as it does today.

That seems wise to me. This proposed change would also make it easy to potentially make the
{{HostsFileReader}} do something like periodically check the mtime of the hosts files and
re-read them automatically if they've changed and call {{refreshNodes()}} on the relevant
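
The mtime check could look roughly like the sketch below. It is an assumption-laden illustration, not the actual {{HostsFileReader}} code: {{RefreshCallback}} and {{HostsFileWatcherSketch}} are hypothetical names, and the callback stands in for whatever object ends up exposing {{refreshNodes()}}.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical callback; stands in for the object exposing refreshNodes().
interface RefreshCallback {
    void refreshNodes() throws IOException;
}

// Hypothetical watcher that remembers the last seen mtime of each hosts file.
class HostsFileWatcherSketch {
    private final File[] files;
    private final long[] lastSeen;
    private final RefreshCallback callback;

    HostsFileWatcherSketch(RefreshCallback callback, File... files) {
        this.callback = callback;
        this.files = files.clone();
        this.lastSeen = new long[files.length];
        for (int i = 0; i < files.length; i++) {
            // lastModified() returns 0L when the file does not exist.
            lastSeen[i] = files[i].lastModified();
        }
    }

    // Compares current mtimes with the remembered ones and triggers a single
    // refresh if any file changed. Returns true if a refresh was triggered.
    synchronized boolean checkOnce() throws IOException {
        boolean changed = false;
        for (int i = 0; i < files.length; i++) {
            long mtime = files[i].lastModified();
            if (mtime != lastSeen[i]) {
                lastSeen[i] = mtime;
                changed = true;
            }
        }
        if (changed) {
            callback.refreshNodes();
        }
        return changed;
    }

    // Runs checkOnce() periodically on a background thread.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (IOException e) {
                // Swallowed so the next scheduled check still runs; a real
                // implementation would log this.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Comparing mtimes only detects that a file was touched, not that its contents differ, so an occasional redundant refresh is possible with this approach.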

> Pluggable interface for cluster membership
> ------------------------------------------
>                 Key: HADOOP-7359
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7359
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Travis Crawford
>         Attachments: HADOOP-7359.diff
> Currently Hadoop uses local files to determine cluster membership. With HDFS for example,
dfs.hosts and dfs.hosts.exclude are used.
> To enable tighter integrations, cluster membership should be an interface, with the current
file-based functionality provided as the default implementation. The common case would be
no functional change; however, sites could plug in an alternative implementation, such as
pulling the machine lists from a machine database.
> Two machine lists, includes and excludes, are used to define cluster membership and state.
HostsFileReader currently handles reading these lists from files, whose names are passed in
by FSNamesystem for HDFS and JobTracker for MR.
> The proposed change is adding a HostsReader interface to common, and changing HostsFileReader
to an abstract class that functions the same as today.
> Two new classes, DFSHostsFileReader and MRHostsFileReader, extend HostsFileReader and
simply pass the appropriate file names in. These new classes are needed because config key
names live outside common.
> Two new conf keys, defaulting to the file-based readers, would be added to choose a different
hosts reader: dfs.namenode.hosts.reader.class and mapreduce.jobtracker.hosts.reader.class
> Comments/suggestions? I have most of this written already but would love some feedback
on the general idea before posting the diff.
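
For illustration, selecting the reader from a conf key might look roughly like the sketch below. This is not taken from the attached diff: the methods on {{HostsReader}}, the two {{*Sketch}} implementations, and {{HostsReaderFactory}} are hypothetical, and {{java.util.Properties}} stands in for Hadoop's {{Configuration}} to keep the example self-contained.

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;

// Interface name from the proposal; the two methods are assumptions.
interface HostsReader {
    Set<String> getHosts();
    Set<String> getExcludedHosts();
}

// Stand-in for the default file-based reader (file parsing omitted).
class FileHostsReaderSketch implements HostsReader {
    @Override public Set<String> getHosts() { return Collections.emptySet(); }
    @Override public Set<String> getExcludedHosts() { return Collections.emptySet(); }
}

// Stand-in for a site-specific reader backed by a machine database.
class MachineDbHostsReaderSketch implements HostsReader {
    @Override public Set<String> getHosts() { return Collections.singleton("node1.example.com"); }
    @Override public Set<String> getExcludedHosts() { return Collections.emptySet(); }
}

class HostsReaderFactory {
    // Key name taken from the proposal above.
    static final String DFS_KEY = "dfs.namenode.hosts.reader.class";

    // Instantiates the class named under the key, or the default if unset.
    static HostsReader create(Properties conf, String key,
                              Class<? extends HostsReader> defaultClass)
            throws ReflectiveOperationException {
        String className = conf.getProperty(key);
        Class<? extends HostsReader> cls = (className == null)
                ? defaultClass
                : Class.forName(className).asSubclass(HostsReader.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

With the key unset, the factory falls back to the file-based stand-in, which matches the "no functional change in the common case" goal; setting the key to another class name swaps the implementation.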

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
