hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-785) Divide the server and client configurations
Date Wed, 15 Aug 2007 20:43:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520088 ]

Sameer Paranjpye commented on HADOOP-785:

> It does make sense to have overridable values on the server too, e.g., to determine the default
> block size for client programs that don't override it. Under Arun's proposal this would be
> in hadoop-initial.xml on the servers. Where would it be in your proposal? As items in hadoop-server.xml
> that are not named in hadoop.client.override? Is this really less confusing?

The default block size for client programs would be in _hadoop-client.xml_; settings in this
file would override those in _hadoop-default.xml_.

> Another issue with your proposal is that it requires different Configuration construction
> code on clients and servers. Do we always know, everywhere that a Configuration is created,
> whether we are running as a client or a server?

I proposed the client-server nomenclature because I feel it makes the system more comprehensible.
Admittedly, the distinction between clients and servers isn't always clear, but the proposed
filenames are intended to map elements of configuration to system components and to the people
who configure them. The file _hadoop-client.xml_ is supplied by "users" (people who run
map/reduce jobs) and is read by "clients", i.e., jobs, tasks, and the shell. The file _hadoop-server.xml_
is supplied by "admins" (people who keep Hadoop clusters up and running) and is read by servers.

Depending on the context, either _hadoop-client.xml_ or _hadoop-server.xml_ would be the "final
resource" read by a Configuration object. There is no technical reason for these files to
be named differently; indeed, currently they are not: _hadoop-site.xml_ is the final resource
read by both clients and servers. We could even have three files, _hadoop-client.xml_, _hadoop-mapred.xml_,
and _hadoop-dfs.xml_, read by clients, map/reduce servers, and HDFS servers respectively. This
would require some differences in Configuration construction code, but these don't appear
to be too convoluted. Clients and servers could set the name of the final resource upon start-up,
and all Configuration objects subsequently constructed in that process would use it.
The final resource could also be overridden by values supplied on the command line.
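
A minimal sketch of the start-up scheme described above, assuming each process declares its role once and that later sources override earlier ones: defaults first, then the role's final resource, then command-line values. `ProcessConf`, its `Role` enum, `init`, and `build` are hypothetical names, not part of Hadoop; in-memory maps stand in for the XML files, and the key `example.key` and its values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a process records its role once at start-up, and every
// configuration built afterwards in that process reads the same final resource.
class ProcessConf {
    enum Role { CLIENT, MAPRED_SERVER, DFS_SERVER }

    private static String finalResource = "hadoop-client.xml";
    private static Map<String, String> commandLine = new HashMap<>();

    // Called once by the client's or server's main() before any configuration is built.
    static void init(Role role, Map<String, String> commandLineValues) {
        switch (role) {
            case MAPRED_SERVER: finalResource = "hadoop-mapred.xml"; break;
            case DFS_SERVER:    finalResource = "hadoop-dfs.xml";    break;
            default:            finalResource = "hadoop-client.xml"; break;
        }
        commandLine = new HashMap<>(commandLineValues);
    }

    static String finalResource() {
        return finalResource;
    }

    // Builds a configuration from file contents keyed by file name. Later sources
    // override earlier ones: defaults, then the final resource, then the command line.
    static Map<String, String> build(Map<String, Map<String, String>> files) {
        Map<String, String> props = new HashMap<>();
        props.putAll(files.getOrDefault("hadoop-default.xml", Map.of()));
        props.putAll(files.getOrDefault(finalResource, Map.of()));
        props.putAll(commandLine);
        return props;
    }
}

class ProcessConfDemo {
    public static void main(String[] args) {
        // Illustrative file contents; the key and values are made up for the example.
        Map<String, Map<String, String>> files = Map.of(
                "hadoop-default.xml", Map.of("example.key", "default"),
                "hadoop-client.xml", Map.of("example.key", "client"),
                "hadoop-mapred.xml", Map.of("example.key", "mapred"));

        ProcessConf.init(ProcessConf.Role.CLIENT, Map.of());
        System.out.println(ProcessConf.build(files).get("example.key")); // prints client

        ProcessConf.init(ProcessConf.Role.MAPRED_SERVER, Map.of("example.key", "cmdline"));
        System.out.println(ProcessConf.build(files).get("example.key")); // prints cmdline
    }
}
```

Because the role is recorded once in static state, code that constructs a configuration deep inside a process does not need to know whether it is running as a client or a server.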


> Divide the server and client configurations
> -------------------------------------------
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
> The configuration system is easy to misconfigure, and I think we need to strongly divide
> the server from client configs.
> An example of the problem was a configuration where the task tracker had a hadoop-site.xml
> that set mapred.reduce.tasks to 1. As a result, the job tracker had the right number of reduces,
> but the map task thought there was a single reduce. This led to a failure that was hard to diagnose.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note in particular that nothing corresponds to hadoop-site.xml, which overrides both
> client and server configs. Furthermore, the properties from the *-default.xml files should
> never be saved into job.xml.
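
The proposed hierarchy can be modeled with toy classes, assuming each `// reads X, $super` comment means the class loads its superclass's files first and then X, so X takes precedence, and that site-default.xml overrides hadoop-default.xml. These are illustrative stand-ins that reuse the proposal's class names, not Hadoop's actual `Configuration` or `JobConf`; in-memory maps replace the XML files, the property values are illustrative, and `jobXmlProperties` is a hypothetical method showing which properties would be written to job.xml.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed hierarchy (not Hadoop's real classes). Each class loads
// its superclass's files first and then its own, so its own file takes precedence.
class Configuration {
    // Stand-in for the XML files available to this process, keyed by file name.
    private final Map<String, Map<String, String>> files;
    private final Map<String, String> props = new HashMap<>();
    // Values that came from non-default files; only these would be written to job.xml.
    private final Map<String, String> explicit = new HashMap<>();

    Configuration(Map<String, Map<String, String>> files) {
        this.files = files;
        load("hadoop-default.xml", true);
        load("site-default.xml", true);   // assumed to override hadoop-default.xml
    }

    // Applies one file; later calls override earlier ones. Missing files are skipped.
    protected final void load(String name, boolean isDefault) {
        Map<String, String> resource = files.getOrDefault(name, Map.of());
        props.putAll(resource);
        if (!isDefault) {
            explicit.putAll(resource);
        }
    }

    String get(String key) {
        return props.get(key);
    }

    // Hypothetical: the properties that would be saved into job.xml (no *-default.xml values).
    Map<String, String> jobXmlProperties() {
        return new HashMap<>(explicit);
    }
}

class ServerConf extends Configuration {
    ServerConf(Map<String, Map<String, String>> files) {
        super(files);
        load("hadoop-server.xml", false);
    }
}

class DfsServerConf extends ServerConf {
    DfsServerConf(Map<String, Map<String, String>> files) {
        super(files);
        load("dfs-server.xml", false);
    }
}

class MapRedServerConf extends ServerConf {
    MapRedServerConf(Map<String, Map<String, String>> files) {
        super(files);
        load("mapred-server.xml", false);
    }
}

class ClientConf extends Configuration {
    ClientConf(Map<String, Map<String, String>> files) {
        super(files);
        load("hadoop-client.xml", false);
    }
}

class JobConf extends ClientConf {
    JobConf(Map<String, Map<String, String>> files) {
        super(files);
        load("job.xml", false);
    }
}

class HierarchyDemo {
    public static void main(String[] args) {
        // Illustrative files on a task tracker node, echoing the failure in the issue.
        Map<String, Map<String, String>> files = Map.of(
                "hadoop-default.xml", Map.of("mapred.reduce.tasks", "1", "io.sort.mb", "100"),
                "mapred-server.xml", Map.of("mapred.reduce.tasks", "1"),
                "job.xml", Map.of("mapred.reduce.tasks", "10"));

        JobConf task = new JobConf(files);
        System.out.println(task.get("mapred.reduce.tasks"));                        // prints 10
        System.out.println(new MapRedServerConf(files).get("mapred.reduce.tasks")); // prints 1
        System.out.println(task.jobXmlProperties());        // prints {mapred.reduce.tasks=10}
    }
}
```

In this model a task's `JobConf` never reads mapred-server.xml, so a server-side value cannot replace the job's mapred.reduce.tasks the way the task tracker's hadoop-site.xml did in the reported failure, and values from the *-default.xml files are kept out of job.xml.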
