hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-785) Divide the server and client configurations
Date Tue, 14 Aug 2007 09:57:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519638 ]

Arun C Murthy commented on HADOOP-785:

Ok, how is this for an about turn...

I had a long, soul-crushing discussion with Doug last night about the config rejig, in which
he basically blew my above proposal to smithereens while I lamely nodded. *smile*

Here is the crux of Doug's argument:

Essentially we need 3 config files:
a) Read-only defaults (existing hadoop-default.xml).
b) A file where the admin specifies config values which *can* be overridden (existing mapred-default.xml).
c) A file where the admin specifies a set of hard, sane limits for some config values which
*cannot* be overridden (existing hadoop-site.xml).
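
To make the intended precedence concrete, here is a minimal sketch of the three layers plus a per-job override. It is an illustration only: the class {{LayeredConfig}} and its method names are hypothetical and are not the actual Configuration/JobConf API, and the property values are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the three-file layering; NOT Hadoop's Configuration class.
class LayeredConfig {
    private final Map<String, String> defaults = new HashMap<>();     // (a) read-only defaults
    private final Map<String, String> siteDefaults = new HashMap<>(); // (b) admin values, overridable
    private final Map<String, String> siteLimits = new HashMap<>();   // (c) admin values, not overridable
    private final Map<String, String> jobSettings = new HashMap<>();  // values a user sets for a job

    void setDefault(String key, String value) { defaults.put(key, value); }
    void setSiteDefault(String key, String value) { siteDefaults.put(key, value); }
    void setSiteLimit(String key, String value) { siteLimits.put(key, value); }

    // A user override is refused when the admin has pinned the key in layer (c).
    boolean setForJob(String key, String value) {
        if (siteLimits.containsKey(key)) {
            return false;
        }
        jobSettings.put(key, value);
        return true;
    }

    // Lookup order: (c) limits, then the job's own value, then (b), then (a).
    String get(String key) {
        if (siteLimits.containsKey(key)) return siteLimits.get(key);
        if (jobSettings.containsKey(key)) return jobSettings.get(key);
        if (siteDefaults.containsKey(key)) return siteDefaults.get(key);
        return defaults.get(key);
    }

    public static void main(String[] args) {
        LayeredConfig conf = new LayeredConfig();
        conf.setDefault("mapred.speculative.execution", "true");
        conf.setSiteDefault("mapred.speculative.execution", "false");

        // The value sits in the overridable layer (b), so the job's setting is honoured.
        conf.setForJob("mapred.speculative.execution", "true");
        System.out.println(conf.get("mapred.speculative.execution"));

        // Once the admin puts the same key in layer (c), the job's setting is ignored.
        conf.setSiteLimit("mapred.speculative.execution", "false");
        System.out.println(conf.get("mapred.speculative.execution"));
    }
}
```

The second half of {{main}} shows what happens when a key lands in layer (c): the job's value no longer has any effect.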

Clearly we have issues when users/admins specify configs in the wrong place, e.g. setting
{{mapred.speculative.execution}} in hadoop-site.xml and thereby robbing users of the opportunity
to override it. But those are just that: mistakes made while configuring Hadoop.

That being said, clearly we have a documentation issue and, worse, a naming issue. It is hardly
apparent to users that *mapred-default.xml* is a generic, overridable config file, and it is
clearly not their fault that it is hardly used.

Overall there isn't any *missing functionality*, rather a lack of clarity and understanding;
it is primarily a nomenclature/documentation issue.

Hence, here is a much simpler way to go about this:
a) Keep hadoop-default.xml as the read-only default config file.
b) Rename hadoop-site.xml and mapred-default.xml to better reflect what they are: non-overridable
and overridable site-specific configs, respectively. Some options are:
  i) hadoop-initial.xml (overridable) and hadoop-final.xml (non-overridable)
  ii) hadoop-site-defaults.xml (overridable) and hadoop-site-limits.xml (non-overridable)

I strongly feel we *do* need to rename existing config files just to get the message across...

Clearly the existing Configuration and JobConf classes handle these quite well, and hence there
is hardly any reason to change them. OTOH we *really* need to shout from the rooftops about
the various config files and their roles and uses (i.e. better documentation).

Overall, the less change the better. So much for my earlier proposal... sigh! *smile*
(I know Owen has some *thoughts* on the same... watch this space!)



> Divide the server and client configurations
> -------------------------------------------
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
> The configuration system is easy to misconfigure and I think we need to strongly divide
> the server from client configs.
> An example of the problem was a configuration where the task tracker has a hadoop-site.xml
> that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces,
> but the map task thought there was a single reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note, in particular, that nothing corresponds to hadoop-site.xml, which overrides both
> client and server configs. Furthermore, the properties from the *-default.xml files should
> never be saved into the job.xml.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
