hadoop-common-dev mailing list archives

From Frédéric Bertin (JIRA) <j...@apache.org>
Subject [jira] Commented: (HADOOP-127) Unclear precedence of config files and property definitions
Date Fri, 01 Sep 2006 14:44:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-127?page=comments#action_12432153 ] 
Frédéric Bertin commented on HADOOP-127:

<quote>Folks should only define things in the -site files if they want to force them
for all code. </quote>

I should have read this earlier; it would have saved me some time.

Actually, the fact that properties defined in hadoop-final.xml override EVERYTHING, including
properties defined in job config files, is something very important that should be well documented,
because it is not the intuitively expected behaviour, which, to me, was:
 - hadoop-default.xml and mapred-default.xml, overridden by
 - hadoop-final.xml, overridden by
 - job config files

I've searched the wiki (afterwards, unfortunately) and it's very well documented there. However,
the comments included in hadoop-default.xml and the other delivered config files are not clear
about this. Maybe they should be more detailed, or simply link to the wiki page.
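
To make the difference concrete, here is a minimal Python sketch of the two orderings. It is
only a model, not Hadoop code: plain dicts stand in for the parsed XML files, the merge()
helper is hypothetical, and the property values are made up for illustration.

```python
# Illustrative model only: plain dicts stand in for parsed config files.
defaults = {"mapred.reduce.tasks": "1"}     # hadoop-default.xml / mapred-default.xml
final_props = {"mapred.reduce.tasks": "4"}  # the final (site-wide) file
job = {"mapred.reduce.tasks": "16"}         # per-job config file


def merge(*layers):
    """Merge property dicts in load order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


expected = merge(defaults, final_props, job)  # the order I had assumed
actual = merge(defaults, job, final_props)    # the order described in the issue

print(expected["mapred.reduce.tasks"])  # 16
print(actual["mapred.reduce.tasks"])    # 4
```

In the second ordering the per-job value is silently replaced by the final one, which is
exactly the surprise described above.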

> Unclear precedence of config files and property definitions
> -----------------------------------------------------------
>                 Key: HADOOP-127
>                 URL: http://issues.apache.org/jira/browse/HADOOP-127
>             Project: Hadoop
>          Issue Type: Bug
>          Components: conf
>         Environment: Hadoop 0.1.1, Nutch 0.8-dev
>            Reporter: Andrzej Bialecki 
> The order in which configuration resources are read is not sufficiently documented, and
> there are no mechanisms preventing harmful re-definition of certain properties if they
> are put in the wrong config files.
> From reading the code in Hadoop Configuration.java, JobConf.java and Nutch NutchConfiguration.java
> I _think_ this is what's happening.
> There are two groups of resources: default resources, loaded first, and final resources,
> loaded at the end. All properties (re)-defined in files loaded later will override any previous
> definitions.
> * default resources: loaded in the order in which they are added. The following files are added
> here, in order:
>     1. hadoop-default.xml (Configuration)
>     2. nutch-default.xml  (NutchConfiguration)
>     3. mapred-default.xml (JobConf)
>     4. job_xx_xxx.xml       (JobConf, in JobConf(File config))
> * final resources: these always come after default resources, i.e. if any value is defined
> here it will always override those set in default resources (NOTE: including per job settings!!!).
> The following files are added here, in reversed order:
>     2. hadoop-site.xml (Configuration)
>     1. nutch-site.xml    (NutchConfiguration)
> (i.e. hadoop-site.xml will take precedence over anything else defined in any other config
> file)
> I would appreciate checking that this is indeed the case, and suggestions on how to ensure
> that you cannot so easily shoot yourself in the foot if you define the wrong properties in hadoop-site
> or nutch-site ...
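
The two-group mechanism described in the issue can be modelled with a short Python sketch.
This is a toy model under stated assumptions, not the Hadoop implementation: the ConfigModel
class and its method names are hypothetical, dicts stand in for the XML files, and the
io.sort.mb values are made up. It shows that a job file added after a site file still loses,
because every final resource is applied after every default resource.

```python
class ConfigModel:
    """Toy model of the two resource groups described in the issue (hypothetical class)."""

    def __init__(self):
        self.default_resources = []  # applied first, in the order they were added
        self.final_resources = []    # always applied after every default resource

    def add_default_resource(self, name, props):
        self.default_resources.append((name, props))

    def add_final_resource(self, name, props):
        self.final_resources.append((name, props))

    def get(self, key):
        value = None
        # Later resources override earlier ones; final resources always come last.
        for _name, props in self.default_resources + self.final_resources:
            if key in props:
                value = props[key]
        return value


conf = ConfigModel()
conf.add_default_resource("hadoop-default.xml", {"io.sort.mb": "100"})
conf.add_final_resource("hadoop-site.xml", {"io.sort.mb": "200"})
# The job file is added last, but it is still only a default resource.
conf.add_default_resource("job_xx_xxx.xml", {"io.sort.mb": "400"})

print(conf.get("io.sort.mb"))  # 200: the hadoop-site.xml value beats the job setting
```

Under this model, the only way for a job to control a property is to keep that property out
of the final (site) files altogether.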


