hadoop-common-dev mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2866) JobConf should validate key names in well-defined namespaces and warn on misspelling
Date Thu, 21 Feb 2008 08:30:44 GMT

[ https://issues.apache.org/jira/browse/HADOOP-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570982#action_12570982 ]

Aaron Kimball commented on HADOOP-2866:


You're definitely correct.

But in general, there are several problems with the JobConf system from a software engineering
point of view:

1) Naming conventions don't exist. foo.bar.camelBaz, foo.bar.noncamelbaz, and foo.bar.dots.between.each.word
are all used.
2) The hierarchy imposed by the keys in the JobConfs has nothing to do with which modules
actually use them. Two isolated modules can both depend on the same key for arbitrarily different
functionality, tying them together -- and no system exists to prevent this.
3) The hierarchy is arbitrarily ignored: why does "map.input.file" exist, when there is already
an established "mapred.map" hierarchy? What is the difference between "hadoop.job.*" and
"job.*"? Shouldn't everything in the entire system technically be hadoop.*?
4) Most config options are hardcoded throughout the source as raw strings; they are not placed
in public static final Strings at the head of the dependent class, nor are they "registered"
in any way with JobConf.
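
To illustrate point 4, here is a minimal sketch of what declaring keys as constants and "registering" them could look like. The class name MapTaskConfigKeys and the registeredKeys() method are hypothetical and not part of Hadoop; only the two key strings are taken from this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical holder class; the name is illustrative and not part of Hadoop. */
final class MapTaskConfigKeys {
    // Key strings quoted in this thread, declared once instead of being
    // repeated as raw string literals throughout the source.
    static final String MAP_INPUT_FILE = "map.input.file";
    static final String MAP_OUTPUT_COMPRESSION_TYPE = "mapred.map.output.compression.type";

    // A single list of every key this (assumed) module reads, so that a
    // validator or a documentation tool could consume it.
    private static final Set<String> REGISTERED = Collections.unmodifiableSet(
        new LinkedHashSet<>(Arrays.asList(MAP_INPUT_FILE, MAP_OUTPUT_COMPRESSION_TYPE)));

    private MapTaskConfigKeys() {
        // Constants holder; not meant to be instantiated.
    }

    /** Returns the read-only set of keys declared by this class. */
    static Set<String> registeredKeys() {
        return REGISTERED;
    }
}
```

With something along these lines, a misspelled constant name fails at compile time, whereas a misspelled raw string fails silently at run time.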

I think that a major refactoring of JobConf & friends is probably necessary to address
all these issues.  Furthermore, coding standards need to address formatting and hierarchy
of config strings and approach this from the human side. 

So for starters we can:
1) Add this mechanism, which at the very least will catch typos in user configurations
2) Encourage people who commit user patches to develop and enforce guidelines for naming conventions
3) Encourage people who commit user patches to require that patches update the JobConfValidator
if they deprecate key names.
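
As a rough sketch of the mechanism in step 1, the check could look something like the following. This is not the JobConfValidator from any patch: the class ConfigKeyValidatorSketch, its methods, and the warning text are assumptions made for illustration, and the reserved prefix list covers only the namespaces named in the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a key validator; not actual Hadoop code. */
final class ConfigKeyValidatorSketch {
    // Namespaces assumed to be reserved for Hadoop itself (incomplete list).
    private final List<String> reservedPrefixes = Arrays.asList("mapred.", "fs.", "dfs.");
    private final Set<String> knownKeys = new HashSet<>();
    private final Map<String, String> deprecated = new HashMap<>();

    /** Declares a key that the framework recognizes. */
    void register(String key) {
        knownKeys.add(key);
    }

    /** Records that oldKey has been replaced by newKey. */
    void deprecate(String oldKey, String newKey) {
        deprecated.put(oldKey, newKey);
        knownKeys.add(newKey);
    }

    /** Returns one warning per deprecated or unrecognized reserved key. */
    List<String> validate(Iterable<String> keys) {
        List<String> warnings = new ArrayList<>();
        for (String key : keys) {
            String replacement = deprecated.get(key);
            if (replacement != null) {
                warnings.add("Deprecated key '" + key + "'; use '" + replacement + "'");
            } else if (isReserved(key) && !knownKeys.contains(key)) {
                warnings.add("Unrecognized key '" + key + "' in a reserved namespace");
            }
            // Keys in user-defined namespaces are never reported.
        }
        return warnings;
    }

    private boolean isReserved(String key) {
        for (String prefix : reservedPrefixes) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under step 3, a patch that deprecates a key would add one deprecate(...) call alongside the change, so the old name starts producing a warning in the same release.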

And longer-term, I may file another JIRA to address the rest of this.

> JobConf should validate key names in well-defined namespaces and warn on misspelling
> ------------------------------------------------------------------------------------
>                 Key: HADOOP-2866
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2866
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Aaron Kimball
>            Priority: Minor
>             Fix For: 0.16.1, 0.17.0
>   Original Estimate: 72h
>  Remaining Estimate: 72h
> A discussion on the mailing list reveals that some configuration strings in the JobConf
> are deprecated over time and new configuration names replace them:
> e.g., "mapred.output.compression.type" is now replaced with "mapred.map.output.compression.type"
> Programmers who have been manually specifying the former string, however, receive no
> diagnostic output during testing to suggest that their compression type is being silently
> ignored.
> It would be desirable to notify developers of this change by printing a warning message
> when deprecated configuration names are used in a newer version of Hadoop. More generally,
> when any configuration string in the mapred.*, fs.*, dfs.*, etc. namespaces is provided
> by a user and is not recognized by Hadoop, it is desirable to print a warning, to indicate
> malformed configurations. No warnings should be printed when configuration keys are in user-defined
> namespaces (e.g., "myprogram.mytask.myvalue").

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
