hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3722) Provide a unified way to pass jobconf options from bin/hadoop
Date Tue, 21 Oct 2008 22:54:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3722:
------------------------------------

    Release Note: Changed streaming StreamJob and pipes Submitter to implement Tool and Configurable,
and to use GenericOptionsParser arguments -fs, -jt, -conf, -D, -libjars, -files, and -archives.
Deprecated -jobconf, -cacheArchive, -dfs, and -additionalconfspec from streaming
and pipes in favor of the generic options. Removed from streaming -config, -mapred.job.tracker,
and -cluster.  (was: This issue 
1. Changed StreamJob (of streaming) and Submitter (of pipes) to implement Tool and Configurable.
Streaming and Submitter now accept GenericOptionsParser arguments:
  -fs, -jt, -conf, -D, -libjars, -files, -archives

2. Deprecated -jobconf, -cacheArchive, -dfs, and -additionalconfspec from streaming
and pipes (where applicable) in favor of the generic options. The options still work but issue
a warning; they may be removed in a later release.

3. Removed from streaming:
 -config: it is not documented anywhere.
 -mapred.job.tracker: it sets the wrong property, so it is not currently used.
 -cluster: setting -cluster gives an "Unexpected -cluster while processing" error, so
it is not currently used.
)
    Hadoop Flags: [Incompatible change, Reviewed]  (was: [Reviewed, Incompatible change])
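
To illustrate the deprecation described in the release note, here is a minimal, self-contained sketch of how a legacy -jobconf argument could be rewritten into the generic -D form while printing a warning. It is not the code from the attached patches: the class name JobconfMigration, the method toGenericOptions, and the warning text are hypothetical, and the property names in main are only sample values. It assumes -jobconf values are either a single key=value pair (streaming style) or a comma-separated list of pairs (pipes style), as described in the issue text.

```java
import java.util.ArrayList;
import java.util.List;

public class JobconfMigration {

    /**
     * Rewrites each "-jobconf <pairs>" argument as one or more "-D key=value"
     * arguments. All other arguments pass through unchanged.
     * Hypothetical helper, not part of Hadoop.
     */
    public static List<String> toGenericOptions(String[] args) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-jobconf".equals(args[i]) && i + 1 < args.length) {
                // The deprecated option keeps working but emits a warning.
                System.err.println("WARNING: -jobconf is deprecated, use -D instead");
                // Pipes-style values may carry several comma-separated pairs.
                for (String pair : args[++i].split(",")) {
                    out.add("-D");
                    out.add(pair);
                }
            } else {
                out.add(args[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] legacy = {
            "-jobconf", "mapred.reduce.tasks=2,mapred.job.priority=HIGH",
            "-input", "in"
        };
        System.out.println(toGenericOptions(legacy));
    }
}
```

Splitting on commas is a simplification: a value that itself contains a comma would be split incorrectly, so a real migration would need to treat the streaming and pipes forms separately.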


> Provide a unified way to pass jobconf options from bin/hadoop
> -------------------------------------------------------------
>
>                 Key: HADOOP-3722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3722
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Enis Soztutar
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3722.patch, jobconfoptions_v1.patch, jobconfoptions_v2.patch
>
>
> Often when running a job it is useful to override some jobconf parameters from jobconf.xml
for that particular job - for example, setting the job priority, setting the number of reduce
tasks, setting the HDFS replication level, etc. Currently the Hadoop examples, streaming,
pipes, etc. take these extra jobconf parameters in different ways: the examples in hadoop-examples.jar
use -Dkey=value, streaming uses -jobconf key=value, and pipes uses -jobconf key1=value1,key2=value2,etc.
Things would be simpler if bin/hadoop could take the jobconf parameters itself, so that you
could run for example bin/hadoop -Dkey=value jar [whatever] as well as bin/hadoop -Dkey=value
pipes [whatever]. This is especially useful when an organization needs to require users to
use a particular property, e.g. the name of a queue to use for scheduling in HADOOP-3445.
Otherwise, users may confuse one way of passing parameters with another and may not notice
that they forgot to include certain properties.
> I propose adding support in bin/hadoop for jobconf options to be specified with -C key=value.
This would have the effect of setting hadoop.jobconf.key=value in Java's system properties.
The Configuration class would then be modified to read any system properties that begin with
hadoop.jobconf and override the values in hadoop-site.xml.
> I can write a patch for this pretty quickly if the design is sound. If there's a better
way of specifying jobconf parameters uniformly across Hadoop commands, let me know.
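
The override mechanism proposed in the description (system properties beginning with hadoop.jobconf taking precedence over site values) can be sketched without any Hadoop dependency. This is only an illustration of the idea, not the design that was eventually committed: a plain Map stands in for the Configuration class, and the names JobconfOverrides and applyOverrides are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class JobconfOverrides {

    // Prefix taken from the proposal; the trailing dot separates it from the key.
    static final String PREFIX = "hadoop.jobconf.";

    /**
     * Returns a copy of siteValues in which every property named
     * "hadoop.jobconf.<key>" overrides the entry for <key>.
     * Hypothetical helper; a Map stands in for Hadoop's Configuration.
     */
    public static Map<String, String> applyOverrides(Map<String, String> siteValues,
                                                     Properties props) {
        Map<String, String> merged = new HashMap<>(siteValues);
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                merged.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for values loaded from hadoop-site.xml.
        Map<String, String> site = new HashMap<>();
        site.put("mapred.reduce.tasks", "1");

        // Stand-in for what "-C mapred.reduce.tasks=4" would set under the proposal.
        Properties props = new Properties();
        props.setProperty("hadoop.jobconf.mapred.reduce.tasks", "4");

        System.out.println(applyOverrides(site, props).get("mapred.reduce.tasks"));
    }
}
```

In a real launcher the second argument would be System.getProperties(), so that values passed to the JVM with -Dhadoop.jobconf.key=value are picked up.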

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

