hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1622) Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
Date Mon, 29 Oct 2007 22:26:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538619 ]

Doug Cutting commented on HADOOP-1622:
--------------------------------------

Owen & I talked a bit about this last week. We identified three commonly useful types
of job resources:
- archives already present in the cluster that will be unpacked in the task dir
- archives already present in the cluster that will be placed intact in the task dir
- resources in the local filesystem that will be added to the task's classpath
This issue primarily concerns the last, but we should aim for a reasonably uniform mechanism.
The primary differences between the first and the third are (a) unqualified paths are
resolved relative to different filesystems (the cluster's for archives, the local one for
classpath resources); and (b) whether or not the resource is visible on the task's classpath.
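
To make difference (a) concrete, here is a minimal Java sketch of qualifying a user-supplied path per resource type. JobResources, its Kind enum and the base URI hdfs://namenode:9000/user/alice/ are hypothetical, chosen only for illustration; this is not an existing Hadoop API. Paths that already carry a scheme are used as given; unqualified paths resolve against the cluster filesystem for the two archive types and against the submitter's working directory for classpath resources.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not an existing Hadoop class.
final class JobResources {

    // The three resource types from the discussion above.
    enum Kind {
        UNPACKED_ARCHIVE(true, false),   // already in the cluster, unpacked in the task dir
        INTACT_ARCHIVE(true, false),     // already in the cluster, left intact in the task dir
        CLASSPATH_RESOURCE(false, true); // on the local filesystem, added to the task classpath

        final boolean clusterRelative; // (a) which filesystem unqualified paths resolve against
        final boolean onClasspath;     // (b) whether the task sees it on its classpath

        Kind(boolean clusterRelative, boolean onClasspath) {
            this.clusterRelative = clusterRelative;
            this.onClasspath = onClasspath;
        }
    }

    /**
     * Fully qualifies a user-supplied path. clusterFs is an assumed base URI and
     * must end with '/', otherwise URI.resolve drops its last path segment.
     */
    static URI qualify(Kind kind, String path, URI clusterFs, Path localCwd) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // already qualified: use as given
        }
        if (kind.clusterRelative) {
            return clusterFs.resolve(uri); // resolved against the cluster filesystem
        }
        // Resolved against the local working directory, yielding a file: URI.
        return localCwd.resolve(path).toAbsolutePath().normalize().toUri();
    }
}
```

With that base, archives/dict.zip as an UNPACKED_ARCHIVE becomes hdfs://namenode:9000/user/alice/archives/dict.zip, while lib/a.jar as a CLASSPATH_RESOURCE becomes a file: URI under the local working directory.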

All of these should be available from the command line, with -archive, -file and -jar respectively.
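
A rough Java sketch of separating those three options from the job's own arguments. The class name ResourceOptions and the comma separator are assumptions, not something settled in this thread; commas are used here because the colon-separated form in the issue description would collide with URI schemes such as hdfs://.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for illustration only; not an existing Hadoop class.
final class ResourceOptions {
    enum Kind { ARCHIVE, FILE, JAR } // -archive, -file, -jar respectively

    final Map<Kind, List<String>> resources = new EnumMap<>(Kind.class);
    final List<String> remaining = new ArrayList<>(); // arguments passed on to the job

    static ResourceOptions parse(String[] args) {
        ResourceOptions opts = new ResourceOptions();
        for (Kind k : Kind.values()) {
            opts.resources.put(k, new ArrayList<>());
        }
        for (int i = 0; i < args.length; i++) {
            Kind kind = kindFor(args[i]);
            if (kind == null) {
                opts.remaining.add(args[i]); // not a resource option: pass through
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException(args[i] + " requires a value");
            }
            // Assumed syntax: a comma-separated list follows each option.
            for (String value : args[++i].split(",")) {
                if (!value.isEmpty()) {
                    opts.resources.get(kind).add(value);
                }
            }
        }
        return opts;
    }

    private static Kind kindFor(String arg) {
        switch (arg) {
            case "-archive": return Kind.ARCHIVE;
            case "-file":    return Kind.FILE;
            case "-jar":     return Kind.JAR;
            default:         return null;
        }
    }
}
```

For instance, parsing -jar j1.jar,j2.jar -archive dict.zip MyJob in out collects two jars and one archive and leaves MyJob in out for the job itself.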

Owen, does that capture our discussion?  What would need to change in the current patch to
be consistent with that proposal?  Should we file another issue to improve command-line support
for these, or should this be done as a part of this issue?

> Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>            Assignee: Dennis Kubes
>             Fix For: 0.16.0
>
>         Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, HADOOP-1622-6.patch,
>                      HADOOP-1622-7.patch, HADOOP-1622-8.patch, HADOOP-1622-9.patch, multipleJobJars.patch,
>                      multipleJobResources.patch, multipleJobResources2.patch
>
>
> More likely than not, a user's job depends on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the user to specify that.
> A workaround is to re-package all the dependent jars into a new jar, or to put the dependent
> jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, if the user does not
> own the main function (as when the user uses Aggregate, datajoin, or streaming), the user has to
> re-package those system jar files too.
> It is highly desirable that Hadoop provide a clean and simple way for the user to specify a list
> of dependent jar files at the time of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar
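
For the classpath case, the core step is turning a jar list like the one proposed above into class loader entries. A minimal Java sketch, assuming a hypothetical DependentJars helper (not part of Hadoop) and splitting on File.pathSeparator, which is ':' on Unix as in the example but ';' on Windows:

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an existing Hadoop class.
final class DependentJars {

    /** Splits a path-separator-delimited jar list into absolute file: URLs. */
    static URL[] toUrls(String jarList) {
        List<URL> urls = new ArrayList<>();
        for (String entry : jarList.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // tolerate stray separators such as "a.jar::b.jar"
            }
            try {
                urls.add(new File(entry).getAbsoluteFile().toURI().toURL());
            } catch (MalformedURLException e) {
                throw new UncheckedIOException(e);
            }
        }
        return urls.toArray(new URL[0]);
    }

    /**
     * Class loader over the dependent jars. URLClassLoader uses standard
     * parent-first delegation, so classes visible to the parent win.
     */
    static URLClassLoader loaderFor(String jarList, ClassLoader parent) {
        return new URLClassLoader(toUrls(jarList), parent);
    }
}
```

Because of that parent-first delegation, a class present both on the parent's classpath and in a dependent jar is loaded from the parent; a job that needs its own version of such a class would need a different delegation order.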

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

