hadoop-common-dev mailing list archives

From "Dennis Kubes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1622) Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
Date Tue, 24 Jul 2007 21:25:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515094 ]

Dennis Kubes commented on HADOOP-1622:

I got to thinking, always a dangerous thing, and I thought: if we are extending this to multiple
jar files, why not other resources as well, such as jars on the classpath, jars that contain a
given class, and directories?  Say we could specify one or more directories as resources to be
included in the job jar; then, when we do the merge, we would copy all resources from those
directories into the job jar.  This would allow us to do things like deploy executables, resource
files, or multiple jar files across the cluster for use in jobs.  So if you have a custom
executable you need to call in your MR job, you just drop it in a directory, include the
directory as a job resource, and that executable gets deployed out onto the cluster and is
available for that single job.
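The merge step described above could be sketched roughly as follows. This is not the actual patch; the class and method names (JobJarMerge, mergeDirectoryIntoJar) are illustrative assumptions about how copying a resource directory's files into the job jar might look.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical sketch of merging a resource directory into a job jar;
// names here are assumptions, not Hadoop's actual API.
public class JobJarMerge {

    public static void mergeDirectoryIntoJar(Path resourceDir, Path jarFile)
            throws IOException {
        try (JarOutputStream out =
                 new JarOutputStream(Files.newOutputStream(jarFile));
             DirectoryStream<Path> files =
                 Files.newDirectoryStream(resourceDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                // Each file in the resource directory becomes a
                // top-level entry in the job jar.
                out.putNextEntry(new JarEntry(file.getFileName().toString()));
                Files.copy(file, out);
                out.closeEntry();
            }
        }
    }
}
```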

I went back and refactored the code to allow job resources as opposed to just jar files.
A resource can be an absolute path to a jar file, a jar file on the classpath, a directory,
or the name of a class contained in a jar on the classpath.  As an added bonus, getJars and
addJar now become getJobResources and addJobResource (we may need to come up with a different
name, as this might be too easily confused with default and final resources in configuration),
and we can keep getJar and setJar as they now apply only to the final job jar file.
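Resolving one of those resource specs might look something like the sketch below. The class name ResourceResolver and the resolution logic are assumptions for illustration, covering the cases listed above: a jar path or directory on disk, or a class name whose containing jar is located on the classpath.

```java
import java.io.File;
import java.net.URL;

// Hypothetical sketch of turning a "job resource" spec into a
// filesystem path; not the actual patch's API.
public class ResourceResolver {

    public static File resolve(String resource) {
        File f = new File(resource);
        if (f.isFile() || f.isDirectory()) {
            return f;                  // a jar path or a directory
        }
        // Otherwise treat the spec as a class name and look for the
        // jar that contains it on the classpath.
        String asPath = resource.replace('.', '/') + ".class";
        URL url = Thread.currentThread().getContextClassLoader()
                        .getResource(asPath);
        if (url == null) return null;
        String s = url.toString();
        if (s.startsWith("jar:file:")) {
            // jar:file:/path/to/x.jar!/com/Foo.class -> /path/to/x.jar
            return new File(s.substring("jar:file:".length(),
                                        s.indexOf('!')));
        }
        return null;                   // unhandled URL scheme
    }
}
```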

I am doing final testing of this code right now and will have a patch up shortly.

> Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
> --------------------------------------------------------------------------------------------
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>         Attachments: multipleJobJars.patch
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the user to specify that.
> A workaround is to re-package all the dependent jars into a new jar or put the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, if the user does not own the main function
> (as in the case when the user uses Aggregate, datajoin, or streaming), the user has to re-package those system jar files too.
> It is much desired that Hadoop provide a clean and simple way for the user to specify a list of dependent jar files at the time
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar 
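The proposed invocation quoted above could be parsed along these lines. The flag name --depending_jars comes from the issue text; the parsing code itself is a hypothetical sketch, assuming a colon-separated list as in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of parsing the proposed --depending_jars option.
public class DependingJars {

    public static List<String> parse(String[] args) {
        List<String> jars = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            if ("--depending_jars".equals(args[i])) {
                // Colon-separated list, as in j1.jar:j2.jar
                jars.addAll(Arrays.asList(args[i + 1].split(":")));
            }
        }
        return jars;
    }
}
```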

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
