hadoop-common-dev mailing list archives

From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1622) Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
Date Thu, 20 Mar 2008 23:31:27 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mahadev konar updated HADOOP-1622:
----------------------------------

    Attachment: HADOOP-1622_1.patch

Attaching a patch for this feature. It does not include unit tests yet; I am still writing
them and will upload an updated patch by the end of the day.

This patch enhances the hadoop command line for job submission, so you can say:

- bin/hadoop jar -files <comma-separated files> -libjars <comma-separated libs>
-archives <comma-separated archives>

- these options are all optional and the command line is backwards compatible

- the patch uses Commons CLI for command-line parsing

- it uses DistributedCache to copy the files locally for the tasks

- it supports URIs in the command-line arguments

- if the files have already been uploaded to the HDFS used by the JobTracker, they are not
recopied. There is a small catch here: since the URIs of the remote file system and the one
the JobTracker uses are matched as strings, the files might be copied even though both refer
to the same DFS (ex: hdfs://hostname1:port != hdfs://hostname1.fullyqualifiedname:port)


- the command-line files, archives, and libjars are stored temporarily in the HDFS job
directory, from where they are copied locally.
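To make the new options concrete, here is a sketch of what a submission might look like. All file, jar, and archive names below are hypothetical, and I am assuming the job jar and its arguments follow the options as in the syntax above:

```shell
# Hypothetical invocation of the enhanced command line: ship two side
# files, two dependent jars, and one archive along with the job jar.
bin/hadoop jar \
  -files conf/lookup.txt,hdfs://namenode:9000/shared/dict.txt \
  -libjars lib/dep1.jar,lib/dep2.jar \
  -archives data/bundle.zip \
  myjob.jar
```

Omitting all three options gives the old `bin/hadoop jar myjob.jar` form, which is what keeps the command line backwards compatible.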
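The URI-matching catch above can be shown with plain java.net.URI. The `sameFileSystem` helper below is a hypothetical stand-in for the string comparison the patch performs, not the actual patch code:

```java
import java.net.URI;

public class UriMatchSketch {
    // Hypothetical stand-in for the patch's check: compare the
    // scheme://authority part of two URIs as plain strings.
    static boolean sameFileSystem(URI a, URI b) {
        String fsA = a.getScheme() + "://" + a.getAuthority();
        String fsB = b.getScheme() + "://" + b.getAuthority();
        return fsA.equals(fsB);
    }

    public static void main(String[] args) {
        // Both URIs point at the same namenode, spelled two ways.
        URI shortName = URI.create("hdfs://hostname1:9000/user/me/dep.jar");
        URI fqdn = URI.create("hdfs://hostname1.fullyqualifiedname:9000/user/me/dep.jar");
        // The string match fails, so the file would be re-copied even
        // though it already lives on the DFS the JobTracker uses.
        System.out.println(sameFileSystem(shortName, fqdn)); // prints "false"
    }
}
```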


> Hadoop should provide a way to allow the user to specify jar file(s) the user job depends
on
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Mahadev konar
>             Fix For: 0.17.0
>
>         Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, HADOOP-1622-6.patch,
HADOOP-1622-7.patch, HADOOP-1622-8.patch, HADOOP-1622-9.patch, HADOOP-1622_1.patch, multipleJobJars.patch,
multipleJobResources.patch, multipleJobResources2.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the user to
specify that. 
> A workaround is to re-package all the dependent jars into a new jar, or to put
the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, if the user
does not own the main function
> (as is the case when the user uses Aggregate, datajoin, or streaming), the user has to
re-package those system jar files too.
> It is much desired that hadoop provide a clean and simple way for the user to specify
a list of dependent jar files at the time
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

