hadoop-common-dev mailing list archives

From "Dennis Kubes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1622) Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
Date Sat, 27 Oct 2007 19:22:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538245
] 

Dennis Kubes commented on HADOOP-1622:
--------------------------------------

1. Could you please remove the mention of 'final' and 'default' config resources from the
javadoc for JobConf.{get|set}JobResources? They are no longer relevant vis-a-vis hadoop Configuration.

I have removed the mention of final and default resources.

2. Should we also have a JobConf.setJobResource along with JobConf.addJobResource, a la the
{{DistributedCache}} APIs?

I had debated set vs. add for resources.  Currently, adding a resource appends it to a list
of resources, whereas setting a resource would clear anything previously added and leave only
that resource.  Since jar resources are often added by including the jar file that contains
a given class, I thought it better NOT to allow clearing and resetting of job resources.
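
To illustrate the append-only behavior described above, here is a minimal sketch. The class JobResourceList and its fields are hypothetical stand-ins and are not the actual JobConf code from the patch; only the method names addJobResource and getJobResources echo the names used in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the job-resource list; not the real JobConf.
class JobResourceList {
    private final List<String> resources = new ArrayList<>();

    // Append semantics: every call keeps whatever was added before it.
    public void addJobResource(String resource) {
        resources.add(resource);
    }

    // Deliberately no setJobResource: a setter would clear earlier entries,
    // including jars that were added implicitly via a containing class.
    public List<String> getJobResources() {
        return Collections.unmodifiableList(resources);
    }

    public static void main(String[] args) {
        JobResourceList list = new JobResourceList();
        list.addJobResource("first.jar");
        list.addJobResource("second.jar");
        System.out.println(list.getJobResources());
    }
}
```

With only an add method, a later caller cannot accidentally drop a resource that an earlier caller registered.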

3. Should we move the private JobClient.createJobJar method to JarUtils to make it available
as a useful utility?

I debated this too.  JarUtils was meant for generic jarring and unjarring utilities.  But I
don't see harm in putting createJobJar there, and I think you are right that we may need it
somewhere else in the future.  I have removed it from JobClient and added it to JarUtils.
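
For readers unfamiliar with what such a utility might do, the following is a rough sketch built only on the standard java.util.jar classes. The class name JarUtilsSketch, the method signature, and the choice to place every file under lib/ are assumptions made for illustration; the real createJobJar in the patch may differ in all of these.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical utility; the signature is an assumption, not the patch's API.
final class JarUtilsSketch {
    private JarUtilsSketch() {}

    // Packs each given file into a new jar at jarPath, under lib/<fileName>.
    // Two files with the same name would collide and raise a ZipException.
    static void createJobJar(Path jarPath, List<Path> files) throws IOException {
        try (OutputStream out = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(out)) {
            for (Path file : files) {
                jar.putNextEntry(new JarEntry("lib/" + file.getFileName()));
                Files.copy(file, jar);   // stream the file's bytes into the entry
                jar.closeEntry();
            }
        }
    }
}
```

Keeping the method static and free of JobClient state is what would make it reusable from other callers.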

Unrelated: Does it make sense to rename Configuration.addResource to Configuration.addConfigResource?
I wonder how confusing these unrelated api names are, given JobConf is a Configuration to

Yeah, debated this one too.  In the end we weren't just adding jars but multiple things
such as classes, executables, and files.  Couldn't find a better name for that than resource.
I named it jobResource to be a little less confusing.  Changing Configuration over to
configResource would be good I think, although we should probably deprecate the old method
rather than rename it outright, because a lot of things rely on it.
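
A deprecation along those lines could look roughly like the sketch below. ConfigurationSketch is a hypothetical class, not Hadoop's Configuration, and the String parameter is an assumption chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; this is not Hadoop's Configuration.
class ConfigurationSketch {
    private final List<String> configResources = new ArrayList<>();

    // New, more specific name (the name floated in this thread).
    public void addConfigResource(String name) {
        configResources.add(name);
    }

    /** @deprecated use {@link #addConfigResource(String)} instead. */
    @Deprecated
    public void addResource(String name) {
        // Old callers keep working; they simply delegate to the new method.
        addConfigResource(name);
    }

    public List<String> getConfigResources() {
        return new ArrayList<>(configResources);
    }
}
```

Existing code that calls the old name still compiles, with a deprecation warning, while new code can move to the clearer name.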

I am currently testing patch 9 and will post it shortly.

> Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>            Assignee: Dennis Kubes
>             Fix For: 0.16.0
>
>         Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, HADOOP-1622-6.patch,
>                      HADOOP-1622-7.patch, HADOOP-1622-8.patch, multipleJobJars.patch,
>                      multipleJobResources.patch, multipleJobResources2.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the user to
> specify that.
> A workaround is to re-package all the dependent jars into a new jar or put the dependent
> jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, if the user
> does not own the main function (as is the case when the user uses Aggregate, datajoin,
> or streaming), the user has to re-package those system jar files too.
> It is much desired that hadoop provide a clean and simple way for the user to specify
> a list of dependent jar files at the time of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

