hbase-issues mailing list archives

From "Vasu Mariyala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8475) TableMapReduceUtils#addDependencyJars(Job) shouldn't ship duplicate entries in tmpjars
Date Mon, 06 May 2013 23:22:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650221#comment-13650221 ]

Vasu Mariyala commented on HBASE-8475:

The addDependencyJars function uses a Set<String> to hold the tmpjars entries. Since a Set cannot
contain duplicates, it would probably not ship the same jar multiple times.
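
The difference the comment relies on can be sketched in Java. This is a minimal illustration under stated assumptions, not HBase's actual addDependencyJars code: the class names and jar paths are hypothetical, and a fixed map stands in for the classloader lookup. Two classes packaged in the same jar (the custom output format and partitioner scenario from the quoted issue) resolve to the same path; accumulating into a List keeps both entries, while a Set<String> keeps one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TmpJarsDedupSketch {

    // Hypothetical class-name -> jar-path mapping standing in for a classloader lookup.
    static Map<String, String> sampleResolution() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("com.example.MyOutputFormat", "/apps/my-app.jar"); // hypothetical
        m.put("com.example.MyPartitioner", "/apps/my-app.jar");  // same jar, hypothetical
        m.put("org.apache.hadoop.hbase.client.Put", "/lib/hbase.jar");
        m.put("org.apache.hadoop.mapreduce.Job", "/lib/hadoop.jar");
        return m;
    }

    // Naive accumulation: every resolved class contributes its jar, duplicates included.
    static List<String> collectAsList(Map<String, String> classToJar) {
        return new ArrayList<>(classToJar.values());
    }

    // Set-based accumulation: equal path strings are stored once;
    // LinkedHashSet also preserves first-seen order.
    static Set<String> collectAsSet(Map<String, String> classToJar) {
        return new LinkedHashSet<>(classToJar.values());
    }

    public static void main(String[] args) {
        Map<String, String> resolved = sampleResolution();
        // tmpjars holds a comma-separated list of jar paths.
        System.out.println("list: " + String.join(",", collectAsList(resolved)));
        System.out.println("set:  " + String.join(",", collectAsSet(resolved)));
    }
}
```

A Set<String> only collapses entries whose strings are equal, so the same jar reached through two different path spellings (for example, with and without a file: prefix) would still be kept twice.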
> TableMapReduceUtils#addDependencyJars(Job) shouldn't ship duplicate entries in tmpjars
> --------------------------------------------------------------------------------------
>                 Key: HBASE-8475
>                 URL: https://issues.apache.org/jira/browse/HBASE-8475
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Nick Dimiduk
> *note* this is a hypothesis that needs further investigation.
> Users running the above method in their jobs as a convenience for packaging up dependencies
> will see duplicate entries for jars providing common functionality, e.g. hbase.jar, hadoop.jar,
> and the user application jar. For instance, if they specify a custom output format and a custom
> partitioner, both in the same jar, that jar is resolved via the classloader and added to tmpjars
> twice. This imposes a fair amount of pre-job IO (particularly if your jars are fat with packed
> dependencies) and delays job launch accordingly.

This message is automatically generated by JIRA.
