hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8475) TableMapReduceUtils#addDependencyJars(Job) shouldn't ship duplicate entries in tmpjars
Date Mon, 06 May 2013 23:40:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650236#comment-13650236 ]

Nick Dimiduk commented on HBASE-8475:

Nothing prevents one jar from being added multiple times under different filesystem paths (relative,
canonical, symlinked, etc.). There's also the (uncommon) case of the JarFinder constructing
a jar on the fly from an exploded directory of class files. None of these cases are handled.

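To make the path-aliasing case concrete, here is a minimal sketch of collapsing different spellings of the same jar path before the list is shipped. It is not the HBase implementation; the class and method names (CanonicalJarSet, canonical, dedupe) are hypothetical, and it assumes the jars live on the local filesystem. It relies only on java.nio.file.Path#toRealPath, which resolves symlinks and relative segments for files that exist.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HBase or Hadoop.
class CanonicalJarSet {

    // Map a jar path to one canonical form so aliases compare equal.
    static Path canonical(String jar) {
        Path p = Paths.get(jar);
        try {
            // Resolves symlinks and "." / ".." segments; requires the file to exist.
            return p.toRealPath();
        } catch (IOException e) {
            // Fallback for missing files: purely syntactic normalization.
            return p.toAbsolutePath().normalize();
        }
    }

    // Drop jars whose canonical path was already seen, keeping first-seen order.
    static List<Path> dedupe(List<String> jars) {
        Set<Path> unique = new LinkedHashSet<>();
        for (String jar : jars) {
            unique.add(canonical(jar));
        }
        return new ArrayList<>(unique);
    }
}
```

A sketch like this would not catch the JarFinder case mentioned above: a jar built on the fly from an exploded class directory may get a fresh path each time, so path comparison alone cannot identify it as a duplicate.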
> TableMapReduceUtils#addDependencyJars(Job) shouldn't ship duplicate entries in tmpjars
> --------------------------------------------------------------------------------------
>                 Key: HBASE-8475
>                 URL: https://issues.apache.org/jira/browse/HBASE-8475
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Nick Dimiduk
> *note* this is a hypothesis that needs further investigation.
> Users running the above method in their jobs as a convenience for packaging up dependencies
> will see duplicate entries for jars providing common functionality, i.e. hbase.jar, hadoop.jar,
> and the user application jar. For instance, if they specify a custom output format and a custom
> partitioner, both in the same jar, that jar is resolved via the classloader and added to tmpjars
> twice. This imposes a fair amount of pre-job IO (particularly if your jars are fat with packed
> dependencies) and delays job launch accordingly.
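
As an illustration of the duplicate-entry problem described in the issue, the following is a minimal sketch of removing exact repeats from a comma-separated tmpjars value while keeping the first occurrence of each entry. The class and method names (TmpJarsDedup, dedupe) and the sample paths are hypothetical; this is not the fix that was applied to HBase, and it only catches entries that are textually identical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of HBase or Hadoop.
class TmpJarsDedup {

    // Return the comma-separated list with repeated entries dropped,
    // preserving the order in which entries first appeared.
    static String dedupe(String tmpjars) {
        Set<String> unique = new LinkedHashSet<>();
        for (String entry : tmpjars.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return String.join(",", unique);
    }

    public static void main(String[] args) {
        // The same application jar resolved twice, e.g. once for the output
        // format class and once for the partitioner class.
        String tmpjars = "file:/tmp/app.jar,file:/tmp/hbase.jar,file:/tmp/app.jar";
        System.out.println(dedupe(tmpjars));
    }
}
```

A LinkedHashSet is used here so that the relative order of the remaining jars is unchanged, on the assumption that ordering may matter to whatever consumes the list.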

