hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10115) Exclude duplicate jars in hadoop package under different component's lib
Date Mon, 09 Mar 2015 18:19:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353321#comment-14353321 ]

Allen Wittenauer commented on HADOOP-10115:
-------------------------------------------

bq. One possible gap caused by just skipping the jars (rather than symlinking) is that if
folks rely on the directory layout at deployment time to grab needed jars they might miss
out. Presumably they're already grabbing the common share dir though?

If you symlink, is there actually any benefit? It shrinks the distribution size, sure, but
I suspect the JVM won't resolve the link to a degree that it realizes it is the same jar.
Also, given that, e.g., HDFS requires common, if folks are only grabbing the HDFS deps and
not the common deps, they are doing Bad Things (tm). But if we only commit this to trunk,
it's even less of a concern. ;)
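
To make the symlink point concrete, here is a minimal sketch in Python (not part of the patch, and not a statement about how the JVM behaves) of what "realizing it is the same jar" would take: resolving each classpath entry to its real path before comparing. The function name `dedupe_classpath` is hypothetical.

```python
import os


def dedupe_classpath(entries):
    """Drop classpath entries that resolve to an already-seen file.

    Entries are compared by their fully resolved path, so a symlink and
    its target count as the same jar. The first spelling of each entry
    is kept and the original order is preserved.
    """
    seen = set()
    unique = []
    for entry in entries:
        resolved = os.path.realpath(entry)  # follows symlinks
        if resolved not in seen:
            seen.add(resolved)
            unique.append(entry)
    return unique
```

Without a step like this, a symlinked jar and its target are two different path strings on the classpath, which is the concern raised above: the distribution gets smaller, but the classpath does not.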

bq. One good reason to do it as a follow-on is that we could switch to using a Maven assembly
instead of a shell script.

I'm inclined to commit this now and fix this up, either as a Maven assembly or as a separate
script, in a separate JIRA under the guiding principle of "don't let best stop better."  I don't
think there is any real question of whether or not this is better than what is currently there.
Best might end up being more subjective and take longer.

bq. (the two code comments)

Yes, probably a good idea.

bq. Should YARN get processed before the NFS projects?

I'm not sure if it matters much.


> Exclude duplicate jars in hadoop package under different component's lib
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-10115
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10115
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>              Labels: common, hdfs, mapreduce, nfs, yarn
>         Attachments: HADOOP-10115-004.patch, HADOOP-10115-005.patch, HADOOP-10115-006.patch, HADOOP-10115.patch, HADOOP-10115.patch, HADOOP-10115.patch
>
>
> In the hadoop package distribution, more than 90% of the jars are duplicated in multiple places.
> For example, almost all jars in share/hadoop/hdfs/lib are already present in share/hadoop/common/lib.
> The same is true of every other lib directory under share.
> For all the daemon processes, all of these directories are added to the classpath anyway.
> So, to reduce the package distribution size and the classpath overhead, remove the duplicate jars from the distribution.
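
As an illustration of the duplication described in the issue, the following Python sketch lists jars in each component's lib directory that also appear in the common lib directory. It is not the attached patch: the function name `find_duplicate_jars` is hypothetical, it assumes the share/hadoop/<component>/lib layout mentioned in the description, and it matches on file name only, which does not prove that the contents are identical.

```python
import os


def find_duplicate_jars(share_root, base="common"):
    """Map each component to the jars in its lib dir that the base lib dir also has.

    share_root is assumed to look like share/hadoop, with one subdirectory
    per component (common, hdfs, ...), each optionally containing lib/.
    """
    base_lib = os.path.join(share_root, base, "lib")
    base_jars = {name for name in os.listdir(base_lib) if name.endswith(".jar")}

    duplicates = {}
    for component in sorted(os.listdir(share_root)):
        if component == base:
            continue
        lib_dir = os.path.join(share_root, component, "lib")
        if not os.path.isdir(lib_dir):
            continue  # component ships no lib directory
        dupes = sorted(
            name
            for name in os.listdir(lib_dir)
            if name.endswith(".jar") and name in base_jars
        )
        if dupes:
            duplicates[component] = dupes
    return duplicates
```

Calling `find_duplicate_jars("share/hadoop")` on an unpacked distribution would return a dictionary keyed by component name; whether to skip or symlink the listed jars is the packaging decision discussed in the comment.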



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
