hadoop-common-dev mailing list archives

From "Sean Busbey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-11680) Deduplicate jars in convenience binary distribution
Date Thu, 05 Mar 2015 19:57:38 GMT
Sean Busbey created HADOOP-11680:

             Summary: Deduplicate jars in convenience binary distribution
                 Key: HADOOP-11680
                 URL: https://issues.apache.org/jira/browse/HADOOP-11680
             Project: Hadoop Common
          Issue Type: Improvement
          Components: build
            Reporter: Sean Busbey
            Assignee: Sean Busbey

Pulled from the discussion on HADOOP-11656, where Colin wrote:

bq. Andrew wrote: One additional note related to this, we can spend a lot of time right now
distributing 100s of MBs of jar dependencies when launching a YARN job. Maybe this is ameliorated
by the new shared distributed cache, but I've heard this come up quite a bit as a complaint.
If we could meaningfully slim down our client, it could lead to a nice win.

I'm frustrated that nobody responded to my earlier suggestion that we de-duplicate jars. This
would drastically reduce the size of our install, and without rearchitecting anything.
In fact I was so frustrated that I decided to write a program to do it myself and measure
the delta. Here it is:

du -h /h
249M    /h
du -h /h
140M    /h

Seems like deduplicating jars would be a much better project than splitting into a client
jar, if we really cared about this.
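The program Colin mentions is not included in this message. As a rough illustration only, here is a minimal sketch of how the duplication could be measured. It assumes that duplicates are byte-identical files ending in .jar; the function names find_duplicate_jars and reclaimable_bytes are hypothetical and do not come from the Hadoop build.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_jars(root):
    """Group the .jar files under root by the SHA-256 of their contents.

    Returns a dict mapping digest -> sorted list of paths, limited to
    digests shared by more than one file. Symlinks are skipped. Files that
    are already hard links to each other are still reported as duplicates.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large jars are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def reclaimable_bytes(duplicates):
    """Bytes saved if every group of identical jars kept a single copy."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling find_duplicate_jars on an unpacked distribution directory and passing the result to reclaimable_bytes gives an estimate of the possible saving. Whether the extra copies would then be replaced by symlinks, hard links, or a shared lib directory is a separate packaging decision that this sketch does not address.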

This message was sent by Atlassian JIRA
