hadoop-mapreduce-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6719) The list of -libjars archives should be replaced with a wildcard in the distributed cache to reduce the application footprint in the state store
Date Tue, 21 Jun 2016 18:19:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342378#comment-15342378 ]

Sangjin Lee commented on MAPREDUCE-6719:
----------------------------------------

Thanks Daniel! Will commit it shortly.

> The list of -libjars archives should be replaced with a wildcard in the distributed cache to reduce the application footprint in the state store
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6719
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6719
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: MAPREDUCE-6719.001.patch, MAPREDUCE-6719.002.patch
>
>
> When the -libjars option is used to add classes to the classpath, every library added
> this way is listed explicitly in the ContainerLaunchContext's local resources, even though
> all of them are uploaded to the same directory in HDFS. With tools like Crunch used without
> an uber JAR, or when trying to take advantage of the shared cache, the number of libraries
> can be quite large. We've seen many cases where we had to lower the maximum number of
> applications to keep ZooKeeper (ZK) from running out of heap because of the size of the
> state store entries.
> This JIRA proposes to allow wildcards both in the internal processing of the -libjars
> switch and in paths added through the Job and DistributedCache classes. Rather than listing
> all files independently, it proposes to replace the complete list of libdir files with
> the wildcarded libdir directory, e.g. "libdir/*". The effect is the same as the current
> -libjars behavior, but without explicitly listing every file.
> This capability will also be exposed by the {{DistributedCache.addCacheFile()}} method.
> See YARN-4958 for the NM side of the implementation and additional discussion.
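
To illustrate the proposal in the description above, here is a minimal sketch in plain Java of collapsing a list of per-file entries into a single wildcard entry. It is not the code from the attached patches and does not use any Hadoop API: the class name WildcardCollapse, the collapse() helper, and the jar names are hypothetical, and the sketch assumes that libdir holds nothing other than the listed files, so that "libdir/*" denotes exactly the same set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WildcardCollapse {

    /**
     * Hypothetical helper: if every resource is a direct child of libDir,
     * return the single entry "libDir/*"; otherwise return the list unchanged.
     * Assumes libDir contains no files other than the ones listed.
     */
    static List<String> collapse(String libDir, List<String> resources) {
        if (resources.isEmpty()) {
            return resources; // nothing to collapse
        }
        String prefix = libDir + "/";
        for (String r : resources) {
            boolean underDir = r.startsWith(prefix);
            // Reject entries in nested subdirectories: no further '/' after the prefix.
            boolean directChild = underDir && r.indexOf('/', prefix.length()) < 0;
            if (!directChild) {
                return resources; // mixed locations: keep the explicit list
            }
        }
        return Collections.singletonList(prefix + "*");
    }

    public static void main(String[] args) {
        // Hypothetical jar names, all uploaded to the same directory.
        List<String> explicit = Arrays.asList(
                "libdir/a.jar", "libdir/b.jar", "libdir/c.jar");
        List<String> collapsed = collapse("libdir", explicit);

        // The explicit list grows with the number of jars; the wildcard form does not.
        System.out.println("explicit entries:  " + explicit.size());
        System.out.println("collapsed entries: " + collapsed.size() + " " + collapsed);
    }
}
```

The point of the sketch is only the size difference: the number of entries recorded per application stays at one regardless of how many jars are in the directory, which is what reduces the state store footprint described in the issue.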



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

