hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
Date Tue, 25 Feb 2014 22:17:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912144#comment-13912144 ]

Karthik Kambatla commented on YARN-1492:
----------------------------------------

Thanks for sharing this, [~ctrezzo]. The document is nicely written. A few comments:
* Would SCM be a single point of failure? If yes, would either of the following approaches
make sense?
** Make SCM an AM. From YARN-896, the only sub-task that affects this would be the delegation
tokens. 
** Add an SCMMonitorService to the RM. If SCM is enabled, this service would start the SCM
on one of the nodes and monitor it. 
* SCM Cleaner Service - the doc mentions the tension between the frequency of the cleaner and
the load on the RM. Can you elaborate? I was of the opinion that the RM is not involved in the
caching at all. 
* The cleaner protocol doesn't mention when the cleaner lock is cleared. I assume it is cleared
on each exit path. 
* Nit: ZK-based store - we can maybe do this in the JIRA corresponding to the sub-task -
what would this look like? 
* More nit-picking: The rationale for not using in-memory and reconstructing seems to come
from long-running applications. Given long-running applications don't benefit from the shared
cache as much as the shorter ones, is this a huge concern? 
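
To illustrate the assumption about the cleaner lock above, here is a minimal sketch, in Python for brevity, of releasing a lock on every exit path. The names (CleanerTask, run, is_stale) are hypothetical and do not come from the design document, and a threading.Lock merely stands in for whatever lock the SCM cleaner actually uses.

```python
import threading


class CleanerTask:
    """Hypothetical cleaner pass; not the SCM cleaner from the design doc."""

    def __init__(self):
        # Assumption: a process-local lock stands in for the cleaner lock.
        self._lock = threading.Lock()

    def run(self, entries, is_stale):
        """Return the stale entries, or None if another pass holds the lock."""
        # Non-blocking acquire: skip this pass instead of queueing behind another.
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return [entry for entry in entries if is_stale(entry)]
        finally:
            # Runs on a normal return and on any exception raised above.
            self._lock.release()

    def locked(self):
        return self._lock.locked()
```

With this shape, a pass that fails part-way through the scan still clears the lock, so a later pass is not blocked by a stale lock. A pass that never acquired the lock returns before the try block and therefore does not release a lock it does not hold.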

> truly shared cache for jars (jobjar/libjar)
> -------------------------------------------
>
>                 Key: YARN-1492
>                 URL: https://issues.apache.org/jira/browse/YARN-1492
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf,
>                      shared_cache_design_v4.pdf, shared_cache_design_v5.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and files so
> that attempts from the same job can reuse them. However, sharing is limited with the distributed
> cache because it is normally on a per-job basis. On a large cluster, sometimes copying of
> jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth,
> not to speak of defeating the purpose of "bringing compute to where data is". This is wasteful
> because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared cache so that
> multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
