hadoop-mapreduce-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
Date Thu, 27 Apr 2017 20:22:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987585#comment-15987585 ]

Erik Krogen commented on MAPREDUCE-5951:

Hey [~ctrezzo], I have a question about the behavior of this patch. Currently the old logic
for resource visibility is used: if a resource is world-readable, it is marked as PUBLIC,
otherwise PRIVATE. Given my current understanding of this patch's behavior, I see the following
sequence of events:
* Client submits a job with libjar X, which has never been used before. Client contacts SCM
to mark X as "used", SCM responds that it does not have X.
* Client uploads X to staging directory, which I assume here is _not_ world-readable. X is
marked as PRIVATE.
* MR-AM localizes X, then uploads it to the shared cache. Other NMs all localize X as PRIVATE
and do not share it with other applications.
* Client then submits the same job with the same X. Client contacts SCM, and SCM responds
with a world-readable (755 dirs / 555 file) path inside of the shared cache.
* Client does not upload X, and marks X as PUBLIC, since it is currently in a world-readable
location.
* MR-AM and NMs all localize X as PUBLIC and share it with other applications.
Please correct me if I am wrong on any of these steps. It seems expected that X eventually
becomes PUBLIC, given that we asked for it to be uploaded to the publicly shared cache, but
marking it as PRIVATE the first time around seems unnecessary. Is this done just to avoid
changing the existing logic for marking a resource as PRIVATE vs PUBLIC, is it an oversight,
or is this behavior desired?
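
To make the two submissions above concrete, here is a minimal sketch of the visibility rule as I
understand it. It is not code from the patch. It assumes "world-readable" means that "other" has
read permission on the file and execute permission on every ancestor directory. The class name,
the method name, and the 700 mode on the staging directory are all hypothetical, chosen only for
illustration. The 755 / 555 modes are the ones the SCM path is described as having.

```java
public class VisibilitySketch {
    enum Visibility { PUBLIC, PRIVATE }

    // POSIX permission bits for "other" (octal literals).
    static final int OTHER_READ = 0004;
    static final int OTHER_EXEC = 0001;

    /**
     * Hypothetical model of the existing visibility rule: a resource is PUBLIC
     * only if "other" can traverse every ancestor directory and read the file.
     *
     * @param ancestorDirModes POSIX modes of the ancestor directories, root first
     * @param fileMode         POSIX mode of the resource itself
     */
    static Visibility classify(int[] ancestorDirModes, int fileMode) {
        for (int dirMode : ancestorDirModes) {
            if ((dirMode & OTHER_EXEC) == 0) {
                // One non-traversable directory is enough to make the resource PRIVATE.
                return Visibility.PRIVATE;
            }
        }
        return (fileMode & OTHER_READ) != 0 ? Visibility.PUBLIC : Visibility.PRIVATE;
    }

    public static void main(String[] args) {
        // First submission: X is uploaded to a staging directory that is
        // assumed (hypothetically) to be mode 700, so it is not world-readable.
        Visibility first = classify(new int[] {0755, 0700}, 0644);

        // Second submission: SCM returns a path inside the shared cache
        // with 755 directories and a 555 file.
        Visibility second = classify(new int[] {0755, 0755}, 0555);

        System.out.println("first submission:  " + first);
        System.out.println("second submission: " + second);
    }
}
```

Under these assumptions the same jar X is classified as PRIVATE on the first submission and PUBLIC
on the second, purely because of where it lives, which is the asymmetry I am asking about.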

> Add support for the YARN Shared Cache
> -------------------------------------
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-Overview.001.pdf, MAPREDUCE-5951-trunk.016.patch,
> MAPREDUCE-5951-trunk.017.patch, MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch,
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, MAPREDUCE-5951-trunk-v12.patch,
> MAPREDUCE-5951-trunk-v13.patch, MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch,
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
> MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch,
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch
>
> Implement the necessary changes so that the MapReduce application can leverage the new
> YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set
> of resources they would like to cache (i.e. jobjar, libjars, archives, files).

This message was sent by Atlassian JIRA
