hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
Date Wed, 13 May 2015 00:39:01 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541107#comment-14541107

Karthik Kambatla commented on MAPREDUCE-5951:

Thanks Chris.

Comments on the latest patch:
# DistributedCache changes are all spurious. Omit it from the diff. 
# Remove Job#addArchiveToSharedCacheAndClasspath altogether? 
# JobImpl#cleanupSharedCacheResources should be always called when the job finishes, irrespective
of success.
# JobResourceUploader
## stopSharedCache should set scClient to null
## We don't need a separate boolean to check if SCM is available. We should be able to just
use (sClient != null). If we run into any issues talking to SCM, we should just abort and
call stopSharedCache to avoid using it for the rest of the dependencies.
## uploadFiles seems to be trying to handle the case where shared-cache goes down after we
set all the files that need to be uploaded. We should leave this check to the NM. The SCM
might come back up by the time the NM tries uploading files. Or, it could be available at
this point and go down later. 
      // if scm fails in the middle, we will set shared cache upload policies
      // for all resources
      // to be false. The resources that are shared successfully via
      // SharedCacheClient.use will
      // continued to be shared.
      if (scClient != null && !isScmAvailable()) {
## uploadFiles checks if shared-cache is available before setting the upload policies. I think
we should set the upload policies irrespective of SCM's availability when shared-cache is
not disabled. Also, we should modify the below code to have the second if inside the first
          if (isSharedCacheFilesEnabled()) {
            newPath = useSharedCache(tmp, conf);

          // need to inform NM to upload the file to shared cache.
          if (newPath == null && isSharedCacheFilesEnabled()) {
            filesSCUploadPolicies[indexOfFilesSCUploadPolicies] = true;
## useSharedCache should check for (scClient != null)
## uploadFiles has a lot of duplication. Can we file a follow-up JIRA to simplify it? 
## Also, due to the try-finally in uploadFiles, it is kind of hard to see what lines have
been changed. Do you think it makes any sense to wrap this in another method call, so the
indentations don't show spurious changes? I am assuming most of this code will be touched
by the follow-up JIRA.
## useSharedCache javadoc issue
# MRApps - we are using "*" instead of "job.jar" which should work, but I wonder if that will
be an incompatible behavior change. [~jlowe], [~vinodkv] - what do you think? 
# YarnRunner - indentation issue on line 360

> Add support for the YARN Shared Cache
> -------------------------------------
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v2.patch,
MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch,
MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch,
> Implement the necessary changes so that the MapReduce application can leverage the new
YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set
of resources they would like to cache (i.e. jobjar, libjars, archives, files).

This message was sent by Atlassian JIRA

View raw message