hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
Date Fri, 13 Feb 2015 01:22:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319375#comment-14319375

Karthik Kambatla commented on MAPREDUCE-5951:

Sorry for the delay in getting to this. Getting a continuous chunk of time to look at this
somewhat large patch was hard. 

Here are my first round of comments - a combination of high-level and detailed comments. Let
us see if we can get some of this in through other JIRAs first, to allow for a more thorough
# DistributedCache changes aren’t central to what this JIRA is trying to address. Could
we leave them out and address in another JIRA? 
## This has nothing to do with this patch, but it would be nice to make the code around setting
CLASSPATH_FILES a little more readable. Could we define another String prefix to hold “”
or classpath, based on whether classpath is null. 
# Job
## The new APIs should all be @Unstable
## Let us make the javadoc for the new APIs a little more formal - we don’t need to mention
SCMClientProtocol.use, or that the APIs are intended for user use. Even for the return value,
I would go with something like “If shared cache is enabled and the resource added successfully,
return true. Otherwise, return false.”
## How about renaming the methods to addFileToSharedCache, addArchiveToSharedCache, addFileToSharedCacheAndClasspath?

## Make both new methods private static instead of static private.
# JobID changes might not be required. Use ConverterUtils#toApplicationId? 
# JobImpl
## cleanupSharedCacheResources - nit: I would check for (checksums == null || checksums.length
== 0) and return to save on indentations. 80 chars is already too small.
## cleanupSharedCacheUploadPolicies - javadoc should use block comments. Well, may be a nit.
# JobSubmitter
## Can we do the code moving from JobSumitter to FileUploader (may be, we need a more descriptive
name) to another JIRA and look at that first if needed. Otherwise, it is hard to review the
## May be, I am misreading the patch. Is this patch hardcoding MR job submission to always
use SharedCache? If yes, we should definitely avoid that. 
# mapred-default.xml: We need a little more fool-proof config. The way the patch currently
is, a typo will lead to unexpected behavior without any warnings.

> Add support for the YARN Shared Cache
> -------------------------------------
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch
> Implement the necessary changes so that the MapReduce application can leverage the new
YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set
of resources they would like to cache (i.e. jobjar, libjars, archives, files).

This message was sent by Atlassian JIRA

View raw message