hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
Date Fri, 13 Feb 2015 01:22:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319375#comment-14319375
] 

Karthik Kambatla commented on MAPREDUCE-5951:
---------------------------------------------

Sorry for the delay in getting to this. Getting a continuous chunk of time to look at this
somewhat large patch was hard. 

Here are my first round of comments - a combination of high-level and detailed comments. Let
us see if we can get some of this in through other JIRAs first, to allow for a more thorough
review.
# DistributedCache changes aren’t central to what this JIRA is trying to address. Could
we leave them out and address in another JIRA? 
## This has nothing to do with this patch, but it would be nice to make the code around setting
CLASSPATH_FILES a little more readable. Could we define another String prefix to hold “”
or classpath, based on whether classpath is null. 
# Job
## The new APIs should all be @Unstable
## Let us make the javadoc for the new APIs a little more formal - we don’t need to mention
SCMClientProtocol.use, or that the APIs are intended for user use. Even for the return value,
I would go with something like “If shared cache is enabled and the resource added successfully,
return true. Otherwise, return false.”
## How about renaming the methods to addFileToSharedCache, addArchiveToSharedCache, addFileToSharedCacheAndClasspath?

## Make both new methods private static instead of static private.
# JobID changes might not be required. Use ConverterUtils#toApplicationId? 
# JobImpl
## cleanupSharedCacheResources - nit: I would check for (checksums == null || checksums.length
== 0) and return to save on indentations. 80 chars is already too small.
## cleanupSharedCacheUploadPolicies - javadoc should use block comments. Well, may be a nit.
# JobSubmitter
## Can we do the code moving from JobSumitter to FileUploader (may be, we need a more descriptive
name) to another JIRA and look at that first if needed. Otherwise, it is hard to review the
changes.
## May be, I am misreading the patch. Is this patch hardcoding MR job submission to always
use SharedCache? If yes, we should definitely avoid that. 
# mapred-default.xml: We need a little more fool-proof config. The way the patch currently
is, a typo will lead to unexpected behavior without any warnings.


> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch
>
>
> Implement the necessary changes so that the MapReduce application can leverage the new
YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set
of resources they would like to cache (i.e. jobjar, libjars, archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message