hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
Date Wed, 11 Jan 2017 05:13:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817206#comment-15817206

Sangjin Lee commented on MAPREDUCE-5951:

I went over the patch with Chris in some detail today, and am posting the review comments
here for the record.

- {{mapreduce.job.jobjar.visibility}} and {{mapreduce.job.jobjar.sharedcache.uploadpolicy}}
are computed values that are not user-facing; in that case we should not even define the defaults
so that there is no confusion that these values are computed

- l.159-160: I understand lines 161-165 are there to support programmatic use cases of the
distributed cache that come in outside of the job submitter code path. Can we make the comments
clearer so that the intent of this comes through? We could also annotate them in {{MRJobConfig}}.
- l.171-172: it might be slightly better to use {{LinkedHashMap}}. That way, we'd have a predictable
iteration order (the order in which they are specified in the user values).
- l.219: To be fair, this is a bug we need to fix, right? Then can we file a JIRA and add
the JIRA id here?
- l.237-240: I would not worry about handling previous values here. Having duplicate paths
is not really supported and the worst case scenario here is to reset this upload policy with
the same value.
- l.260-263: I think we can improve on this, and reconcile the shared cache with the wildcard
feature. We could see if any resource is uploaded to the staging directory, and if so, still
preserve the wildcard entry. We also need to consider the case where the shared cache is disabled
but the wildcard is enabled.
- l.291-294: same comment as above
- l.348-351: same comment as above
- l.388: Since we're dealing with a local filesystem URI based on l.381-382, the authority
check is not meaningful. We should remove this check.
- On a larger note, the path/URI handling between the job jar, libjars, files, and archives
is not very consistent, which is an existing behavior. We need to see if they need to get
the same consistent treatment for this to work.

- l.1446: I realized later that passing an empty map has an effect of nulling out the config
value; perhaps we could make that more explicit in the javadoc and/or comments/code?
- l.1449-1463: nit: it might be slightly easier to read by using a simple string concatenation
with "+" (JVM internally uses the {{StringBuilder}})
- l.1490: here also it might be better to use {{LinkedHashMap}}

> Add support for the YARN Shared Cache
> -------------------------------------
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch,
MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, MAPREDUCE-5951-trunk-v14.patch,
MAPREDUCE-5951-trunk-v15.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch,
MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch,
MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch,
MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, MAPREDUCE-5951-trunk.018.patch
> Implement the necessary changes so that the MapReduce application can leverage the new
YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set
of resources they would like to cache (i.e. jobjar, libjars, archives, files).

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

View raw message