hadoop-mapreduce-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
Date Wed, 08 Jun 2016 19:44:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321316#comment-15321316 ]

Daniel Templeton commented on MAPREDUCE-6690:

Thanks for the clarification, [~jlowe].  I assumed that YARN-5192 would implement the check
as part of the submit call so that the client gets immediate feedback.  The point that I forgot
about, though, is that, regardless, the submit only happens after the resources have been uploaded
to HDFS.  Given that this check specifically targets wide loads, the cases where the server-side
check would reject the submit are exactly the ones that will waste the most time with the

I now see the light.  I would like to find a way, however, to try to keep the two settings
in sync if possible.  I've seen cases, such as the number of concurrent moves in the HDFS
mover, where the limit is set on both the client and server sides, and it ends up confusing
customers.  What about having the RM offer up its resource limits through a call?  The client
could then query the RM's limits and apply those.
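To make the suggestion concrete, here is a minimal sketch of a client that prefers a limit advertised by the server and falls back to its own setting. Everything in it is hypothetical: `SyncedLimitSketch`, `effectiveLimit`, and the supplier standing in for an RM call are illustrative names, not an existing YARN or MapReduce API, and the fallback to a client-side default when the server advertises nothing is an assumption of this sketch.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch; no such RM call exists in the discussion above. */
public class SyncedLimitSketch {

    /**
     * Picks the limit the client should enforce.
     *
     * @param serverLimit   stand-in for a query to the server; empty if the
     *                      server advertises no limit (assumption)
     * @param clientDefault limit from the client's own configuration
     */
    public static long effectiveLimit(Supplier<Optional<Long>> serverLimit,
                                      long clientDefault) {
        // The server's value wins whenever it is available, so the two
        // sides cannot drift apart; otherwise use the local setting.
        return serverLimit.get().orElse(clientDefault);
    }

    public static void main(String[] args) {
        // Server advertises a limit: the client adopts it.
        System.out.println(effectiveLimit(() -> Optional.of(500L), 100L));
        // Server advertises nothing: the client keeps its own default.
        System.out.println(effectiveLimit(() -> Optional.empty(), 100L));
    }
}
```

With a single source of truth on the server, the client-side setting only matters when the server has nothing to offer.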

> Limit the number of resources a single map reduce job can submit for localization
> ---------------------------------------------------------------------------------
>                 Key: MAPREDUCE-6690
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6690
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-6690-trunk-v1.patch, MAPREDUCE-6690-trunk-v2.patch, MAPREDUCE-6690-trunk-v3.patch
> Users will sometimes submit a large amount of resources to be localized as part of a
> single map reduce job. This can cause issues with YARN localization that destabilize the cluster
> and potentially impact other user jobs. These resources are specified via the files, libjars,
> archives and jobjar command line arguments or directly through the configuration (i.e. distributed
> cache api). The resources specified could be too large in multiple dimensions:
> # Total size
> # Number of files
> # Size of an individual resource (i.e. a large fat jar)
> We would like to encourage good behavior on the client side by having the option of enforcing
> resource limits along the above dimensions.
> There should be a separate effort to enforce limits at the YARN layer on the server side,
> but this jira is only covering the map reduce layer on the client side. In practice, having
> these client side limits will get us a long way towards preventing these localization anti-patterns.
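
The three dimensions in the description could be checked on the client before any upload, roughly as sketched below. This is not the code in the attached patches: the class name, method name, and parameters are hypothetical, and treating a limit of 0 or less as "not enforced" is an assumption of this sketch.

```java
import java.util.Optional;

/** Hypothetical sketch; not taken from the MAPREDUCE-6690 patches. */
public class LocalizationLimitSketch {

    /**
     * Checks resource sizes against the three limits from the description.
     * A limit of 0 or less is treated as disabled (assumption).
     *
     * @return a description of the first violation found, or empty if none
     */
    public static Optional<String> firstViolation(long[] sizesBytes,
                                                  int maxCount,
                                                  long maxSingleBytes,
                                                  long maxTotalBytes) {
        // Dimension: number of files.
        if (maxCount > 0 && sizesBytes.length > maxCount) {
            return Optional.of("too many resources: " + sizesBytes.length);
        }
        long total = 0;
        for (int i = 0; i < sizesBytes.length; i++) {
            // Dimension: size of an individual resource.
            if (maxSingleBytes > 0 && sizesBytes[i] > maxSingleBytes) {
                return Optional.of("resource " + i + " too large: " + sizesBytes[i]);
            }
            // Dimension: total size, checked as the running sum grows.
            total += sizesBytes[i];
            if (maxTotalBytes > 0 && total > maxTotalBytes) {
                return Optional.of("total size too large: " + total);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        long[] sizes = {40, 40, 40};
        // Prints the violation message if any limit is exceeded.
        System.out.println(firstViolation(sizes, 5, 50, 100).orElse("within limits"));
    }
}
```

Running the check before the upload is what saves the time that a server-side rejection would waste.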

