hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
Date Wed, 08 Jun 2016 19:58:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321330#comment-15321330 ]

Jason Lowe commented on MAPREDUCE-6690:

bq. I assumed that YARN-5192 would implement the check as part of the submit call so that
the client gets immediate feedback.

Note that YARN-5192 cannot do the check on application submit.  An application submit only
requires the resources necessary to get the ApplicationMaster localized.  Subsequent containers
for the application could have a completely different set of resources, and they won't be
available in the application submission context for validation at submit time.  MapReduce
is an app framework that happens to localize all resources for all containers, but other application
frameworks do not always do this.
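
To make the distinction concrete, here is a minimal sketch in plain Java.  It uses no YARN
classes: the class name, method names, and the notion of a simple per-container resource-count
limit are all hypothetical, purely for illustration.  It models an application whose AM needs
one resource while a later container needs three, and shows that a check that only sees the
AM's resources passes even though a later container exceeds the limit.

```java
import java.util.List;

/** Hypothetical model only; these are not YARN APIs. */
public class SubmitTimeCheckSketch {

    /**
     * A submit-time check can only inspect what the submission carries,
     * which is the resource list for the ApplicationMaster container.
     */
    public static boolean passesSubmitCheck(List<String> amResources, int limit) {
        return amResources.size() <= limit;
    }

    /**
     * Resources for later containers are only known when those containers
     * are launched, so a limit on them has to be checked at that point.
     */
    public static boolean passesContainerCheck(List<String> containerResources, int limit) {
        return containerResources.size() <= limit;
    }
}
```

With a limit of 2, an AM resource list of one jar passes the submit-time check, while a later
container asking for three resources only fails once the per-container check runs.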

bq.  I would like to find a way, however, to try to keep the two settings in sync if possible.

Agreed it would be annoying for admins to have to keep these in sync, assuming nobody would
ever want to configure the YARN limit higher than the MapReduce limit.

bq. What about having the RM offer up its resource limits through a call?

That would be one way to tackle it.  There have been cases in the past where it would have
been nice for clients to be able to query config settings from the central daemons (e.g., the
NameNode or ResourceManager) rather than assume the local settings in hdfs-site.xml or
yarn-site.xml match what the central daemon is using.  That's a somewhat open-ended API change
for YARN with backwards-compatibility concerns going forward.  Maybe it's time we hammered
out on a YARN JIRA whether we're going to do it and, if not, what clients/users are
supposed to do to keep the client and the server in sync.
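
As a rough sketch of what the client side of such a call might look like: no such RM API
exists as of this comment, so the "server-advertised" value below is an assumption, and the
class and method names are hypothetical.  The idea is simply that the client prefers the limit
the server reports and falls back to its local config when the server does not report one
(for example, when talking to an older RM).

```java
import java.util.OptionalLong;

/** Hypothetical client-side helper; not an existing YARN or MapReduce API. */
public class EffectiveLimitSketch {

    /**
     * Returns the limit the client should enforce.
     *
     * @param serverAdvertised limit reported by the central daemon, empty if
     *                         the daemon does not expose one (assumed behavior)
     * @param localConfigured  limit read from the client's local config files
     */
    public static long effectiveLimit(OptionalLong serverAdvertised, long localConfigured) {
        // Prefer the server's view so client and server cannot drift apart.
        return serverAdvertised.isPresent() ? serverAdvertised.getAsLong() : localConfigured;
    }
}
```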

> Limit the number of resources a single map reduce job can submit for localization
> ---------------------------------------------------------------------------------
>                 Key: MAPREDUCE-6690
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6690
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-6690-trunk-v1.patch, MAPREDUCE-6690-trunk-v2.patch, MAPREDUCE-6690-trunk-v3.patch
> Users will sometimes submit a large number of resources to be localized as part of a
> single map reduce job. This can cause issues with YARN localization that destabilize the cluster
> and potentially impact other user jobs. These resources are specified via the files, libjars,
> archives and jobjar command line arguments or directly through the configuration (i.e. distributed
> cache api). The resources specified could be too large in multiple dimensions:
> # Total size
> # Number of files
> # Size of an individual resource (e.g. a large fat jar)
> We would like to encourage good behavior on the client side by having the option of enforcing
> resource limits along the above dimensions.
> There should be a separate effort to enforce limits at the YARN layer on the server side,
> but this jira is only covering the map reduce layer on the client side. In practice, having
> these client side limits will get us a long way towards preventing these localization anti-patterns.
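
The three dimensions in the description above can be sketched as a small client-side check.
This is a minimal illustration in plain Java, not the code from the attached patches: the
class name, constructor parameters, and the convention that a limit of zero or less disables
that check are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side limit checker for resources submitted for localization. */
public class LocalizationLimitChecker {
    private final int maxResources;      // max number of resources; <= 0 disables the check
    private final long maxTotalBytes;    // max combined size in bytes; <= 0 disables the check
    private final long maxSingleBytes;   // max size of any one resource; <= 0 disables the check

    public LocalizationLimitChecker(int maxResources, long maxTotalBytes, long maxSingleBytes) {
        this.maxResources = maxResources;
        this.maxTotalBytes = maxTotalBytes;
        this.maxSingleBytes = maxSingleBytes;
    }

    /**
     * Checks a map of resource name to size in bytes against the configured limits.
     *
     * @return one message per violated limit; empty if the job is within limits
     */
    public List<String> check(Map<String, Long> resourceSizes) {
        List<String> violations = new ArrayList<>();

        // Dimension: number of files.
        if (maxResources > 0 && resourceSizes.size() > maxResources) {
            violations.add("too many resources: " + resourceSizes.size()
                + " > " + maxResources);
        }

        long total = 0;
        for (Map.Entry<String, Long> e : resourceSizes.entrySet()) {
            long size = e.getValue();
            total += size;
            // Dimension: size of an individual resource.
            if (maxSingleBytes > 0 && size > maxSingleBytes) {
                violations.add("resource too large: " + e.getKey()
                    + " (" + size + " > " + maxSingleBytes + " bytes)");
            }
        }

        // Dimension: total size.
        if (maxTotalBytes > 0 && total > maxTotalBytes) {
            violations.add("total size too large: " + total
                + " > " + maxTotalBytes + " bytes");
        }
        return violations;
    }
}
```

A job client could run such a check before uploading anything and fail the submission with
the collected messages, which is where the user gets the immediate feedback.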

