hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2263) CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for nested MapReduce jobs
Date Thu, 10 Jul 2014 23:55:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058132#comment-14058132 ]

Jason Lowe commented on YARN-2263:
----------------------------------

1 is an appropriate lower bound since we don't ever want the maximum number of applications
for a user to be zero or less.  (That would be a worthless queue since we could submit jobs
to it but no jobs would activate.) 
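
For illustration, here is a standalone sketch of that clamp with made-up numbers (nothing
below comes from this issue).  Note that Math.ceil already rounds any positive product up to
at least 1, so the Math.max only matters when the product is zero or negative, e.g. a user
limit factor of 0:

    // Illustrative only; the helper mirrors the CSQueueUtils method quoted
    // at the bottom of this message, and all input values are hypothetical.
    public class MaxActiveAppsExample {
      static int computeMaxActiveApplicationsPerUser(
          int maxActiveApplications, int userLimit, float userLimitFactor) {
        return Math.max(
            (int) Math.ceil(
                maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
            1);
      }

      public static void main(String[] args) {
        // 10 active apps queue-wide, 10% user limit, factor 0.5:
        // ceil(10 * 0.10 * 0.5) = ceil(0.5) = 1, so ceil alone already gives 1.
        System.out.println(computeMaxActiveApplicationsPerUser(10, 10, 0.5f));
        // A factor of 0 would yield ceil(0) = 0; Math.max clamps it to 1.
        System.out.println(computeMaxActiveApplicationsPerUser(10, 10, 0.0f));
      }
    }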

I'm assuming it only causes a deadlock in the case where the active job submits and then waits
for the completion of other jobs?  If it simply submits jobs and exits, then even if the queue
is so tiny that only one active job per user is allowed, the jobs should eventually complete
(assuming sufficient resources to launch an AM _and_ at least one task simultaneously, if this
is MapReduce).
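
To make the submit-and-wait case concrete, here is a minimal sketch of the pattern that can
deadlock (my own illustration, not code from this issue): a mapper that submits a child job
and blocks on its completion.  If the user is capped at one active application, the parent's
AM holds the only slot while waiting on a child that can never activate:

    // Hypothetical sketch: a mapper that submits a child MapReduce job and
    // waits for it.  The child's configuration is elided for brevity.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LaunchingMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        try {
          Job child = Job.getInstance(new Configuration(), "child-job");
          // ... set the child's input, output, and mapper class here ...
          child.waitForCompletion(true);  // blocks until the child finishes
        } catch (ClassNotFoundException e) {
          throw new IOException(e);
        }
      }
    }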

If the concern is that the queue can be too small to allow a user to run more than one
application simultaneously, and some app frameworks might not like that, then yes, that could
be an issue.  However, I'm not sure that is YARN's problem to solve.  I could have an
application framework that for whatever reason requires 10 jobs to be running simultaneously
to work.  There could definitely be a queue config that will not allow that to run properly
because the queue is too small to support 10 simultaneous applications by a single user.
Should YARN handle this scenario?  If so, how would it detect it, and what should it do to
mitigate it?  I would argue the same applies to the simpler job-launching-a-job-and-waiting
scenario.  Some queues are going to be too small to support that.

Users can work around issues like this with smarter queue setups.  This is touched upon in
MAPREDUCE-4304 and elsewhere for the Oozie case, which is a similar scenario.  We can set up
a queue for the launcher jobs that is separate from the queue where the other jobs run.  That
way we can't accidentally fill the cluster/queue with just launcher jobs and deadlock.
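
As a sketch of that setup from the submission side (the queue names "launchers" and "workers"
are hypothetical, and the queues would have to be defined in the scheduler configuration):

    // Hypothetical illustration: route the launcher job and the jobs it
    // spawns to different queues so launchers alone cannot exhaust the
    // active-application slots.  Actual job setup/submission is elided.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class QueueRouting {
      public static void main(String[] args) throws Exception {
        // The launcher job goes to its own small queue...
        Configuration launcherConf = new Configuration();
        launcherConf.set("mapreduce.job.queuename", "launchers");
        Job launcher = Job.getInstance(launcherConf, "launcher-job");

        // ...while the jobs it spawns run elsewhere.
        Configuration workerConf = new Configuration();
        workerConf.set("mapreduce.job.queuename", "workers");
        Job worker = Job.getInstance(workerConf, "worker-job");
      }
    }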

> CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for nested MapReduce jobs
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-2263
>                 URL: https://issues.apache.org/jira/browse/YARN-2263
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 0.23.10, 2.4.1
>            Reporter: Chen He
>
> computeMaxActiveApplicationsPerUser() has a lower bound of "1".  For a nested MapReduce job,
> which launches new MapReduce jobs from its mappers/reducers, this can cause the job to get stuck.
> public static int computeMaxActiveApplicationsPerUser(
>       int maxActiveApplications, int userLimit, float userLimitFactor) {
>     return Math.max(
>         (int)Math.ceil(
>             maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
>         1);
>   }



