Date: Fri, 12 Oct 2012 21:25:05 +0000 (UTC)
From: "Alejandro Abdelnur (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <1258566018.39834.1350077105203.JavaMail.jiratomcat@arcas>
In-Reply-To: <1001210752.28403.1338586404586.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (MAPREDUCE-4304) Deadlock where all containers are held by ApplicationMasters should be prevented

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475367#comment-13475367 ]

Alejandro Abdelnur commented on MAPREDUCE-4304:
-----------------------------------------------

A way to avoid deadlocks is to use a dedicated queue for the Oozie action launcher jobs, different from the queue used for the Oozie action jobs themselves.

Also, launcher jobs could be configured to use little memory in the case of map-reduce actions (the launchers only submit the real job). For other action types, i.e. pig and hive, the launcher runs the actual client invocation (pig or hive), so launcher memory may have to be higher than for a map-reduce action launcher.

A longer-term approach (in combination with the above) would be to use an OozieLauncherAM, thus reducing the number of containers for each launcher from 2 to 1.

> Deadlock where all containers are held by ApplicationMasters should be prevented
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4304
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4304
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2, resourcemanager
>    Affects Versions: 0.23.1
>            Reporter: Herman Chen
>
> In my test cluster with 4 NodeManagers, each with only ~1.6G container memory, when a burst of jobs, e.g. >10, is submitted concurrently, it is likely that 4 jobs are accepted, with 4 ApplicationMasters allocated, but then the jobs block each other indefinitely because they are all waiting to allocate more containers.
> Note that the problem is not limited to a tiny cluster like this. As long as the rate at which jobs are submitted is greater than the rate at which jobs finish, the cluster may run into a vicious cycle where more and more containers are locked up by ApplicationMasters.
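
To make the failure mode in the issue description concrete, here is a toy model in Python. It is only a sketch and not how the YARN ResourceManager schedules: the function name `simulate`, the simplification that every container (ApplicationMaster or task) occupies exactly one slot, and the `max_am_slots` cap are all assumptions introduced for illustration.

```python
from collections import deque


def simulate(total_slots, num_jobs, tasks_per_job, max_am_slots=None):
    """Toy model: return how many jobs finish before the cluster stalls.

    Assumptions (hypothetical, for illustration only): each ApplicationMaster
    and each task occupies one slot, and a task runs for one time step.
    """
    if max_am_slots is None:
        max_am_slots = total_slots  # no cap: AMs may take every slot
    if max_am_slots < 1:
        raise ValueError("max_am_slots must be at least 1")

    pending = deque(range(num_jobs))
    running = {}  # job id -> tasks still to run
    finished = 0

    while pending or running:
        # Admit ApplicationMasters while slots and the AM cap allow it.
        while pending and len(running) < min(max_am_slots, total_slots):
            running[pending.popleft()] = tasks_per_job

        free = total_slots - len(running)
        if free == 0:
            # Every slot is held by an AM waiting for task containers.
            return finished

        # Hand the free slots to running jobs, oldest job first.
        for job in list(running):
            if free == 0:
                break
            granted = min(free, running[job])
            running[job] -= granted
            free -= granted

        # Jobs with no tasks left release their AM slot.
        for job in [j for j, left in running.items() if left == 0]:
            del running[job]
            finished += 1

    return finished


if __name__ == "__main__":
    print(simulate(4, 10, 2))                  # uncapped AMs
    print(simulate(4, 10, 2, max_am_slots=2))  # AMs limited to half the slots
```

With 4 slots and 10 two-task jobs, the uncapped run admits 4 ApplicationMasters and stalls with no job finished, while capping ApplicationMasters at 2 slots lets all 10 jobs complete. Keeping launcher jobs in their own queue plays a similar role in this model: the launchers can no longer consume the slots that the real jobs need.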