Date: Wed, 11 Jul 2012 01:13:35 +0000 (UTC)
From: "Tom White (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <71251560.33020.1341969215283.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <433790628.23777.1338496943296.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Updated] (MAPREDUCE-4299) Terasort hangs with MR2 FifoScheduler


     [ https://issues.apache.org/jira/browse/MAPREDUCE-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4299:
---------------------------------

    Attachment: MAPREDUCE-4299.patch

The problem is that FifoScheduler always sets the application headroom to be the entire set of cluster resources, without taking into account
any containers that have been assigned. In some cases, like the terasort case mentioned in the JIRA, this leads to the reducer tasks using all the cluster resources before the map tasks have finished, resulting in deadlock.

Attached is a fix with a unit test.

> Terasort hangs with MR2 FifoScheduler
> -------------------------------------
>
>                 Key: MAPREDUCE-4299
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4299
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.0-alpha
>            Reporter: Tom White
>         Attachments: MAPREDUCE-4299.patch
>
>
> What happens is that the number of reducers ramps up until they occupy all of the job's containers, at which point the maps no longer make any progress and the job hangs.
> When the same job is run with the CapacityScheduler it succeeds, so this looks like a FifoScheduler bug.
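
For illustration only, the headroom problem described in the comment can be modelled with a few lines of Java. This is a simplified sketch, not the attached patch and not the YARN scheduler API: the class name, the method names, and the use of plain integers (memory in MB) are all assumptions made for the example.

```java
// Simplified model of the headroom calculation described in the comment.
// All names are hypothetical; this is not the FifoScheduler code or the patch.
final class HeadroomSketch {

    // Behaviour described in the report: headroom is always the whole cluster,
    // so the assigned amount is ignored.
    static int headroomIgnoringAssigned(int clusterMemoryMb, int assignedMemoryMb) {
        return clusterMemoryMb;
    }

    // Headroom that takes already-assigned containers into account.
    // Clamped at zero so an over-committed cluster never reports negative headroom.
    static int headroomAfterAssigned(int clusterMemoryMb, int assignedMemoryMb) {
        return Math.max(0, clusterMemoryMb - assignedMemoryMb);
    }

    public static void main(String[] args) {
        int clusterMemoryMb = 8192;   // assumed cluster capacity
        int assignedMemoryMb = 6144;  // assumed memory held by assigned containers

        System.out.println("ignoring assigned: "
                + headroomIgnoringAssigned(clusterMemoryMb, assignedMemoryMb) + " MB");
        System.out.println("after assigned: "
                + headroomAfterAssigned(clusterMemoryMb, assignedMemoryMb) + " MB");
    }
}
```

With the first calculation an application always appears to have the full cluster available, which is consistent with reducers being ramped up while maps still need containers. With the second, the reported headroom shrinks as containers are assigned.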