Date: Tue, 18 Dec 2012 20:20:13 +0000 (UTC)
From: "Mariappan Asokan (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:
----------------------------------------

    Attachment: mapreduce-4842.patch

I updated the patch. All the changes are in {{MergeManager}}. Here is the outline of changes:

* Eliminated the line
{code}
commitMemory -= size;
{code}
in the {{unreserve()}} method.
Rationale: The complementary method {{reserve()}} increments only {{usedMemory}}, not {{commitMemory}}. Besides, {{commitMemory}} is used only to decide when enough shuffled map outputs are in memory to trigger an in-memory merge.
* In {{closeInMemoryFile()}}, once an in-memory merge is submitted, {{commitMemory}} is set back to 0.
Rationale: If any fetcher thread sneaks in (past the in-memory merge's wait, because the in-memory merge has not started yet), it will be allowed to shuffle data to memory if memory was freed by the in-memory merger. The value of {{commitMemory}} will be incremented from 0, so another merge will not be triggered unless the number of bytes shuffled by the sneaked-in threads is greater than or equal to {{mergeThreshold}}. This makes sure that we do not start a merge prematurely.
* Added initialization of {{usedMemory}} and {{commitMemory}} in the constructor (though this is not needed, as Java initializes these fields to zero by default).

Please test this patch for any performance regression.

Thanks.
-- Asokan

> Shuffle race can hang reducer
> -----------------------------
>
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  It looked similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told to WAIT by the MergeManager but no merge was taking place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
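
The accounting changes outlined in the message can be sketched roughly as follows. This is a minimal, hypothetical model and not the actual Hadoop code or the attached patch: the class name MemoryAccountingSketch, the memoryLimit field, the boolean return value of reserve(), and the mergesSubmitted counter are assumptions made for illustration. Only the names usedMemory, commitMemory, mergeThreshold, reserve(), unreserve() and closeInMemoryFile() come from the message.

```java
/**
 * Hypothetical, simplified model of the memory accounting described above.
 * It is NOT the Hadoop MergeManager; it only borrows the field and method
 * names mentioned in the message.
 */
class MemoryAccountingSketch {
    private final long memoryLimit;     // assumed: total in-memory shuffle budget
    private final long mergeThreshold;  // commitMemory level that triggers a merge
    private long usedMemory;            // bytes currently reserved by fetchers
    private long commitMemory;          // bytes of completed outputs awaiting a merge
    private int mergesSubmitted;        // assumed stand-in for handing work to a merger

    MemoryAccountingSketch(long memoryLimit, long mergeThreshold) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
        // Explicit for readability; Java would default both fields to 0 anyway.
        this.usedMemory = 0;
        this.commitMemory = 0;
    }

    /** Returns false when the fetcher would have to wait (assumed convention). */
    synchronized boolean reserve(long size) {
        if (usedMemory + size > memoryLimit) {
            return false;
        }
        usedMemory += size;   // only usedMemory grows here
        return true;
    }

    /** Releases reserved bytes; commitMemory is deliberately left untouched. */
    synchronized void unreserve(long size) {
        usedMemory -= size;
    }

    /** Called when a map output has been fully shuffled into memory. */
    synchronized void closeInMemoryFile(long size) {
        commitMemory += size;
        if (commitMemory >= mergeThreshold) {
            mergesSubmitted++;  // stand-in for submitting an in-memory merge
            commitMemory = 0;   // later outputs start counting from zero
        }
    }

    synchronized long getUsedMemory() { return usedMemory; }
    synchronized long getCommitMemory() { return commitMemory; }
    synchronized int getMergesSubmitted() { return mergesSubmitted; }
}
```

In this sketch, a fetcher that sneaks in after a merge has been submitted adds to commitMemory starting from 0, so a second merge is submitted only once the newly shuffled bytes reach mergeThreshold on their own. Releasing memory through unreserve() has no effect on that count.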