Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 21 Jul 2016 03:46:20 +0000 (UTC)
From: "Haibo Chen (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <JIRA.12749009.1413589769000.91394.1469072780493@Atlassian.JIRA>
In-Reply-To: <JIRA.12749009.1413589769000@Atlassian.JIRA>
References: <JIRA.12749009.1413589769000@Atlassian.JIRA> <JIRA.12749009.1413589769285@arcas>
Subject: [jira] [Resolved] (MAPREDUCE-6131) Integer overflow in
 RMContainerAllocator results in starvation of applications
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 21 Jul 2016 03:46:22 -0000


     [ https://issues.apache.org/jira/browse/MAPREDUCE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen resolved MAPREDUCE-6131.
-----------------------------------
    Resolution: Invalid

> Integer overflow in RMContainerAllocator results in starvation of applications
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6131
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6131
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Kamal Kc
>         Attachments: MAPREDUCE-6131-2.2.0.patch
>
>
> When processing large datasets, Hadoop can reach a state in which every
> container runs a reduce task and no map tasks are scheduled. The
> application does not fail; it stays in this state without making any
> forward progress and has to be terminated manually.
>
> The cause is an integer overflow in scheduleReduces() of
> RMContainerAllocator. For large inputs, the variable netScheduledMapMem
> overflows and takes a negative value, which results in a large
> finalReduceMemLimit and a large rampup value. In almost all cases, this
> rampup value is greater than the total number of reduce tasks, so the AM
> tries to assign all reduce tasks. If the total number of reduce tasks is
> also greater than the total number of container slots, all slots are
> taken up by reduce tasks, leaving none for maps.
>
> With a 128 MB block size and a 2 GB map container size, the overflow
> occurs at an input size of 128 TB. An example scenario that reproduces it:
> - Input data size = 32 TB, block size = 128 MB, map container size = 10 GB,
>   reduce container size = 10 GB, #reducers = 50,
>   cluster memory capacity = 7 x 40 GB, slowstart = 0.0
>
> A better resolution might be to change the variables used in
> RMContainerAllocator from int to long. A simpler fix would be to change
> only the local variables of scheduleReduces() to long.
> A patch for 2.2.0 is attached.
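
The arithmetic behind the reported overflow can be checked with a minimal sketch. This is not the RMContainerAllocator code: the class and method names below are hypothetical, and only the variable name netScheduledMapMem comes from the report. The sketch assumes one map task per input block and memory counted in MB, and compares 32-bit and 64-bit multiplication for the reproduction scenario quoted above.

```java
public class MapMemOverflowSketch {
    static final long MB = 1024L * 1024L;
    static final long TB = MB * 1024L * 1024L;

    // Hypothetical helper: number of map tasks, assuming one map per block.
    static int numMaps(long inputBytes, long blockBytes) {
        return (int) (inputBytes / blockBytes);
    }

    // 32-bit arithmetic: the product silently wraps around once it
    // exceeds Integer.MAX_VALUE (2^31 - 1).
    static int scheduledMapMemInt(int maps, int mapContainerMb) {
        return maps * mapContainerMb;
    }

    // 64-bit arithmetic: widen one operand to long before multiplying.
    static long scheduledMapMemLong(int maps, int mapContainerMb) {
        return (long) maps * mapContainerMb;
    }

    public static void main(String[] args) {
        // Scenario from the report: 32 TB input, 128 MB blocks, 10 GB map containers.
        int maps = numMaps(32 * TB, 128 * MB);
        int mapContainerMb = 10 * 1024;

        System.out.println("maps = " + maps);                                       // 262144
        System.out.println("int  = " + scheduledMapMemInt(maps, mapContainerMb));   // -1610612736
        System.out.println("long = " + scheduledMapMemLong(maps, mapContainerMb));  // 2684354560
    }
}
```

The same helpers give 2^20 maps for 128 TB of input with 128 MB blocks, and 2^20 x 2048 MB is exactly 2^31 MB, the first value an int cannot hold. That matches the 128 TB threshold stated in the report.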


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
