Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <16049312.1159816460804.JavaMail.root@brutus>
Date: Mon, 2 Oct 2006 12:14:20 -0700 (PDT)
From: "Doug Cutting (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Resolved: (HADOOP-570) Map tasks may fail due to out of
 memory, if the number of reducers are moderately big
In-Reply-To: <21081187.1159810699740.JavaMail.root@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ http://issues.apache.org/jira/browse/HADOOP-570?page=all ]

Doug Cutting resolved HADOOP-570.
---------------------------------

    Resolution: Duplicate

This is a duplicate of HADOOP-331.

> Map tasks may fail due to out of memory, if the number of reducers are moderately big
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-570
>                 URL: http://issues.apache.org/jira/browse/HADOOP-570
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>
> Map tasks may fail with out-of-memory errors if the number of reducers is moderately large.
> In my case, I set the child task heap size to 1GB and turned on compression for the map output files.
> The average size of input records is about 30K (I don't know the variation, though).
> A lot of map tasks failed due to out of memory when the number of reducers was 400 or higher.
> The number of reducers can be somewhat higher (as high as 800) if compression for the map output files is off.
> This problem will impose a hard limit on the scalability of map/reduce clusters.
> One possible solution to this problem is to let the mapper write out a single map output file,
> and then perform sort/partition as a separate phase.
> This will also make it unnecessary for the reducers to sort the individual portions from the mappers.
> Rather, the reducers should just perform merge operations on the map output files directly.
> This may even make it possible to dynamically collect some statistics on the map outputs and
> use the stats to drive the partitioning on the mapper side, and to obtain the optimal merge plan on the reducer side!
>  
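
The proposal quoted above (one sorted map output per mapper, with the reducers only merging) can be illustrated with a minimal sketch. This is not Hadoop code and not the fix tracked in HADOOP-331. The names partition_of, map_side_sort and reduce_side_merge are hypothetical, records are kept in memory rather than in files, and the byte-sum partitioner is an assumption chosen only to keep the example deterministic.

```python
import heapq
from itertools import groupby


def partition_of(key, num_reducers):
    # Hypothetical partitioner: a stable byte sum, so results do not
    # depend on Python's per-process string hash randomization.
    return sum(key.encode("utf-8")) % num_reducers


def map_side_sort(records, num_reducers):
    """Sort one mapper's (key, value) records by (partition, key).

    Returns the single sorted run plus an index that maps each
    partition to its (start, end) slice within that run.
    """
    def part(kv):
        return partition_of(kv[0], num_reducers)

    ordered = sorted(records, key=lambda kv: (part(kv), kv[0]))
    index = {}
    pos = 0
    for p, group in groupby(ordered, key=part):
        count = sum(1 for _ in group)
        index[p] = (pos, pos + count)
        pos += count
    return ordered, index


def reduce_side_merge(map_outputs, partition):
    """Merge the already-sorted slices for one partition from every mapper.

    No sorting happens here; heapq.merge performs a k-way merge.
    """
    runs = []
    for ordered, index in map_outputs:
        start, end = index.get(partition, (0, 0))
        runs.append(ordered[start:end])
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))


if __name__ == "__main__":
    m1 = map_side_sort([("b", 1), ("a", 2), ("c", 3)], num_reducers=2)
    m2 = map_side_sort([("a", 5), ("d", 4)], num_reducers=2)
    for p in range(2):
        print(p, reduce_side_merge([m1, m2], p))
```

In this sketch each mapper holds one run and one small index regardless of the number of reducers, so per-mapper state does not grow with one buffer per reducer. Whether that matches the actual cause of the failures reported here is not stated in the issue.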
