hadoop-mapreduce-user mailing list archives

From: Joey Echeverria <j...@cloudera.com>
Subject: Re: Mappers crashing due to running out of heap space during initialisation
Date: Wed, 27 Apr 2011 11:47:30 GMT
It was initializing a 200MB buffer in which to sort the map output.
How much space did you allocate the task JVMs (mapred.child.java.opts
in mapred-site.xml)?

If you didn't change the default, it's set to 200MB, which is why you
would run out of memory trying to allocate a 200MB buffer.
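
For example (a minimal sketch with illustrative values, not a
recommendation for your cluster; MyJob is a placeholder class), you could
set both knobs per job with the old JobConf API instead of editing
mapred-site.xml:

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical job setup; pick values that fit your task slots.
    JobConf conf = new JobConf(MyJob.class);
    conf.set("mapred.child.java.opts", "-Xmx512m"); // heap for each task JVM
    conf.setInt("io.sort.mb", 100);                 // sort buffer; must fit in that heap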

-Joey

On Wed, Apr 27, 2011 at 6:02 AM, James Hammerton
<james.hammerton@mendeley.com> wrote:
> Hi,
>
> I lowered io.sort.mb to 100MB from 200MB and that allowed my job to get
> through the mapping phase, thanks Chris.
>
> However, what I don't understand is why the memory got used up in the first
> place, when the mapper only buffers the previous input and the maximum
> serialised size of the objects it's dealing with is 201KB.
>
> This is why I asked about what Hadoop is doing in the area of code where the
> exception was occurring - as far as I can tell, my mapper code wasn't even
> getting run.
>
> Regards,
>
> James
>
> On Tue, Apr 26, 2011 at 8:02 PM, Chris Douglas <cdouglas@apache.org> wrote:
>>
>> Lower io.sort.mb or raise the heap size for the task. -C
>>
>> On Tue, Apr 26, 2011 at 10:55 AM, James Hammerton
>> <james.hammerton@mendeley.com> wrote:
>> > Hi,
>> >
>> > I have a job that runs fine with a small data set in pseudo-distributed
>> > mode on my desktop workstation, but when I run it on our Hadoop cluster
>> > it falls over, crashing during the initialisation of some of the mappers.
>> > The errors look like this:
>> >
>> > 2011-04-26 14:34:04,494 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
>> >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:743)
>> >       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:487)
>> >       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
>> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >
>> > The mapper itself buffers only the previous input and the objects are
>> > small (max 201KB in size, most under 50KB), so I don't know why this
>> > is happening.
>> >
>> > What exactly is happening in the area of code referred to in the stack
>> > trace?
>> >
>> > Cheers,
>> >
>> > James
>> >
>> > --
>> > James Hammerton | Senior Data Mining Engineer
>> > www.mendeley.com/profiles/james-hammerton
>> >
>> > Mendeley Limited | London, UK | www.mendeley.com
>> > Registered in England and Wales | Company Number 6419015
>> >
>
>
>
> --
> James Hammerton | Senior Data Mining Engineer
> www.mendeley.com/profiles/james-hammerton
>
> Mendeley Limited | London, UK | www.mendeley.com
> Registered in England and Wales | Company Number 6419015
>
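
To answer the question about what's happening at MapTask.java:743: the
MapOutputBuffer constructor allocates the entire io.sort.mb sort buffer up
front, during task initialisation, before map() is ever called. That's why
the failure shows up without your mapper code running at all. Roughly (a
simplified sketch of the 0.20-era code, not the exact Hadoop source):

    // Simplified sketch; variable names only approximate the real source.
    int sortmb = job.getInt("io.sort.mb", 100);  // buffer size in MB
    byte[] kvbuffer = new byte[sortmb << 20];    // single up-front allocation
    // With the default -Xmx200m and io.sort.mb=200, this allocation alone
    // exhausts the task heap, hence the OutOfMemoryError at init time.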



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
