hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Map Reduce Theory Question, getting OutOfMemoryError while reducing
Date Sat, 30 Jun 2012 04:40:19 GMT
Guojun is right, the reduce() inputs are buffered and read off of disk. You
are in no danger there.
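
To make the point concrete, here is a minimal plain-Java sketch of a one-pass, disk-backed iterator. It is not Hadoop's actual implementation; the names OnePassValues, DiskBackedIterator and totalLength are made up for illustration, and a temp file of text lines stands in for the merged reduce input. It only shows that a consumer can walk every value while holding just the current one in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class OnePassValues {

    /** Hypothetical stand-in for the reducer's values: reads one line at a time from a file. */
    static final class DiskBackedIterator implements Iterator<String>, AutoCloseable {
        private final BufferedReader reader;
        private String nextLine; // the only value held in memory

        DiskBackedIterator(Path file) throws IOException {
            this.reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            this.nextLine = reader.readLine();
        }

        @Override
        public boolean hasNext() {
            return nextLine != null;
        }

        @Override
        public String next() {
            if (nextLine == null) {
                throw new NoSuchElementException();
            }
            String current = nextLine;
            try {
                // Advance through the buffered reader; earlier values are not retained.
                nextLine = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return current;
        }

        @Override
        public void close() throws IOException {
            reader.close();
        }
    }

    /** Consumes the values in a single pass, keeping only a running total. */
    static long totalLength(Iterator<String> values) {
        long total = 0;
        while (values.hasNext()) {
            total += values.next().length();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path spill = Files.createTempFile("values", ".txt");
        Files.write(spill, List.of("alpha", "beta", "gamma"), StandardCharsets.UTF_8);
        try (DiskBackedIterator values = new DiskBackedIterator(spill)) {
            System.out.println(totalLength(values)); // prints 14
            System.out.println(values.hasNext());    // prints false: the single pass is used up
        } finally {
            Files.delete(spill);
        }
    }
}
```

Memory use here depends on the reader's buffer size, not on how many values there are, which is why 2x or 3x the input should not matter as long as the reducer itself does not collect the values.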

On Fri, Jun 29, 2012 at 11:02 PM, GUOJUN Zhu <guojun_zhu@freddiemac.com> wrote:

> If you are referring to the Iterable in the reducer, it is special and is
> not held in memory at all.  Once the iterator has passed a value, that value
> is gone and you cannot recover it.  There is no LinkedList behind it.
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_zhu@freddiemac.com
> Financial Engineering
> Freddie Mac
> From: "Berry, Matt" <mwberry@amazon.com>
> Sent: 06/29/2012 01:06 PM
> Reply-To: mapreduce-user@hadoop.apache.org
> To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
> Subject: RE: Map Reduce Theory Question, getting OutOfMemoryError while
> reducing
> I was actually quite curious as to how Hadoop was managing to get all of
> the records into the Iterable in the first place. I thought they were using
> a very specialized object that implements Iterable, but a heap dump shows
> they're likely just using a LinkedList. All I was doing was duplicating
> that object. Supposing I do as you suggest, am I in danger of having their
> list consume all the memory if a user decides to log 2x or 3x as much as
> they did this time?
> ~Matt
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, June 29, 2012 6:52 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Map Reduce Theory Question, getting OutOfMemoryError while
> reducing
> Hey Matt,
> As far as I can tell, Hadoop isn't really at fault here.
> If your issue is that you collect in a list before you store, you should
> focus on that and just avoid collecting it completely. Why don't you
> serialize as you receive, if the incoming order is already taken care of?
> As far as I can tell, your AggregateRecords probably does nothing else but
> serialize the stored LinkedList. So instead of using a LinkedList, or even
> a composed Writable such as AggregateRecords, just write them in as you
> receive them via each .next(). Would this not work for you? You may batch a
> small, constant number of records to gain some write performance, but at
> least you won't use up your memory.
> You can serialize as you receive by following this:
> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
> --
> Harsh J

Harsh J
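
P.S. A rough sketch of the "serialize as you receive" idea from the quoted message, in plain Java with no Hadoop dependency. The class StreamingRecordWriter and the method writeAll are hypothetical names, and the String values stand in for whatever records the reducer receives. In a real job the sink would presumably be the stream returned by FileSystem.create(), as the FAQ entry describes; here a ByteArrayOutputStream takes its place so the example runs on its own. The BufferedOutputStream plays the role of the small constant batch.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.List;

public class StreamingRecordWriter {

    /**
     * Writes each value as soon as it is received instead of collecting values in a list.
     * Only the fixed-size write buffer (bufferBytes) is held in memory.
     * Returns the number of records written.
     */
    static long writeAll(Iterator<String> values, OutputStream sink, int bufferBytes)
            throws IOException {
        DataOutputStream out =
                new DataOutputStream(new BufferedOutputStream(sink, bufferBytes));
        long count = 0;
        while (values.hasNext()) {
            // writeUTF is length-prefixed; note it rejects strings over 65535 encoded bytes.
            out.writeUTF(values.next());
            count++;
        }
        out.flush(); // push the last partial buffer to the sink; the caller owns and closes the sink
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = writeAll(List.of("rec-1", "rec-2", "rec-3").iterator(), sink, 64 * 1024);
        System.out.println(written + " records written");

        // Read the records back to show they were serialized in arrival order.
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
            for (long i = 0; i < written; i++) {
                System.out.println(in.readUTF());
            }
        }
    }
}
```

The buffer size is the only knob that trades memory for write performance, and it stays constant no matter how many records arrive for a key.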
