hadoop-common-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Date Mon, 27 Sep 2010 01:01:46 GMT
One of the problems with this data set is that I'm grouping by a
category that has only, say, 20 different values. Then I'm doing a
unique count of Facebook user IDs per group. I imagine that's not
pleasant for the reducers.
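
To make the concern concrete: with only about 20 group keys, at most about 20
reduce groups exist, and each one has to track every distinct user ID for its
category. A common workaround is to split the job in two: first de-duplicate
on the full (category, user ID) pair, which spreads across many partitions,
then count the surviving pairs per category. Below is a minimal sketch of that
idea in plain Java. It is not Cascading or Hadoop API code; the class name,
method names, and sample data are hypothetical, and it assumes categories
never contain a tab character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TwoStageDistinctCount {

    // Stage 1: route each (category, userId) pair to a partition by hashing
    // the whole pair, so one category's IDs are spread over many partitions.
    // Identical pairs always land in the same partition, so each partition
    // can de-duplicate its own share independently.
    static List<Set<String>> dedupeByPair(List<String[]> records, int numPartitions) {
        List<Set<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashSet<>());
        }
        for (String[] record : records) {
            String pair = record[0] + "\t" + record[1]; // assumes no tab in category
            int p = Math.floorMod(pair.hashCode(), numPartitions);
            partitions.get(p).add(pair);
        }
        return partitions;
    }

    // Stage 2: every surviving pair is unique, so the distinct count per
    // category is a plain sum that needs no per-category set of IDs.
    static Map<String, Long> countPerCategory(List<Set<String>> partitions) {
        Map<String, Long> counts = new TreeMap<>();
        for (Set<String> partition : partitions) {
            for (String pair : partition) {
                String category = pair.substring(0, pair.indexOf('\t'));
                counts.merge(category, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: (category, userId)
        List<String[]> records = Arrays.asList(
            new String[] {"music", "u1"},
            new String[] {"music", "u2"},
            new String[] {"music", "u1"},
            new String[] {"sports", "u1"},
            new String[] {"sports", "u1"});
        System.out.println(countPerCategory(dedupeByPair(records, 4)));
        // prints {music=2, sports=1}
    }
}
```

In a real job the two stages would be two grouping steps (or two MapReduce
passes); the second stage only sums counters, so its memory use should stay
small even with very few keys.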

On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <alexvk@cloudera.com> wrote:
> Hi Bradford,
>
> Sometimes the reducers do not handle merging large chunks of data too well:
> How many reducers do you have?  Try to increase the # of reducers (you can
> always merge the data later if you are worried about too many output files).
>
> --
> Alex Kozlov
> Solutions Architect
> Cloudera, Inc
> twitter: alexvk2009
>
> Hadoop World 2010, October 12, New York City - Register now:
> http://www.cloudera.com/company/press-center/hadoop-world-nyc/
>
>
> On Sun, Sep 26, 2010 at 5:09 PM, Chris K Wensel <chris@wensel.net> wrote:
>
>> Try using a lower threshold value (the number of values to cache in the LRU).
>> This is the tradeoff of this approach.
>>
>> ckw
>>
>> On Sep 26, 2010, at 4:46 PM, Bradford Stephens wrote:
>>
>> > Sadly, making Chris's changes didn't help.
>> >
>> > Here's the Cascading code, it's pretty simple but uses the new
>> > "combiner"-like functionality:
>> >
>> > http://pastebin.com/ccvDmLSX
>> >
>> >
>> >
>> > On Sun, Sep 26, 2010 at 9:37 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> >> My feeling is that you have some kind of leak going on in your mappers or
>> >> reducers and that reducing the number of times the JVM is re-used would
>> >> improve matters.
>> >>
>> >> GC overhead limit indicates that your (tiny) heap is full and collection is
>> >> not reducing that.
>> >>
>> >> On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <bradfordstephens@gmail.com> wrote:
>> >>
>> >>> mapred.job.reuse.jvm.num.tasks=50
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Bradford Stephens,
>> > Founder, Drawn to Scale
>> > drawntoscalehq.com
>> > 727.697.7528
>> >
>> > http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>> > solution. Process, store, query, search, and serve all your data.
>> >
>> > http://www.roadtofailure.com -- The Fringes of Scalability, Social
>> > Media, and Computer Science
>> >
>>
>> --
>> Chris K Wensel
>> chris@concurrentinc.com
>> http://www.concurrentinc.com
>>
>> -- Concurrent, Inc. offers mentoring, support, and licensing for Cascading
>>
>>
>



-- 
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
