hadoop-common-user mailing list archives

From Leon Mergen <l.p.mer...@solatis.com>
Subject RE: OutOfMemoryError with map jobs
Date Sun, 07 Sep 2008 11:22:53 GMT
Hello Chris,

> From the stack trace you provided, your OOM is probably due to
> HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized
> key in an emitted record exactly fills the serialization buffer that
> collects map outputs, causing an allocation as large as that buffer.
> The bug causes an extra spill and, if the task JVM's max heap size is
> too small to mask it, an OOM exception; it also skips the combiner if
> you've defined one, but it won't drop records.

Ok, thanks for that information. I guess that means I will have to upgrade. :-)
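
Until we can roll out the upgrade, I am thinking of something along these lines to give the task JVMs a bit more headroom (just a rough sketch against the old JobConf API; the class name and the heap/buffer numbers are placeholders, not tuned values):

import org.apache.hadoop.mapred.JobConf;

public class HeapWorkaround {
    // Rough sketch: give each task JVM enough max heap that the extra
    // buffer-sized allocation from HADOOP-3931 has room, until we are
    // on 0.17.2. Values below are placeholders, not tuned numbers.
    public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        conf.setInt("io.sort.mb", 100);                 // map-output collection buffer, in MB
        conf.set("mapred.child.java.opts", "-Xmx512m"); // max heap for each task JVM
        return conf;
    }
}

If I understood you correctly, the -Xmx value only has to be large enough that the extra buffer-sized allocation fits; upgrading to 0.17.2 is still the real fix.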

> > However, I was wondering: are these hard architectural limits? Say
> > that I wanted to emit 25,000 maps for a single input record, would
> > that mean that I would require huge amounts of (virtual) memory? In
> > other words, what exactly is the reason that increasing the number
> > of emitted maps per input record causes an OutOfMemoryError?
> Do you mean the number of output records per input record in the map?
> The memory allocated for collecting records out of the map is (mostly)
> fixed at the size defined in io.sort.mb. The ratio of input records to
> output records does not affect the collection and sort. The number of
> output records can sometimes influence the memory requirements, but
> not significantly. -C

Ok, so I should not have to worry about this too much! Thanks for the reply and information!
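
For the archives, this is roughly the kind of high fan-out map I had in mind (a sketch only, with made-up class and key names, not our production code). If I follow your explanation, the collection buffer stays at io.sort.mb regardless of how many records each map() call emits, so the fan-out by itself should not need extra heap:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch of a high fan-out map: each input record produces many output
// records. The buffer collecting these outputs is fixed at io.sort.mb,
// so the fan-out itself does not grow the heap; full buffers are simply
// spilled to disk more often.
public class FanOutMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    private static final int FAN_OUT = 25000;  // output records per input record

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        for (int i = 0; i < FAN_OUT; i++) {
            output.collect(new Text(value.toString() + "#" + i), key);
        }
    }
}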


Leon Mergen
