hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: How are records with equal key sorted in hadoop-0.18?
Date Mon, 08 Dec 2008 10:15:39 GMT
Hi Christian, there is no notable change to the merge algorithm except that
it uses IFile instead of SequenceFile for the input and output.
Is your application running with intermediate compression on? What's the
value configured for fs.inmemory.size.mb? What is the typical map output
size (if you happen to know)?
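
One way to collect the settings asked about above is to read them out of the job's configuration XML. The following is only a rough sketch, not part of Hadoop: read_properties and SAMPLE are hypothetical names, the sample value for fs.inmemory.size.mb is a placeholder, and mapred.compress.map.output is assumed to be the property that controls intermediate (map output) compression in this release.

```python
import xml.etree.ElementTree as ET

# fs.inmemory.size.mb and io.sort.factor are named in this thread.
# mapred.compress.map.output is an assumption about which property
# controls intermediate compression.
PROPERTIES = (
    "mapred.compress.map.output",
    "fs.inmemory.size.mb",
    "io.sort.factor",
)


def read_properties(xml_text, names=PROPERTIES):
    """Return {name: value or None} for the requested property names."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            found[name] = (prop.findtext("value") or "").strip()
    # Properties missing from the file are reported as None.
    return {name: found.get(name) for name in names}


# Placeholder configuration; 200 is an illustrative value only.
SAMPLE = """<configuration>
  <property><name>fs.inmemory.size.mb</name><value>200</value></property>
  <property><name>io.sort.factor</name><value>100</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in read_properties(SAMPLE).items():
        shown = value if value is not None else "(not set in this file)"
        print(name, "=", shown)
```

A property reported as not set falls back to whatever default the cluster ships with, so the default configuration file would need checking as well.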

Devaraj


On 12/8/08 12:59 PM, "Christian Kunz" <ckunz@yahoo-inc.com> wrote:

> Since moving to hadoop-0.18 we have had many more problems with running out
> of memory during the final merge in the reduce phase, especially when
> dealing with a lot of records with the same key.
> 
> Typical exception:
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:278)
>     at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:340)
>     at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:134)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:225)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:720)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:679)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:227)
>     at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:60)
>     at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:36)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> 
> This did not occur in earlier releases, although we used a much larger
> merge fan-in, io.sort.factor (500+ versus currently just 100). Also, tasks
> are run with 2GB of heap space.
> 
> What changed in the merge algorithm between hadoop-0.17 and hadoop-0.18?
> 
> Are records with the same key getting sorted by size for some reason? This
> would cause large values to be merged at the same time.
> 
> Thanks,
> Christian
> 


