hadoop-common-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Strange behavior - One reduce out of N reduces always fail.
Date Tue, 20 Feb 2007 07:30:01 GMT
Venkat Seeth wrote:
> Hi there,
>
> Howdy. I've been using hadoop to parse and index XML
> documents. It's a 2-step process similar to Nutch. I
> parse the XML and create field-value tuples written to
> a file.
>
> I read this file and index the field-value pairs in
> the next step.
>
> Everything works fine but always one reduce out of N
> fails in the last step when merging segments. It fails
> with one or more of the following:
> - Task failed to report status for 608 seconds.
> Killing. 
> - java.lang.OutOfMemoryError: GC overhead limit
> exceeded 
>   

Perhaps you are running with too large a heap, as strange as that may 
sound ... If I understand this message correctly, the JVM is complaining 
that garbage collection is consuming too much of its running time.
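
To check whether GC really dominates the running time, one option is to ask the JVM itself. The following is only a minimal sketch using the standard java.lang.management beans (Java 5 and later). The class name GcOverhead and the method gcTimeFraction are made up for illustration and are not part of Hadoop. As far as I know, HotSpot raises "GC overhead limit exceeded" when roughly 98% of the time goes to GC while less than about 2% of the heap is recovered, so a fraction creeping towards 1.0 would point in that direction.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hadoop: reports how much of the JVM's
// uptime has been spent in garbage collection so far.
class GcOverhead {

    /** Returns the fraction of JVM uptime spent in GC, clamped to [0, 1]. */
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report time
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        // Clamp: accumulated collector times can in principle exceed wall-clock uptime.
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        System.out.printf("GC time fraction: %.3f%n", gcTimeFraction());
    }
}
```

Calling gcTimeFraction() periodically from inside the reduce task (for example, whenever it reports status) and logging the value would show whether the failing task drifts towards constant collection before it is killed.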

This may also be related to the ulimit settings for this account ...


> Configuration:
> I have about 128 maps and 8 reduces so I get to create
> 8 partitions of my index. It runs on a 4 node cluster
> with 4-Dual-proc 64GB machines.
>   

I think that with this configuration you could increase the number of 
reduces, to decrease the amount of data each reduce task has to handle. 
In your current config you run at most 2 reduces per machine.
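
As a rough back-of-the-envelope check of the numbers quoted in this message, the sketch below estimates how much raw input each reduce would get for different reduce counts. It assumes that "10K" means 10 * 1024 bytes and that data is spread evenly across reduces. The real reduce input (the field-value tuples) will differ in size, so treat the result as an order of magnitude only. The names ReduceSizing and bytesPerReduce are made up for illustration.

```java
// Hypothetical helper, not part of Hadoop: estimates the per-reduce data
// volume under the assumption of a perfectly even partitioning.
class ReduceSizing {

    /** Average bytes per reduce task, assuming an even split of the input. */
    static long bytesPerReduce(long documents, long bytesPerDocument, int reduces) {
        if (reduces <= 0) {
            throw new IllegalArgumentException("reduces must be positive");
        }
        return documents * bytesPerDocument / reduces;
    }

    public static void main(String[] args) {
        long docs = 1650000L;          // 1.65 million documents (from the message)
        long docBytes = 10L * 1024L;   // "about 10K" each, assumed to mean 10 KiB
        int[] candidates = {8, 16, 32};
        for (int reduces : candidates) {
            long perReduce = bytesPerReduce(docs, docBytes, reduces);
            System.out.println(reduces + " reduces -> " + perReduce + " bytes each");
        }
    }
}
```

With 8 reduces that comes to about 2.1 GB of raw input per reduce. Going to 32 reduces brings it down to roughly 0.5 GB each, at the cost of ending up with 32 index partitions instead of 8.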

> Number of documents: 1.65 million each about 10K in
> size.
>
> I ran with 4 or 8 task trackers per node with 4 GB
> Heap for Job, Task trackers and the child JVMs.
>
> mergeFactor set to 50 and maxBufferedDocs at 1000.
>
> I fail to understand what's going on. When I run the
> job individually, it works with the same settings.
>
> Why would all jobs work whereas only one fails?
>   

You can also use IsolationRunner to re-run individual tasks under a 
debugger and see where they fail.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


