hadoop-common-user mailing list archives

From Venkat Seeth <sv...@yahoo.com>
Subject RE: Strange behavior - One reduce out of N reduces always fail.
Date Tue, 20 Feb 2007 16:25:25 GMT
Hi Devaraj,

Thanks for your response.

> Do you have an estimate of the sizes?
# of entries: 1080746
[# of field-value pairs]
min count: 20
max count: 3116
avg count: 66

These are small documents, and yes, the full-text content
for each document can be big. I've also set
MaxFieldLength to 10000 so that I don't index very
large values, as suggested in the Lucene docs.
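
For reference, the writer settings look roughly like this (a minimal
sketch against the Lucene 2.x IndexWriter API; the index path and
analyzer here are placeholders, not what my indexer actually uses):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class WriterSettingsSketch {
        // Sketch of the settings mentioned in this thread (Lucene 2.x).
        // The path and analyzer are placeholders.
        static IndexWriter openWriter(String path) throws Exception {
            IndexWriter writer =
                new IndexWriter(path, new StandardAnalyzer(), true);
            writer.setMaxFieldLength(10000);  // skip very large field values
            writer.setMergeFactor(50);        // as in my job config
            writer.setMaxBufferedDocs(1000);  // as in my job config
            return writer;
        }
    }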

The reduce always fails while merging segments. I do
see a very large line in the Log4J output which consists of ...

Typically, the reduce that fails is ALWAYS VERY SLOW
compared to the other N - 1.
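
For the "failed to report status" timeouts (see my original mail
quoted below), one thing I'm considering is pinging the reporter from
a background thread while the merge runs, so the framework doesn't
kill a slow-but-healthy reduce. A rough sketch, assuming Reporter
exposes progress() in hadoop-0.11 (if not, calling setStatus()
periodically should have the same keep-alive effect); the 30-second
interval is arbitrary:

    import org.apache.hadoop.mapred.Reporter;

    // Rough sketch: keep a long-running merge from tripping the task
    // timeout by reporting progress in the background. Assumes
    // Reporter.progress() is available in this Hadoop version.
    class ProgressPinger extends Thread {
        private final Reporter reporter;
        private volatile boolean done = false;

        ProgressPinger(Reporter reporter) {
            this.reporter = reporter;
            setDaemon(true);
        }

        public void run() {
            while (!done) {
                reporter.progress();
                try {
                    Thread.sleep(30000);  // arbitrary interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        void finish() {
            done = true;
            interrupt();
        }
    }

I'd start it just before the writer.optimize()/close() calls in the
reduce and stop it right after.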

Can I log the Key-Value pair sizes in the reduce part
of the indexer?
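
If it's easier than applying the patch on my side, something along
these lines is what I had in mind (a sketch only, using the old mapred
API and serializing each record into a DataOutputBuffer to measure it;
my real reducer does the indexing work instead of just collecting):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: log the serialized size of every key/value the reducer
    // sees, so an unusually large record shows up in the userlogs.
    public class SizeLoggingReducer extends MapReduceBase implements Reducer {

        private final DataOutputBuffer buf = new DataOutputBuffer();

        public void reduce(WritableComparable key, Iterator values,
                           OutputCollector output, Reporter reporter)
                throws IOException {
            buf.reset();
            key.write(buf);
            int keyBytes = buf.getLength();
            while (values.hasNext()) {
                Writable value = (Writable) values.next();
                buf.reset();
                value.write(buf);
                System.err.println("key bytes=" + keyBytes
                        + ", value bytes=" + buf.getLength());
                // (the real indexing work would go here)
                output.collect(key, value);
            }
        }
    }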

Thanks again,
Venkat

--- Devaraj Das <ddas@yahoo-inc.com> wrote:

> While this could be a JVM/GC issue as Andrzej pointed out, it could
> also be due to a very large key/value being read from the map output.
> Do you have an estimate of the sizes? Attached is a quick-hack patch
> to log the sizes of the key/values read from the sequence files.
> Please apply this patch on hadoop-0.11.2 and check in the userlogs
> what key/value it is failing for (if at all).
> Thanks,
> Devaraj.
> 
> > -----Original Message-----
> > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > Sent: Tuesday, February 20, 2007 11:32 AM
> > To: hadoop-user@lucene.apache.org
> > Subject: Strange behavior - One reduce out of N reduces always fail.
> > 
> > Hi there,
> > 
> > Howdy. I've been using Hadoop to parse and index XML documents. It's
> > a two-step process similar to Nutch. I parse the XML and create
> > field-value tuples written to a file.
> > 
> > I read this file and index the field-value pairs in the next step.
> > 
> > Everything works fine, but one reduce out of N always fails in the
> > last step when merging segments. It fails with one or more of the
> > following:
> > - Task failed to report status for 608 seconds. Killing.
> > - java.lang.OutOfMemoryError: GC overhead limit exceeded
> > 
> > I've tried various configuration combinations, and it always fails
> > at the 4th reduce in an 8-reduce configuration and at the first one
> > in a 4-reduce config.
> > 
> > Environment:
> > Suse Linux, 64-bit
> > Java 6 (Java 5 also fails)
> > Hadoop-0.11.2
> > Lucene-2.1 (Lucene 2.0 also fails)
> > 
> > Configuration:
> > I have about 128 maps and 8 reduces, so I get to create 8 partitions
> > of my index. It runs on a 4-node cluster of dual-proc, 64GB machines.
> > 
> > Number of documents: 1.65 million, each about 10K in size.
> > 
> > I ran with 4 or 8 task trackers per node, with 4 GB of heap for the
> > job tracker, the task trackers, and the child JVMs.
> > 
> > mergeFactor is set to 50 and maxBufferedDocs to 1000.
> > 
> > I fail to understand what's going on. When I run the job
> > individually, it works with the same settings.
> > 
> > Why would all the other reduces work while only one fails?
> > 
> > I'd appreciate it if anyone can share their experience.
> > 
> > Thanks,
> > Ven