hadoop-common-user mailing list archives

From: Venkat Seeth <sv...@yahoo.com>
Subject: RE: Strange behavior - One reduce out of N reduces always fail.
Date: Wed, 21 Feb 2007 00:08:39 GMT
Hi Devaraj,

The log file with the key-value pairs is huge. If you can tell me what
you are looking for, I can mine it and send the relevant information.

 343891695 2007-02-20 18:37 seq.log

This time around I get the following error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at org.apache.log4j.WriterAppender.checkEntryConditions(WriterAppender.java:176)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:156)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
        at org.apache.log4j.Category.callAppenders(Category.java:203)
        at org.apache.log4j.Category.forcedLog(Category.java:388)
        at org.apache.log4j.Category.debug(Category.java:257)
        at com.gale.searchng.workflow.model.TuplesWritable.readFields(TuplesWritable.java:127)
        at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:199)
        at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:160)
        at com.gale.searchng.workflow.indexer.Indexer.reduce(Indexer.java:152)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:324)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1372)
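
Looking at the trace, the OutOfMemoryError is raised while log4j renders
the value inside TuplesWritable.readFields (line 127), i.e. while the
debug message string is being built, not while the bytes themselves are
read. I'm thinking of guarding the debug call roughly like this; this is
only a sketch, and the LOG field, the Text field layout, and the
truncation limit are placeholders rather than my actual code:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;
  import org.apache.log4j.Logger;

  public class TuplesWritable implements Writable {
    private static final Logger LOG = Logger.getLogger(TuplesWritable.class);
    private static final int MAX_LOG_CHARS = 256; // cap what log4j sees

    private final Text field = new Text();
    private final Text value = new Text();

    public void readFields(DataInput in) throws IOException {
      field.readFields(in);
      value.readFields(in);
      // Build the (possibly huge) message only when debug is on, and
      // truncate it so the appender never gets a multi-hundred-MB string.
      if (LOG.isDebugEnabled()) {
        String v = value.toString();
        if (v.length() > MAX_LOG_CHARS) {
          v = v.substring(0, MAX_LOG_CHARS) + "... [" + v.length() + " chars]";
        }
        LOG.debug("read tuple " + field + " = " + v);
      }
    }

    public void write(DataOutput out) throws IOException {
      field.write(out);
      value.write(out);
    }
  }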

Thanks,
Venkat

--- Devaraj Das <ddas@yahoo-inc.com> wrote:

> Hi Venkat,
> You forgot to paste the log output in your reply. The patch that I sent
> will log the key/value sizes in the Reducers as well. See if you get
> helpful hints with that.
> Thanks,
> Devaraj.
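
In the meantime I am logging sizes myself in the reducer, roughly like
this - a sketch against the 0.11-era API as I understand it, not
Devaraj's patch; SizeLoggingReducer and TUPLE_WARN_BYTES are made-up
names, and it assumes Text values:

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;
  import org.apache.log4j.Logger;

  public class SizeLoggingReducer extends MapReduceBase implements Reducer {
    private static final Logger LOG =
        Logger.getLogger(SizeLoggingReducer.class);
    private static final int TUPLE_WARN_BYTES = 1000000; // made-up threshold

    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
        throws IOException {
      long n = 0;
      while (values.hasNext()) {
        Text v = (Text) values.next(); // assumes Text values
        if (v.getLength() > TUPLE_WARN_BYTES) {
          // Flag suspiciously large values so the failing key stands out.
          LOG.warn("key " + key + ": value #" + n + " is "
                   + v.getLength() + " bytes");
        }
        n++;
        // ... real indexing work would go here ...
      }
    }
  }
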
> 
> > -----Original Message-----
> > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > Sent: Tuesday, February 20, 2007 9:55 PM
> > To: hadoop-user@lucene.apache.org; Devaraj Das
> > Subject: RE: Strange behavior - One reduce out of N reduces always fail.
> > 
> > Hi Devaraj,
> > 
> > Thanks for your response.
> > 
> > > Do you have an estimate of the sizes?
> > # of entries:1080746
> > [# of field-value Pairs]
> > min count:20
> > max count:3116
> > avg count:66
> > 
> > These are small documents and yes, the full-text content for each
> > document can be big. I've also set MaxFieldLength to 10000 so that I
> > don't index very large values, as suggested for Lucene.
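> >
> > (Concretely, I set it when opening the writer, roughly like this; the
> > index path and analyzer below are placeholders:)
> >
> >   import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >   import org.apache.lucene.index.IndexWriter;
> >
> >   // Open a fresh index partition and cap how much of each field gets
> >   // tokenized, so one giant document can't blow up the indexing pass.
> >   IndexWriter writer = new IndexWriter("/path/to/index",
> >       new StandardAnalyzer(), true);
> >   writer.setMaxFieldLength(10000); // ignore tokens past the first 10000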
> > 
> > The reduce always fails while merging segments. I do see a large line
> > in the Log4J output which consists of
> >
> > Typically, the task that fails is ALWAYS VERY SLOW compared to the
> > other N - 1 tasks.
> > 
> > Can I log the key-value pair sizes in the reduce part of the indexer?
> > 
> > Thanks again,
> > Venkat
> > 
> > --- Devaraj Das <ddas@yahoo-inc.com> wrote:
> > 
> > > While this could be a JVM/GC issue as Andrzej pointed out, it could
> > > also be due to a very large key/value being read from the map
> > > output. Do you have an estimate of the sizes? Attached is a
> > > quick-hack patch to log the sizes of the key/values read from the
> > > sequence files. Please apply this patch on hadoop-0.11.2 and check
> > > in the userlogs which key/value it is failing for (if at all).
> > > Thanks,
> > > Devaraj.
> > >
> > > > -----Original Message-----
> > > > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > > > Sent: Tuesday, February 20, 2007 11:32 AM
> > > > To: hadoop-user@lucene.apache.org
> > > > Subject: Strange behavior - One reduce out of N reduces always fail.
> > > >
> > > > Hi there,
> > > >
> > > > Howdy. I've been using Hadoop to parse and index XML documents.
> > > > It's a two-step process similar to Nutch: I parse the XML and
> > > > create field-value tuples written to a file.
> > > >
> > > > I read this file and index the field-value pairs in the next step.
> > > >
> > > > Everything works fine, but one reduce out of N always fails in the
> > > > last step when merging segments. It fails with one or more of the
> > > > following:
> > > > - Task failed to report status for 608 seconds. Killing.
> > > > - java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > >
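> > > > (On the 608-second kills: a long merge inside a single reduce()
> > > > call gives the TaskTracker no sign of life. When the time is spent
> > > > in my own loop I can ping the reporter; a sketch only, with
> > > > indexOneTuple standing in for the real work:)
> > > >
> > > >   long count = 0;
> > > >   while (values.hasNext()) {
> > > >     indexOneTuple(values.next());  // placeholder for the real work
> > > >     if (++count % 1000 == 0) {
> > > >       // Periodic status updates prove the task is alive.
> > > >       reporter.setStatus("indexed " + count + " values");
> > > >     }
> > > >   }
> > > >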
> > > > I've tried various configuration combinations, and it always fails
> > > > at the 4th reduce in an 8-reduce configuration and at the first one
> > > > in a 4-reduce config.
> > > >
> > > > Environment:
> > > > SUSE Linux, 64-bit
> > > > Java 6 (Java 5 also fails)
> > > > Hadoop 0.11.2
> > > > Lucene 2.1 (Lucene 2.0 also fails)
> > > >
> > > > Configuration:
> > > > I have about 128 maps and 8 reduces, so I get to create 8
> > > > partitions of my index. It runs on a 4-node cluster with 4
> > > > dual-proc, 64GB machines.
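> > > >
> > > > (The driver side of that, roughly; a sketch using the 0.11-era
> > > > JobConf API as I understand it, with Indexer.class as my job class:)
> > > >
> > > >   import org.apache.hadoop.mapred.JobConf;
> > > >
> > > >   JobConf conf = new JobConf(Indexer.class);
> > > >   conf.setNumMapTasks(128);   // a hint; actual maps follow the splits
> > > >   conf.setNumReduceTasks(8);  // one index partition per reduce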
> > > >
> > > > Number of documents: 1.65 million each about
> 10K
> > > in
> > > > size.
> > > >
> > > > I ran with 4 or 8 task trackers per node, with a 4 GB heap for the
> > > > JobTracker, the TaskTrackers and the child JVMs.
> > > >
> > > > mergeFactor is set to 50 and maxBufferedDocs to 1000.
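> > > >
> > > > (In Lucene 2.1 terms, with writer being the IndexWriter on the
> > > > reduce side:)
> > > >
> > > >   writer.setMergeFactor(50);        // up to 50 segments per merge
> > > >   writer.setMaxBufferedDocs(1000);  // flush to disk every 1000 docs
> > > >
> > > > A mergeFactor of 50 makes merges rarer but much bigger, which could
> > > > feed both the timeout and the memory pressure.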
> > > >
> > > > I fail to understand what's going on. When I run the job
> > > > individually, it works with the same settings.
> > > >
> > > > Why would all the other reduces work while only this one fails?
> > > >
> > > > I'd appreciate it if anyone can share their experience.
> > > >
> > > > Thanks,
> > > > Ven