hadoop-common-user mailing list archives

From Venkat Seeth <sv...@yahoo.com>
Subject RE: Strange behavior - One reduce out of N reduces always fail.
Date Wed, 21 Feb 2007 05:20:14 GMT
Haven't determined which key-value pair causes this
one. Need to find that out.
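
One idea in the meantime: log the serialized size of every value as the
reducer consumes it, so the culprit key shows up in the userlogs just
before the failure. A rough sketch along the lines of Devaraj's patch
(the class name and 10 MB threshold below are placeholders, not our
actual indexer code):

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.io.DataOutputBuffer;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.mapred.*;
  import org.apache.log4j.Logger;

  public class SizeLoggingReducer extends MapReduceBase implements Reducer {
    private static final Logger LOG =
        Logger.getLogger(SizeLoggingReducer.class);
    private final DataOutputBuffer buf = new DataOutputBuffer();

    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
        throws IOException {
      while (values.hasNext()) {
        Writable value = (Writable) values.next();
        buf.reset();
        value.write(buf);  // serialize to count bytes; avoids toString()
        if (buf.getLength() > 10 * 1024 * 1024) {  // 10 MB, arbitrary
          LOG.warn("Huge value: " + buf.getLength()
              + " bytes for key " + key);
        }
        // ... the existing indexing logic would go here ...
      }
    }
  }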

Thanks,
Venkat

--- Mahadev Konar <mahadev@yahoo-inc.com> wrote:

> It does look like the value for a particular key is huge in size. Does
> your map-reduce job fail for the same key/value pair, or is it
> non-deterministic?
> 
> Regards
> Mahadev
> 
> > -----Original Message-----
> > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > Sent: Tuesday, February 20, 2007 4:09 PM
> > To: hadoop-user@lucene.apache.org; Devaraj Das
> > Subject: RE: Strange behavior - One reduce out of N reduces always fail.
> > 
> > Hi Devaraj,
> > 
> > The log file with the key-value pairs is huge. If you can tell me
> > what you are looking for, I can mine it and send the relevant
> > information.
> > 
> >  343891695 2007-02-20 18:37 seq.log
> > 
> > This time around I get the following error:
> > 
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> >         at java.util.Arrays.copyOfRange(Arrays.java:3209)
> >         at java.lang.String.<init>(String.java:216)
> >         at java.lang.StringBuffer.toString(StringBuffer.java:585)
> >         at org.apache.log4j.WriterAppender.checkEntryConditions(WriterAppender.java:176)
> >         at org.apache.log4j.WriterAppender.append(WriterAppender.java:156)
> >         at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
> >         at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
> >         at org.apache.log4j.Category.callAppenders(Category.java:203)
> >         at org.apache.log4j.Category.forcedLog(Category.java:388)
> >         at org.apache.log4j.Category.debug(Category.java:257)
> >         at com.gale.searchng.workflow.model.TuplesWritable.readFields(TuplesWritable.java:127)
> >         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:199)
> >         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:160)
> >         at com.gale.searchng.workflow.indexer.Indexer.reduce(Indexer.java:152)
> >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:324)
> >         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1372)
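> > 
> > From the trace, the OOM happens while building a debug message inside
> > TuplesWritable.readFields (TuplesWritable.java:127): Category.debug is
> > formatting a String from the value as it is read. If that's right,
> > guarding the call so nothing is materialized unless DEBUG is on, and
> > logging only sizes rather than contents, should avoid it. A sketch
> > (numPairs is a made-up local):
> > 
> >   if (LOG.isDebugEnabled()) {
> >     // Log a count/size, never the tuple contents, so no giant
> >     // String is ever built for the appender.
> >     LOG.debug("read tuple with " + numPairs + " field-value pairs");
> >   }
> > 
> > Failing that, I may bump the task heap with mapred.child.java.opts
> > (the default is only -Xmx200m, I believe).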
> > 
> > Thanks,
> > Venkat
> > 
> > --- Devaraj Das <ddas@yahoo-inc.com> wrote:
> > 
> > > Hi Venkat,
> > > You forgot to paste the log output in your reply. The patch that I
> > > sent will log the key/value sizes in the Reducers as well. See if
> > > you get helpful hints with that.
> > > Thanks,
> > > Devaraj.
> > >
> > > > -----Original Message-----
> > > > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > > > Sent: Tuesday, February 20, 2007 9:55 PM
> > > > To: hadoop-user@lucene.apache.org; Devaraj Das
> > > > Subject: RE: Strange behavior - One reduce out of N reduces always fail.
> > > >
> > > > Hi Devaraj,
> > > >
> > > > Thanks for your response.
> > > >
> > > > > Do you have an estimate of the sizes?
> > > > # of entries: 1080746
> > > > [# of field-value pairs per entry]
> > > > min count: 20
> > > > max count: 3116
> > > > avg count: 66
> > > >
> > > > These are small documents, and yes, the full-text content for
> > > > each document can be big. I've also set MaxFieldLength to 10000 so
> > > > that I don't index very large values, as suggested for Lucene.
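> > > >
> > > > For reference, the knob I mean, where "writer" is our Lucene
> > > > IndexWriter instance in the reduce:
> > > >
> > > >   // Cap the number of terms indexed per field so one giant
> > > >   // document can't dominate the indexing step.
> > > >   writer.setMaxFieldLength(10000);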
> > > >
> > > > The reduce always fails while merging segments. I do see a very
> > > > large line in the Log4J output, which consists of ...
> > > >
> > > > Typically, the reduce that fails is ALWAYS VERY SLOW compared to
> > > > the other N - 1.
> > > >
> > > > Can I log the key-value pair sizes in the reduce part of the
> > > > indexer?
> > > >
> > > > Again,
> > > >
> > > > Thanks,
> > > > Venkat
> > > >
> > > > --- Devaraj Das <ddas@yahoo-inc.com> wrote:
> > > >
> > > > > While this could be a JVM/GC issue, as Andrez pointed out, it
> > > > > could also be due to a very large key/value being read from the
> > > > > map output. Do you have an estimate of the sizes? Attached is a
> > > > > quick-hack patch to log the sizes of the key/values read from
> > > > > the sequence files. Please apply this patch on hadoop-0.11.2 and
> > > > > check in the userlogs what key/value it is failing for (if at
> > > > > all).
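> > > > >
> > > > > If the patch doesn't apply cleanly, a small standalone scan of
> > > > > the sequence file gives the same numbers (a sketch, not the
> > > > > patch itself; the path is a placeholder):
> > > > >
> > > > >   Configuration conf = new Configuration();
> > > > >   FileSystem fs = FileSystem.get(conf);
> > > > >   SequenceFile.Reader reader =
> > > > >       new SequenceFile.Reader(fs, new Path("/path/to/map-output"),
> > > > >                               conf);
> > > > >   Writable key = (Writable) reader.getKeyClass().newInstance();
> > > > >   Writable value = (Writable) reader.getValueClass().newInstance();
> > > > >   DataOutputBuffer buf = new DataOutputBuffer();
> > > > >   while (reader.next(key, value)) {
> > > > >     buf.reset();
> > > > >     value.write(buf);  // serialized size of this value
> > > > >     System.out.println(buf.getLength() + "\t" + key);
> > > > >   }
> > > > >   reader.close();
> > > > >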
> > > > > Thanks,
> > > > > Devaraj.
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > > > > > Sent: Tuesday, February 20, 2007 11:32 AM
> > > > > > To: hadoop-user@lucene.apache.org
> > > > > > Subject: Strange behavior - One reduce out of N reduces always fail.
> > > > > >
> > > > > > Hi there,
> > > > > >
> > > > > > Howdy. I've been using Hadoop to parse and index XML
> > > > > > documents. It's a two-step process, similar to Nutch: I parse
> > > > > > the XML and create field-value tuples written to a file.
> > > > > >
> > > > > > I read this file and index the field-value pairs in the next
> > > > > > step.
> > > > > >
> > > > > > Everything works fine, but one reduce out of N always fails in
> > > > > > the last step when merging segments. It fails
> 
=== message truncated ===



 
