hadoop-common-user mailing list archives

From "Colin Freas" <colinfr...@gmail.com>
Subject Re: Stack Overflow When Running Job
Date Tue, 10 Jun 2008 22:32:49 GMT
We keep running into this problem.  I've checked out the latest trunk,
applied the patch, and rebuilt the tar.gz.

Then I thought: would I need to run an upgrade on HDFS for this to work?
I'm not sure I'm up for that.

Any idea of the time until 0.17.1?

On Mon, Jun 9, 2008 at 4:22 PM, Runping Qi <runping@yahoo-inc.com> wrote:

>
> This is a known problem for 0.17.0:
> https://issues.apache.org/jira/browse/HADOOP-3442
>
> It should be fixed in 0.17.1
>
> Runping
>
>
> > -----Original Message-----
> > From: Colin Freas [mailto:colinfreas@gmail.com]
> > Sent: Monday, June 09, 2008 12:56 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Stack Overflow When Running Job
> >
> > We were getting this exact same problem in a really simple MR job, on
> > input produced from a known-working MR job.
> >
> > It seemed to happen intermittently, and we couldn't figure out what
> > was up. In the end we solved the problem by increasing the number of
> > maps (80 to 200; this is a 6-node, 12-core cluster). Apparently,
> > QuickSort can have problems with big chunks of pre-sorted data. Too
> > much recursion, I believe.
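The failure mode Colin describes can be seen in a minimal standalone sketch. This is not Hadoop's actual org.apache.hadoop.util.QuickSort, just a plain quicksort with a deliberately poor pivot choice, showing how recursion depth grows linearly with input size on already-sorted data:

```java
// Minimal sketch (not Hadoop's QuickSort): with a poor pivot, quicksort's
// recursion depth is O(n) on already-sorted input, which is what blows
// the stack, versus O(log n) when partitions stay balanced.
public class QuickSortDepth {
    static int maxDepth = 0;

    // Lomuto partition with last-element pivot: worst case on sorted input.
    static void sort(int[] a, int lo, int hi, int depth) {
        if (lo >= hi) return;
        if (depth > maxDepth) maxDepth = depth;
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t; i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        sort(a, lo, i - 1, depth + 1);  // on sorted input this side holds all but the pivot
        sort(a, i + 1, hi, depth + 1);
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        for (int k = 0; k < n; k++) sorted[k] = k;
        sort(sorted, 0, n - 1, 1);
        // prints 1999: depth ~ n, versus ~log2(2000) = 11 for balanced partitions
        System.out.println("recursion depth on " + n + " sorted ints: " + maxDepth);
    }
}
```

With a bigger n (or the smaller stacks of worker JVM threads), that linear depth turns into the StackOverflowError in the trace below.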
> >
> > This might not be what's going on with you, maybe you're on a cluster
> > of some other scale, but this worked for us (and in a setup with
> > Hadoop 0.17).
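For anyone wanting to try the same workaround: the map count can be raised via the standard mapred.map.tasks property, in hadoop-site.xml or the job's configuration. A sketch; 200 is just the value that worked on Colin's 6-node cluster, and the framework treats this property as a hint rather than a hard count:

```xml
<!-- hadoop-site.xml: ask for more, smaller map tasks so each map's
     sort buffer holds less (possibly pre-sorted) data per spill -->
<property>
  <name>mapred.map.tasks</name>
  <value>200</value>
</property>
```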
> >
> > Good luck!
> >
> > -Colin
> >
> > On Mon, Jun 2, 2008 at 3:18 PM, Devaraj Das <ddas@yahoo-inc.com> wrote:
> >
> > > Hi, do you have a testcase that we can run to reproduce this? Thanks!
> > >
> > > > -----Original Message-----
> > > > From: jkupferman [mailto:jkupferman@umail.ucsb.edu]
> > > > Sent: Monday, June 02, 2008 9:22 AM
> > > > To: core-user@hadoop.apache.org
> > > > Subject: Stack Overflow When Running Job
> > > >
> > > >
> > > > Hi everyone,
> > > > I have a job running that keeps failing with Stack Overflows,
> > > > and I really don't see how that is happening.
> > > > The job runs for about 20-30 minutes before one task errors,
> > > > then a few more error and it fails.
> > > > I am running Hadoop 0.17 and I've tried lowering these settings
> > > > to no avail:
> > > > io.sort.factor                50
> > > > io.seqfile.sorter.recordlimit 500000
> > > >
> > > > java.io.IOException: Spill failed
> > > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:594)
> > > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:576)
> > > >       at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
> > > >       at Group.write(Group.java:68)
> > > >       at GroupPair.write(GroupPair.java:67)
> > > >       at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> > > >       at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> > > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:434)
> > > >       at MyMapper.map(MyMapper.java:27)
> > > >       at MyMapper.map(MyMapper.java:10)
> > > >       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> > > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> > > >       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> > > > Caused by: java.lang.StackOverflowError
> > > >       at java.io.DataInputStream.readInt(DataInputStream.java:370)
> > > >       at Group.readFields(Group.java:62)
> > > >       at GroupPair.readFields(GroupPair.java:60)
> > > >       at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
> > > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
> > > >       at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
> > > >       at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
> > > >       at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
> > > > ....the above line repeated 200x
> > > >
> > > > I defined a WritableComparable called GroupPair which simply
> > > > holds two Group objects, each of which contains two integers.
> > > > I fail to see how QuickSort could recurse 200+ times, since
> > > > that would require an insanely large number of entries, far
> > > > more than the 500 million that had been output at that point.
> > > >
> > > > How is this even possible? And what can be done to fix this?
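For context, here is a hypothetical reconstruction of the poster's Group/GroupPair classes. The names come from the stack trace; the exact fields are an assumption based on "two Group objects, each of which contains two integers". Plain java.io is used so this compiles without Hadoop on the classpath; in the real job, write()/readFields() would implement org.apache.hadoop.io.Writable:

```java
import java.io.*;

// Assumed layout of the poster's Group: two ints, serialized back to back.
class Group {
    int a, b;
    Group() {}
    Group(int a, int b) { this.a = a; this.b = b; }
    void write(DataOutput out) throws IOException {
        out.writeInt(a);                 // DataOutputStream.writeInt, as in the trace
        out.writeInt(b);
    }
    void readFields(DataInput in) throws IOException {
        a = in.readInt();                // DataInputStream.readInt, as in the trace
        b = in.readInt();
    }
}

// Assumed GroupPair: delegates serialization to its two Groups.
class GroupPair {
    Group first = new Group(), second = new Group();
    GroupPair() {}
    GroupPair(Group f, Group s) { first = f; second = s; }
    void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }
    void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }
}

public class GroupPairDemo {
    public static void main(String[] args) throws IOException {
        // Round-trip one pair: 4 ints = 16 bytes on the wire.
        GroupPair p = new GroupPair(new Group(1, 2), new Group(3, 4));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        p.write(new DataOutputStream(bos));
        GroupPair q = new GroupPair();
        q.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(q.first.a + "," + q.first.b + " / " + q.second.a + "," + q.second.b);
    }
}
```

Nothing here recurses, which is consistent with the trace: the StackOverflowError originates in QuickSort.sort's self-calls during the spill, not in the Writable itself.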
> > > > --
> > > > View this message in context:
> > > > http://www.nabble.com/Stack-Overflow-When-Running-Job-tp175935
> > > > 94p17593594.html
> > > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> > > >
> > > >
> > >
> > >
>
