hadoop-common-dev mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: some reducers stuck in copying stage
Date Thu, 01 Mar 2007 06:10:49 GMT
Looks like it has something to do with the new checksum patch (HADOOP-928). I
may be wrong, but I think it is worth taking a look at that patch.
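
[For illustration only: the NullPointerException quoted below is thrown from
FSDataInputStream$Buffer.seek, i.e. a seek on a buffered wrapper whose inner
stream reference is missing. A minimal Java sketch of that failure mode,
using hypothetical class and field names rather than Hadoop's actual code:]

import java.io.IOException;

// Hypothetical stand-ins for FSDataInputStream and its inner Buffer class;
// names and structure are illustrative only, not Hadoop's actual code.
interface Seekable {
    void seek(long pos) throws IOException;
}

class SeekableBuffer implements Seekable {
    private final Seekable in;        // underlying checksummed stream

    SeekableBuffer(Seekable in) {
        this.in = in;                 // if the caller passes null, the bug is armed here
    }

    @Override
    public void seek(long pos) throws IOException {
        // Throws NullPointerException when 'in' was never initialised,
        // analogous to the frame FSDataInputStream$Buffer.seek(...:74) quoted below.
        in.seek(pos);
    }
}

public class NpeOnSeekDemo {
    public static void main(String[] args) throws IOException {
        Seekable s = new SeekableBuffer(null);  // simulate a missing inner stream
        s.seek(0L);                             // -> java.lang.NullPointerException
    }
}

[If the checksum changes left such an inner reference unset on some code path,
a later seek during the merge would fail in just this way; that is only a
hypothesis consistent with the traces below.]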

> -----Original Message-----
> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
> Sent: Thursday, March 01, 2007 9:26 AM
> To: 'hadoop-dev@lucene.apache.org'
> Subject: RE: some reducers stuck in copying stage
> 
> Weird! This looks like some other problem which happened while merging the
> outputs at the Reduce task. The copying stage went through fine. This
> requires some more analysis.
> 
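
[To make the merge step concrete: the reduce side merges many sorted map-output
segments with a priority queue of segment readers, roughly like the generic
k-way merge sketched here. This is an illustrative sketch only, not the
SequenceFile$Sorter$MergeQueue implementation, which works on raw key/value
bytes and spills to disk:]

import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge of already-sorted segments, to illustrate the kind of
// work the reduce-side merger performs. Illustrative only.
public class KWayMerge {
    private record Head(Integer value, Iterator<Integer> rest) {}

    public static void merge(List<List<Integer>> sortedSegments) {
        PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing(Head::value));
        // Seed the heap with the first record of every segment.
        for (List<Integer> seg : sortedSegments) {
            Iterator<Integer> it = seg.iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        // Repeatedly emit the smallest head and refill from that segment.
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            System.out.println(h.value());
            if (h.rest().hasNext()) heap.add(new Head(h.rest().next(), h.rest()));
        }
    }

    public static void main(String[] args) {
        merge(List.of(List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7)));
    }
}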
> > -----Original Message-----
> > From: Mike Smith [mailto:mike.smith.dev@gmail.com]
> > Sent: Thursday, March 01, 2007 3:44 AM
> > To: hadoop-dev@lucene.apache.org
> > Subject: Re: some reducers stuck in copying stage
> >
> > Devaraj,
> >
> > After applying patch 1043 the copying problem is solved. However, I am now
> > getting new exceptions; the tasks do finish after being reassigned to
> > another tasktracker, so the job gets done eventually. But I never had this
> > exception before applying the patch (or could it be because of changing the
> > back-off time to 5 sec?):
> >
> > java.lang.NullPointerException
> >   at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
> >   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:217)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:163)
> >   at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> >   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> >   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:178)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
> >   at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2226)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2442)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2164)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:270)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1444)
> >
> > java.lang.NullPointerException
> >   at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
> >   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:217)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:163)
> >   at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> >   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> >   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:178)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
> >   at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2226)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2442)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2164)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:270)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1444)
> >
> >
> >
> > On 2/28/07, Mike Smith <mike.smith.dev@gmail.com> wrote:
> > >
> > > Thanks Devaraj, patch 1042 seems to be committed already. Also, the system
> > > never recovered even after 1 min / 300 sec; it stayed stuck there for hours.
> > > I will try patch 1043 and also decrease the back-off time to see if those
> > > help.
> > >
> > >
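
[For reference, the back-off being discussed is the delay a reduce task waits
before re-trying a failed map-output fetch. A generic exponential back-off
sketch with hypothetical names and values; this is not Hadoop's actual copier
code or configuration:]

// Generic exponential back-off between fetch retries; names and numbers are
// illustrative only (e.g. the 5 sec starting value mentioned in the thread).
public class FetchBackoff {
    public static void main(String[] args) throws InterruptedException {
        long backoffMs = 5_000;            // hypothetical starting back-off (5 sec)
        final long maxBackoffMs = 300_000; // hypothetical cap (300 sec)

        for (int attempt = 1; attempt <= 5; attempt++) {
            if (tryFetchMapOutput(attempt)) {
                System.out.println("fetch succeeded on attempt " + attempt);
                return;
            }
            System.out.println("fetch failed, sleeping " + backoffMs + " ms");
            Thread.sleep(backoffMs);
            backoffMs = Math.min(backoffMs * 2, maxBackoffMs); // double up to the cap
        }
        System.out.println("giving up after max attempts");
    }

    // Hypothetical stand-in for the HTTP fetch of a map output segment.
    private static boolean tryFetchMapOutput(int attempt) {
        return attempt >= 3; // pretend the third attempt succeeds
    }
}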


