hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Map Task has IO exception
Date Mon, 07 Nov 2011 20:29:02 GMT
I really don't know if there is any more I can do over email.  You might want to look at the
metrics to see if anything out of the ordinary is happening on these nodes just before or
just after the error happens.  Is there anything else in the logs that looks a little bit
odd compared to the other jobs?  I know 10 hours of logs is a lot to go through, but I really
cannot think of anything else that could be causing this.
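
One way to correlate the failures with particular nodes is to list which tracker each
failed attempt ran on. A minimal sketch, assuming the 0.20 mapred client API (the job
id below is hypothetical; substitute the real one from the JobTracker web UI):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class FailedAttemptNodes {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    // Hypothetical job id -- take the real one from the JobTracker web UI.
    RunningJob job = client.getJob(JobID.forName("job_201111070000_0001"));
    int from = 0;
    TaskCompletionEvent[] events;
    // Page through completion events and print the tracker for each failed attempt.
    while ((events = job.getTaskCompletionEvents(from)).length > 0) {
      for (TaskCompletionEvent e : events) {
        if (e.getTaskStatus() == TaskCompletionEvent.Status.FAILED) {
          System.out.println(e.getTaskAttemptId() + " ran on " + e.getTaskTrackerHttp());
        }
      }
      from += events.length;
    }
  }
}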

--Bobby Evans

On 11/7/11 2:03 PM, "Steve Lewis" <lordjoe2000@gmail.com> wrote:

I suspect that HDFS and/or its local disks may be full or sick.

The problem occurs after a job has been running at least 10 hours.
I am too new at this to know where to look to see how healthy HDFS is, and could
use some pointers.

There are points in the job where the reducers write to HDFS, but I believe these come later,
and one reduce task owns each file written. There is a daemon which clears out logs, but I
saw the error many times, including at times when the daemon had not run.

Any suggestions would be useful

This is from the same machine, but not from a time when I was seeing these errors:
bin/hadoop dfsadmin -report
Configured Capacity: 23115117404160 (21.02 TB)
Present Capacity: 21918568804352 (19.93 TB)
DFS Remaining: 20307713318912 (18.47 TB)
DFS Used: 1610855485440 (1.47 TB)
DFS Used%: 7.35%
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 6 (10 total, 4 dead)

Name: 10.2.4.30:50010
Decommission Status : Normal
Configured Capacity: 3852519567360 (3.5 TB)
DFS Used: 266415693824 (248.12 GB)
Non DFS Used: 199377129472 (185.68 GB)
DFS Remaining: 3386726744064(3.08 TB)
DFS Used%: 6.92%
DFS Remaining%: 87.91%
Last contact: Mon Nov 07 11:57:35 PST 2011
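
For checking space from code rather than the shell, a minimal sketch, assuming the
0.20-era DistributedFileSystem API:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class DfsSpaceCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Raw capacity/usage across the cluster, as dfsadmin -report shows.
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      long capacity = dfs.getRawCapacity();
      long used = dfs.getRawUsed();
      System.out.printf("capacity=%d bytes, used=%d bytes (%.2f%%)%n",
          capacity, used, 100.0 * used / capacity);
    }
  }
}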



On Mon, Nov 7, 2011 at 9:20 AM, Robert Evans <evans@yahoo-inc.com> wrote:
Did you mean 0.20.2?
If so, then wow, that is a bit of a stumper.  Line 200 of BZip2Codec.java is the following:

196:    public void write(int b) throws IOException {
197:      if (needsReset) {
198:        internalReset();
199:      }
200:      this.output.write(b);
201:   }

So it must be that the output stream itself (this.output) is null (or this is null, which would
mean that Java itself has something very wrong with it).  So it looks like for some reason
the output stream for the spill file is coming back as null, but if I look at the code for
IFile, where the output stream is created:
     ...
      this.checksumOut = new IFileOutputStream(out);
     ...
      if (codec != null) {
        this.compressor = CodecPool.getCompressor(codec);
        this.compressor.reset();
        this.compressedOut = codec.createOutputStream(checksumOut, compressor);
    ...

I don't see any way that checksumOut could be null.  There may have been some sort of an optimization
within IFileOutputStream, but I really don't see how.
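
To make the failing frame concrete outside of a job, here is a minimal standalone
sketch (not the Hadoop source; the local path is hypothetical) of the wrapping that
puts BZip2CompressionOutputStream between a writer and its file:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionOutputStream;

public class BZip2WiringSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    // Hypothetical local path standing in for a spill file.
    FSDataOutputStream out = fs.create(new Path("/tmp/bzip2-sketch.out"));
    BZip2Codec codec = new BZip2Codec();
    // createOutputStream wraps 'out' in BZip2CompressionOutputStream,
    // the class whose write(int) is throwing the NPE in the failing tasks.
    CompressionOutputStream cout = codec.createOutputStream(out);
    cout.write(42);
    cout.close();
  }
}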

You might want to look at how full the disks are on the nodes where it is failing.  You
might also want to check whether any records were output by these mappers at all, because
this is failing on close, and it would be very interesting to see if anything else was output
to the IFile before this.
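
Checking whether the mappers emitted anything can be done from the job counters. A
minimal sketch, assuming the 0.20 mapred API (the job id is again hypothetical);
per-task numbers are also visible on the JobTracker web UI:

import java.io.IOException;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.Task;

public class CheckMapOutputCounters {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    // Hypothetical job id.
    RunningJob job = client.getJob(JobID.forName("job_201111070000_0001"));
    Counters counters = job.getCounters();
    // Job-wide map output totals; zero here would mean no mapper wrote anything.
    System.out.println("map output records: "
        + counters.getCounter(Task.Counter.MAP_OUTPUT_RECORDS));
    System.out.println("map output bytes:   "
        + counters.getCounter(Task.Counter.MAP_OUTPUT_BYTES));
  }
}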

--Bobby Evans


On 11/7/11 10:36 AM, "Steve Lewis" <lordjoe2000@gmail.com> wrote:

0.202 and using that API  -

On Mon, Nov 7, 2011 at 8:27 AM, Robert Evans <evans@yahoo-inc.com> wrote:
What version of Hadoop are you using?



On 11/5/11 11:09 AM, "Steve Lewis" <lordjoe2000@gmail.com> wrote:

My job is dying during a map task write. This happened in enough tasks to kill the job, although
most tasks succeeded.

Any ideas as to where to start diagnosing the problem?



Caused by: java.lang.NullPointerException
 at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.write(BZip2Codec.java:200)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
 at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
 at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:263)
 at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:243)
 at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:126)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1242)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
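
For anyone hitting the same trace: the spill only goes through
BZip2CompressionOutputStream when map output compression is configured with that
codec. A minimal sketch of that configuration under the 0.20 mapred API; swapping in
a different codec is one way to test whether BZip2 is the trigger:

import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputCompressionConfig {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Spills pass through the codec only when map output compression is on.
    conf.setCompressMapOutput(true);
    conf.setMapOutputCompressorClass(BZip2Codec.class);
    // To test whether the codec is the trigger, swap it, e.g.:
    // conf.setMapOutputCompressorClass(DefaultCodec.class);
    System.out.println("map output codec: "
        + conf.getMapOutputCompressorClass(DefaultCodec.class).getName());
  }
}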





