hadoop-mapreduce-user mailing list archives

From Hariharan <hariharan...@gmail.com>
Subject Application failed with "All datanodes are bad"
Date Fri, 20 Jan 2017 20:08:33 GMT
I had an application (a Hive command) fail with the following error:

2017-01-20 16:40:24,279 <some part of stack trace stripped>
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.io.IOException: All datanodes 10.152.76.225:50010 are bad.
Aborting...
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:860)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:137)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)

Investigating the affected node (10.152.76.225), I found an error
logged just before the one above:
2017-01-20 16:40:23,743 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for
BP-1380197209-10.152.76.187-1484669381166:blk_1074615677_876595
java.net.SocketTimeoutException: 60000 millis timeout while waiting
for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.152.76.225:50010
remote=/10.152.76.226:33610]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:734)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:745)
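
For what it's worth, the 60000 ms in this trace matches what I understand to be the default HDFS socket read timeout in 2.6. The keys I believe are relevant (please correct me if I have the wrong ones) can be checked with:

    # effective socket timeouts (defaults are, I think, 60000 ms read and 480000 ms write)
    hdfs getconf -confKey dfs.client.socket-timeout
    hdfs getconf -confKey dfs.datanode.socket.write.timeout

I realize raising these in hdfs-site.xml would probably just paper over whatever is stalling the pipeline, so I'd rather find the root cause.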

Following the chain further to the next node (10.152.76.226), there
was an error about a minute before the stack above:
2017-01-20 16:39:00,223 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
ip-10-152-76-226.us-west-2.compute.internal:50010:DataXceiver error
processing WRITE_BLOCK operation  src: /10.152.76.226:46168 dst:
/10.152.76.226:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting
for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.152.76.226:50010
remote=/10.152.76.226:46168]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:734)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:745)

Note that in this stack the src and dst are on the same node, so even
a network error seems improbable.
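
One thing I'm wondering about for that same-node case: the receiver on :50010 still waited 60 seconds for data from a writer on the very same host, which to me points at the sending thread (or the whole JVM) being stalled rather than the network. The DataNode's JvmPauseMonitor should have logged any long pauses, so I plan to grep for those around 16:39 (log paths below are just my guess for a typical install; adjust as needed):

    # long JVM/host pauses reported by the DataNode itself
    grep -i "JvmPauseMonitor" /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log
    # full GCs in the DataNode GC log, if GC logging is enabled
    grep -iE "Full GC|pause" /var/log/hadoop-hdfs/*gc*.log*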
I checked the obvious possibilities on both nodes - disk space
(~1.8 TB available) and ulimit (set to 32768) - and they seemed fine.
I'm at a loss as to why this error happens, and it happens quite
frequently. Any ideas on why this would happen, or what I should
check next?
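
Concretely, the next checks I have in mind (commands assume the standard HDFS CLI and typical log locations, so treat them as a sketch):

    # xceiver / transfer-thread limit on the datanodes (default 4096, I believe)
    hdfs getconf -confKey dfs.datanode.max.transfer.threads
    # any xceiver-limit or pipeline errors in the datanode logs around that time
    grep -i "xceiver" /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log | tail
    # namenode's view of live/dead datanodes and remaining capacity
    hdfs dfsadmin -report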

Environment:
Hadoop 2.6
CentOS Linux
Running on AWS

Thanks,
Hariharan

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org

