hadoop-hdfs-dev mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10624) VolumeScanner to report why a block is found bad
Date Thu, 14 Jul 2016 05:13:20 GMT
Yongjun Zhang created HDFS-10624:
------------------------------------

             Summary: VolumeScanner to report why a block is found bad
                 Key: HDFS-10624
                 URL: https://issues.apache.org/jira/browse/HDFS-10624
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode, hdfs
            Reporter: Yongjun Zhang


Seeing the following in the DN log:

{code}
2016-04-07 20:27:45,416 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock
BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96465013 received exception java.io.EOFException:
Premature EOF: no length prefix available
2016-04-07 20:27:45,416 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: rn2-lampp-lapp1115.rno.apple.com:1110:DataXceiver
error processing WRITE_BLOCK operation  src: /10.204.64.137:45112 dst: /10.204.64.151:1110
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2241)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:738)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
        at java.lang.Thread.run(Thread.java:745)
2016-04-07 20:27:46,116 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting
bad BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96458336 on /ngs8/app/lampp/dfs/dn
2016-04-07 20:27:46,117 ERROR org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/ngs8/app/lampp/dfs/dn,
DS-a14baf2b-a1ef-4282-8d88-3203438e708e) exiting because of exception
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
2016-04-07 20:27:46,118 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/ngs8/app/lampp/dfs/dn,
DS-a14baf2b-a1ef-4282-8d88-3203438e708e) exiting.
2016-04-07 20:27:46,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.204.64.151,
datanodeUuid=6064994a-6769-4192-9377-83f78bd3d7a6, infoPort=0, infoSecurePort=1175, ipcPort=1120,
storageInfo=lv=-56;cid=cluster6;nsid=1112595121;c=0):Failed to transfer BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96465013
to 10.204.64.10:1110 got
java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:190)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:585)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:758)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:705)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2154)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.transferReplicaForPipelineRecovery(DataNode.java:2884)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.transferBlock(DataXceiver.java:862)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opTransferBlock(Receiver.java:200)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:118)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
        ... 25 more
{code}

In particular, the message

{code}
2016-04-07 20:27:46,116 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting
bad BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96458336 on /ngs8/app/lampp/dfs/dn
{code}
means the VolumeScanner/BlockScanner found the replica corrupt, or hit some other issue. It would
be very helpful to report the reason here: if the replica is corrupt, the offset of the first corrupt
data (or chunk) in the block, and the total replica length. Creating this jira to request this
enhancement.
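As a purely illustrative sketch (self-contained, not the actual Hadoop scanner code — class and method names here are hypothetical), the requested detail amounts to recording, during the chunk-by-chunk checksum verification, the offset of the first mismatching chunk and the replica length, so that the "Reporting bad ..." log line can include both:

```java
import java.util.zip.CRC32;

// Hypothetical sketch: verify a replica's data in fixed-size checksummed
// chunks and report where the first corruption is found, plus total length.
public class ChunkScanSketch {
    // 512 bytes per checksummed chunk, matching the HDFS default (bytes-per-checksum).
    static final int CHUNK_SIZE = 512;

    // Returns the byte offset of the first corrupt chunk, or -1 if all chunks verify.
    static long firstCorruptOffset(byte[] data, long[] storedChecksums) {
        CRC32 crc = new CRC32();
        int nChunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (int i = 0; i < nChunks; i++) {
            int off = i * CHUNK_SIZE;
            int len = Math.min(CHUNK_SIZE, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            if (crc.getValue() != storedChecksums[i]) {
                return off; // first mismatch: this is the detail worth logging
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] replica = new byte[2048];
        for (int i = 0; i < replica.length; i++) replica[i] = (byte) i;

        // Compute "stored" checksums from the pristine data.
        long[] sums = new long[4];
        CRC32 crc = new CRC32();
        for (int i = 0; i < 4; i++) {
            crc.reset();
            crc.update(replica, i * CHUNK_SIZE, CHUNK_SIZE);
            sums[i] = crc.getValue();
        }

        // Simulate corruption of a byte in the third chunk (offsets 1024..1535).
        replica[1100] ^= 0xFF;

        // The enriched log line would then carry the reason and location:
        long off = firstCorruptOffset(replica, sums);
        System.out.println("Reporting bad block: replica length=" + replica.length
            + ", first corrupt chunk at offset=" + off);
    }
}
```

This is only a sketch of the bookkeeping; in the DN the equivalent information is already available at verification time and would just need to be threaded into the VolumeScanner's warning.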
 
BTW, the NPE in the above log was resolved as HDFS-10512 (thanks Wei-Chiu).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

