hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3368) Missing blocks due to bad DataNodes comming up and down.
Date Fri, 25 May 2012 00:49:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282880#comment-13282880 ]

Konstantin Shvachko commented on HDFS-3368:

No, all failures are unrelated to the patch.
I looked through the Jenkins logs.

# org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum 
This one fails because the previous test sets the xceiver count in the config to 2 and never resets it.
So creating a large file in testGetFileChecksum eventually fails, because the DNs refuse
to add more xceiver threads.
java.io.IOException: Xceiver count 3 exceeds the limit of concurrent xcievers: 2
	at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:143)
	at java.lang.Thread.run(Thread.java:662)
# org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 
Fails because DFSTestUtil.waitCorruptReplicas() is timing- and delay-sensitive.
It reads a file 50 times and checks whether the corruption has been detected after each read.
That was enough time for the DN to restart, but not enough for the NN to detect the corruption.
I looked for "NameSystem.addToCorruptReplicasMap:" and it is not in the logs.
By the way, testBlockCorruptionRecoveryPolicy2, which corrupts 2 replicas instead of one, worked.
# org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.testCorruptBlockRereplicatedAcrossRacks
fails for the same reason. I see fifty "Waiting for 1 corrupt replicas" messages, which means 50 reads
were done, but no "addToCorruptReplicasMap", indicating that the corruption was not detected.
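The first failure is a test-isolation problem: a setting lowered by one test leaks into the next one. Below is a minimal sketch of the leak and of a try/finally restore, using a plain shared map as a stand-in for the test configuration. The class, key name and default value are hypothetical and are not Hadoop's Configuration or DataXceiverServer API.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of tests sharing one mutable configuration;
// not Hadoop's Configuration or DataXceiverServer API.
final class ConfigLeakDemo {
    // Hypothetical key standing in for the DataNode xceiver limit setting.
    static final String MAX_XCEIVERS = "max.xceivers";
    // Hypothetical default used only by this model.
    static final int DEFAULT_MAX_XCEIVERS = 256;

    // Shared across "tests", like a static configuration field in a test class.
    static final Map<String, Integer> sharedConf = new HashMap<>();

    // Models the DataNode check: reject a new xceiver once the count exceeds the limit.
    static void startXceiver(int currentCount) {
        int limit = sharedConf.getOrDefault(MAX_XCEIVERS, DEFAULT_MAX_XCEIVERS);
        if (currentCount > limit) {
            throw new IllegalStateException("Xceiver count " + currentCount
                    + " exceeds the limit of concurrent xceivers: " + limit);
        }
    }

    // Buggy pattern: lowers the limit and never restores it.
    static void testWithLowLimitLeaky() {
        sharedConf.put(MAX_XCEIVERS, 2);
        startXceiver(2);
    }

    // Fixed pattern: restores the previous value even if the test body throws.
    static void testWithLowLimit() {
        Integer saved = sharedConf.get(MAX_XCEIVERS);
        sharedConf.put(MAX_XCEIVERS, 2);
        try {
            startXceiver(2);
        } finally {
            if (saved == null) {
                sharedConf.remove(MAX_XCEIVERS);
            } else {
                sharedConf.put(MAX_XCEIVERS, saved);
            }
        }
    }

    // A later test that needs three concurrent xceivers, e.g. while writing a large file.
    static boolean laterTestPasses() {
        try {
            startXceiver(3);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        testWithLowLimitLeaky();
        // prints "after leaky test, later test passes: false"
        System.out.println("after leaky test, later test passes: " + laterTestPasses());
        sharedConf.clear();
        testWithLowLimit();
        // prints "after restoring test, later test passes: true"
        System.out.println("after restoring test, later test passes: " + laterTestPasses());
    }
}
```

Creating a fresh configuration for each test, or resetting the shared one in a JUnit @After method, gives the same isolation without a try/finally in every test.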

I can file JIRAs for these.
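As described above, waitCorruptReplicas() bounds its wait by a number of reads (50), so how long it actually waits depends on how fast those reads are. One common alternative is to poll against a wall-clock deadline. The helper below is a hypothetical sketch of that idea. It is not what DFSTestUtil does, and it is not necessarily how these tests would be fixed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical deadline-based wait helper; not an existing Hadoop test utility.
final class DeadlineWait {
    // Polls the condition every intervalMs until it holds or timeoutMs elapses.
    static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // Stand-in for "the NameNode has recorded the corrupt replica": true after ~200 ms.
        BooleanSupplier corruptionReported =
                () -> System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(200);
        // Succeeds if detection happens within 10 s, however fast each individual check is.
        waitFor(corruptionReported, 50, 10_000);
        System.out.println("corruption reported after ~"
                + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " ms");
    }
}
```

The design choice here is to make the timeout depend on elapsed time rather than on how many reads happen to fit into it. A fast machine then cannot exhaust the attempts before the NameNode has had a chance to process the corruption report.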

Resubmitted the build in case I missed something.
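The log check used for the last two failures (fifty "Waiting for 1 corrupt replicas" lines but no "addToCorruptReplicasMap") can be repeated on the resubmitted build's output. Below is a minimal sketch that only counts marker substrings. It assumes the test log has been saved to a local plain-text file, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for the log check described above; it only counts marker substrings.
final class LogMarkerCount {
    static final String WAIT_MARKER = "Waiting for 1 corrupt replicas";
    static final String DETECT_MARKER = "addToCorruptReplicasMap";

    // Number of log lines containing the given marker.
    static long count(List<String> lines, String marker) {
        return lines.stream().filter(line -> line.contains(marker)).count();
    }

    // True if the test kept waiting but no line shows a corrupt replica being recorded.
    static boolean corruptionUndetected(List<String> lines) {
        return count(lines, WAIT_MARKER) > 0 && count(lines, DETECT_MARKER) == 0;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a saved test log (assumed plain text, UTF-8).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("waits: " + count(lines, WAIT_MARKER)
                + ", detections: " + count(lines, DETECT_MARKER)
                + ", corruption undetected: " + corruptionUndetected(lines));
    }
}
```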
> Missing blocks due to bad DataNodes comming up and down.
> --------------------------------------------------------
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0-alpha, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, blockDeletePolicy-0.22.patch, blockDeletePolicy-trunk.patch, blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
> All replicas of a block can be removed if bad DataNodes come up and down during cluster restart, resulting in data loss.

This message is automatically generated by JIRA.

