hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9275) Fix TestRecoverStripedFile
Date Thu, 22 Oct 2015 05:34:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968588#comment-14968588
] 

Walter Su commented on HDFS-9275:
---------------------------------

||DN0||DN1||DN2||DN3||DN4||DN5||DN6||DN7||DN8||DN9||DN10||DN11||
| |*|*|*|*|*|*|*|*|*| | |   <-- BlockGroup_0
| | |*|*|*|*|*|*|*|*|*| |   <-- BlockGroup_1

The test case only tests the last block group. Suppose DN8~DN10 are shut down. ReplicationMonitor
will schedule a recovery. It first needs to call BlockPlacementPolicy to choose targets. DN2~DN10
are excluded because they already hold internal blocks of BlockGroup_1. To recover 3 blocks, it
must choose DN0, DN1 and DN11.
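
The exclusion step above can be sketched in a few lines of Java. This is only an illustration of the reasoning, not the BlockPlacementPolicy code; the class and method names ({{CandidateTargets}}, {{candidates}}) are hypothetical, and the layout (12 DNs, BlockGroup_1 on DN2~DN10) is taken from the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HDFS: lists the DataNodes that hold
// no internal block of a block group and so remain eligible as targets.
final class CandidateTargets {

    /** Returns the DN indices outside the inclusive range [firstDn, lastDn]. */
    static List<Integer> candidates(int numDataNodes, int firstDn, int lastDn) {
        List<Integer> free = new ArrayList<>();
        for (int dn = 0; dn < numDataNodes; dn++) {
            if (dn < firstDn || dn > lastDn) {
                free.add(dn);
            }
        }
        return free;
    }

    public static void main(String[] args) {
        // BlockGroup_1 occupies DN2..DN10 in a 12-node cluster.
        System.out.println(candidates(12, 2, 10)); // prints [0, 1, 11]
    }
}
```

Exactly three candidates are left for three missing blocks, so recovery fails as soon as any one of them is rejected.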

But DN1 has a block belonging to BlockGroup_0. The last time DN1 sent a heartbeat, it reported
an {{xceiverCount}} of 3. {{xceiverCount}} equals the number of active threads in DataNode.threadGroup,
as shown below.

{noformat}
DatanodeRegistration(127.0.0.1:47705, datanodeUuid=43e5be32-2066-4057-9b25-8544d2d542bc, infoPort=43445, infoSecurePort=0, ipcPort=34036, storageInfo=lv=-56;cid=testClusterID;nsid=23260287;c=1445489667626)
java.lang.ThreadGroup[name=dataXceiverServer,maxpri=10]
    Thread[org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@6aa03871,5,dataXceiverServer]
    Thread[DataXceiver for client DFSClient_NONMAPREDUCE_-1867405584_1 at /127.0.0.1:56717 [Receiving block BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001],5,dataXceiverServer]
    Thread[PacketResponder: BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001, type=LAST_IN_PIPELINE, downstreams=0:[],5,dataXceiverServer]
{noformat}
An {{xceiverCount}} of 3 is larger than the average, so DN1 is excluded by {{chooseRandom()}}.
Only DN0 and DN11 remain, so BlockGroup_1 can recover only 2 blocks. As discussed [here|https://issues.apache.org/jira/browse/HDFS-8220?focusedCommentId=14518931&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14518931],
PlacementPolicy currently does not support returning two identical storages, i.e., it cannot
place 2 replicas (internal blocks) on the same storage.

We could simply add more DNs to fix the test, or we could set {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY}}
to false in the test case.
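
To illustrate why turning off the load check helps, here is a minimal, self-contained Java sketch of a load-aware target filter. It is a simplified model, not the real placement policy: the class {{LoadAwareChooser}}, the per-node counts (1 for every live DN except DN1 with 3) and the 2.0 threshold factor are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a load-aware target filter, not HDFS code.
final class LoadAwareChooser {

    // Assumed threshold: reject a node whose load exceeds 2x the average.
    static final double LOAD_FACTOR = 2.0;

    /** Returns the candidates that pass the (optional) load check. */
    static List<String> chooseTargets(Map<String, Integer> xceiverCounts,
                                      List<String> candidates,
                                      boolean considerLoad) {
        double avg = xceiverCounts.values().stream()
                .mapToInt(Integer::intValue).average().orElse(0.0);
        List<String> chosen = new ArrayList<>();
        for (String dn : candidates) {
            int load = xceiverCounts.get(dn);
            if (considerLoad && load > LOAD_FACTOR * avg) {
                continue; // too busy compared with the average, skip it
            }
            chosen.add(dn);
        }
        return chosen;
    }

    /** Made-up counts: DN8..DN10 are down, DN1 reports 3, all others 1. */
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < 12; i++) {
            if (i >= 8 && i <= 10) {
                continue; // shut down, not part of the average
            }
            counts.put("DN" + i, i == 1 ? 3 : 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sampleCounts();
        List<String> candidates = List.of("DN0", "DN1", "DN11");
        // Average is 11/9, threshold is about 2.44, so DN1 (3) is rejected.
        System.out.println(chooseTargets(counts, candidates, true));  // [DN0, DN11]
        System.out.println(chooseTargets(counts, candidates, false)); // [DN0, DN1, DN11]
    }
}
```

With the check enabled only 2 targets survive. With it disabled all 3 are returned, which is the effect that setting {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY}} to false is expected to have in the test.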

The 02 patch also includes some cleanup. Kindly review. Thanks.

> Fix TestRecoverStripedFile
> --------------------------
>
>                 Key: HDFS-9275
>                 URL: https://issues.apache.org/jira/browse/HDFS-9275
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
