hadoop-hdfs-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9289) check genStamp when complete file
Date Thu, 22 Oct 2015 21:05:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969890#comment-14969890
] 

Elliott Clark commented on HDFS-9289:
-------------------------------------

We just had something very similar happen on a prod cluster. Then the datanode holding
the only complete block was shut off for repair.

{code}
15/10/22 06:29:32 INFO hdfs.StateChange: BLOCK* allocateBlock: /TESTCLUSTER-HBASE/WALs/hbase4544.test.com,16020,1444266312515/hbase4544.test.com%2C16020%2C1444266312515.default.1445520572440.
BP-1735829752-10.210.49.21-1437433901380 blk_1190230043_116735085{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-52d9a122-a46a-4129-ab3d-d9041de109f8:NORMAL:10.210.31.48:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-c734b72e-27de-4dd4-a46c-7ae59f6ef792:NORMAL:10.210.31.38:50010|RBW]]}
15/10/22 06:32:48 INFO namenode.FSNamesystem: updatePipeline(block=BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085,
newGenerationStamp=116737586, newLength=201675125, newNodes=[10.210.81.33:50010, 10.210.81.45:50010,
10.210.64.29:50010], clientName=DFSClient_NONMAPREDUCE_1976436475_1)
15/10/22 06:32:48 INFO namenode.FSNamesystem: updatePipeline(BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085)
successfully to BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116737586
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.210.64.29:50010
is added to blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-d5f7fff9-005d-4804-a223-b6e6624d3af2:NORMAL:10.210.81.45:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED]]}
size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.210.81.45:50010
is added to blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED]]}
size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.210.81.33:50010
is added to blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED],
ReplicaUnderConstruction[[DISK]DS-4d937567-7184-40b7-a822-c7e3b5d588d4:NORMAL:10.210.81.33:50010|FINALIZED]]}
size 201681322
15/10/22 09:37:36 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043
added as corrupt on 10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported
RBW replica with genstamp 116735085 does not match COMPLETE block's genstamp in block map
116737586
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* invalidateBlock: blk_1190230043_116735085(stored=blk_1190230043_116737586)
on 10.210.31.38:50010
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1190230043_116735085
to 10.210.31.38:50010
15/10/22 09:37:39 INFO BlockStateChange: BLOCK* BlockManager: ask 10.210.31.38:50010 to delete
[blk_1190230043_116735085]
15/10/22 12:45:03 INFO BlockStateChange: BLOCK* ask 10.210.64.29:50010 to replicate blk_1190230043_116737586
to datanode(s) 10.210.64.56:50010
15/10/22 12:45:07 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043
added as corrupt on 10.210.64.29:50010 by hbase4496.test.com/10.210.64.56 because client machine
reported it
15/10/22 12:50:49 INFO BlockStateChange: BLOCK* ask 10.210.81.45:50010 to replicate blk_1190230043_116737586
to datanode(s) 10.210.49.49:50010
15/10/22 12:50:55 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043
added as corrupt on 10.210.81.45:50010 by hbase4478.test.com/10.210.49.49 because client machine
reported it
15/10/22 12:56:01 WARN blockmanagement.BlockManager: PendingReplicationMonitor timed out blk_1190230043_116737586
{code}

The patch will help but the issue will still be there. Is there some way to keep the genstamps
from getting out of sync?

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: HDFS-9289.1.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a pipelineUpdate,
> but with the old block genStamp. This caused the replicas on two datanodes
> in the updated pipeline to be viewed as corrupt. Propose to check the genStamp when committing the block.
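
The proposed check can be illustrated with a minimal sketch. This is not the content of HDFS-9289.1.patch and not actual NameNode code: the names `GenStampCheck`, `BlockId` and `canCommit` are hypothetical and only model the comparison. The block ID and generation stamps are the ones from the log above, where updatePipeline moved the stored genstamp from 116735085 to 116737586.

```java
// Hypothetical model of the proposed check; not the NameNode implementation
// and not the attached patch.
public class GenStampCheck {

    /** Minimal stand-in for a block identity: block ID plus generation stamp. */
    static final class BlockId {
        final long blockId;
        final long genStamp;

        BlockId(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    /**
     * Returns true only when the block being committed carries the same
     * block ID and the same generation stamp as the stored block.
     */
    static boolean canCommit(BlockId stored, BlockId reported) {
        return stored.blockId == reported.blockId
            && stored.genStamp == reported.genStamp;
    }

    public static void main(String[] args) {
        // State after updatePipeline in the log: genstamp is 116737586.
        BlockId stored = new BlockId(1190230043L, 116737586L);
        // A commit that still carries the pre-update genstamp.
        BlockId stale = new BlockId(1190230043L, 116735085L);
        // A commit that carries the updated genstamp.
        BlockId current = new BlockId(1190230043L, 116737586L);

        System.out.println("stale commit accepted: " + canCommit(stored, stale));
        System.out.println("current commit accepted: " + canCommit(stored, current));
    }
}
```

In this model the commit with the old genstamp is rejected and the one with the updated genstamp is accepted. How a rejected commit should be surfaced to the client is left open here.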



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
