hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zephyr Guo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13243) Get CorruptBlock because of calling close and sync in same time
Date Wed, 28 Mar 2018 11:26:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417198#comment-16417198
] 

Zephyr Guo commented on HDFS-13243:
-----------------------------------

Rebase patch-v3, attach v4.

> Get CorruptBlock because of calling close and sync in same time
> ---------------------------------------------------------------
>
>                 Key: HDFS-13243
>                 URL: https://issues.apache.org/jira/browse/HDFS-13243
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.2, 3.2.0
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Critical
>         Attachments: HDFS-13243-v1.patch, HDFS-13243-v2.patch, HDFS-13243-v3.patch, HDFS-13243-v4.patch
>
>
> HDFS File might get broken because of corrupt block(s) that could be produced by calling
close and sync in the same time.
> When calling close was not successful, UCBlock status would change to COMMITTED, and
if a sync request gets popped from queue and processed, sync operation would change the last
block length.
> After that, DataNode would report all received block to NameNode, and will check Block
length of all COMMITTED Blocks. But the block length was already different between recorded
in NameNode memory and reported by DataNode, and consequently, the last block is marked as
corruptted because of inconsistent length.
>  
> {panel:title=Log in my hdfs}
> 2018-03-05 04:05:39,261 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1085498930_11758129\{UCState=UNDER_CONSTRUCTION,
truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
for /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> 2018-03-05 04:05:39,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
for DFSClient_NONMAPREDUCE_1077513762_1
> 2018-03-05 04:05:39,761 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated:
10.0.0.220:50010 is added to blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null,
primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
size 2054413
> 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap:
blk_1085498930 added as corrupt on 10.0.0.219:50010 by hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com/10.0.0.219
because block is COMMITTED and reported length 2054413 does not match length in block map
141232
> 2018-03-05 04:05:39,762 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap:
blk_1085498930 added as corrupt on 10.0.0.218:50010 by hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218
because block is COMMITTED and reported length 2054413 does not match length in block map
141232
> 2018-03-05 04:05:40,162 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> {panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message