hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
Date Fri, 20 Jul 2018 21:43:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-13758:
-----------------------------------
    Attachment: HDFS-10240 scenarios.jpg

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13758
>                 URL: https://issues.apache.org/jira/browse/HDFS-13758
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-10240 scenarios.jpg
>
>
> In Hadoop 3, HDFS-8909 added an assertion that assumes that if a BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
>     BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
>     assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 data corruption: a recoverLease() call immediately followed by a close(), before the DataNodes have a chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call Call#41 Retry#0 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 127.0.0.1:57903
> java.lang.AssertionError
> 	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
> 	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
> 	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose changing this assertion even though it addresses the data corruption, because:
> # We should throw a more meaningful exception than an NPE.
> # On a production cluster, where assertions are disabled, the assert is ignored and you get a more noticeable NPE instead. Future HDFS developers might fix this NPE, causing a regression. An NPE is typically not caught and handled, so it may leave internal state inconsistent.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix should reject close() if the block is being recovered.
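
The first two points above can be illustrated with a minimal, self-contained sketch. It is not the actual HDFS code or the committed patch: `Block`, `UcFeature`, and `buildRecoveryIds` are hypothetical stand-ins for `BlockInfo`, `BlockUnderConstructionFeature`, and the loop in `getBlockRecoveryCommand()`, and the choice of `IOException` is an assumption. The idea is that an explicit check behaves the same whether or not the JVM runs with `-ea`, and the exception names the offending block.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RecoveryCheckSketch {

  /** Hypothetical stand-in for BlockUnderConstructionFeature. */
  public static final class UcFeature {
    public final long recoveryId;

    public UcFeature(long recoveryId) {
      this.recoveryId = recoveryId;
    }
  }

  /** Hypothetical stand-in for BlockInfo. */
  public static final class Block {
    public final long id;
    public final UcFeature uc; // null once the block is no longer under construction

    public Block(long id, UcFeature uc) {
      this.id = id;
      this.uc = uc;
    }
  }

  /**
   * Collects recovery IDs for the given blocks. Instead of "assert uc != null",
   * which is skipped unless the JVM runs with -ea, the check is explicit and
   * fails with a checked exception that identifies the block.
   */
  public static List<Long> buildRecoveryIds(Block[] blocks) throws IOException {
    List<Long> ids = new ArrayList<>();
    for (Block b : blocks) {
      if (b.uc == null) {
        // Assumed exception type and message, for illustration only.
        throw new IOException("Recovery requested for block " + b.id
            + " but the block is not under construction");
      }
      ids.add(b.uc.recoveryId);
    }
    return ids;
  }
}
```

With a check like this, a caller such as the heartbeat handler sees a descriptive failure rather than an `AssertionError` in tests and an NPE in production. It does not cover the third point: rejecting close() while a block is being recovered would need a separate change.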



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

