hadoop-hdfs-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11852) Under-replicated block never completes because of failure in commitBlockSynchronization()
Date Thu, 25 May 2017 17:05:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025029#comment-16025029 ]

Ravi Prakash commented on HDFS-11852:
-------------------------------------

Thank you for the pointer, Kihwal!

In our case, the only replica available was on the decommissioning node. I'm guessing one of
the other datanodes may have been decommissioned successfully and perhaps a second one failed.
In that case, HDFS-11499 will likely not recover the under-replicated block. However, it would
reduce the likelihood of reaching that state, so I agree with closing this JIRA as a duplicate
of HDFS-11499.
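
As a quick way to verify which datanodes still hold replicas of an affected file, the sketch below uses the public FileSystem API. This is an illustrative example added for context, not part of the original report: the path is a placeholder, and it simply lists the hosts returned for each block so they can be compared against the decommissioning node.

{code}
// Hedged sketch: list the datanodes holding replicas of a file's blocks.
// The path below is a placeholder, not taken from this issue.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicaLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/path/to/stuck/file");      // placeholder
    FileStatus status = fs.getFileStatus(file);

    // One BlockLocation per block; getHosts() lists the datanodes that
    // currently hold replicas of that block.
    BlockLocation[] locations =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation loc : locations) {
      System.out.println("offset " + loc.getOffset() + " -> "
          + String.join(",", loc.getHosts()));
    }
    fs.close();
  }
}
{code}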

> Under-replicated block never completes because of failure in commitBlockSynchronization()
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-11852
>                 URL: https://issues.apache.org/jira/browse/HDFS-11852
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Ravi Prakash
>
> Credit goes to Charles Wimmer and Karthik Kumar for pointing me to this issue.
> We noticed a block is holding up decommissioning because recovery failed. (The stack
trace below is from when the cluster was running 2.7.2.) DN2 and DN3 are no longer part
of the cluster. DN1 is the node held up for decommissioning. I checked that the block and meta
file are indeed in the finalized directory.
> {code}2016-09-19 09:02:25,837 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: recoverBlocks
FAILED: RecoveringBlock{BP-<someid>:blk_1094097355_20357090; getBlockSize()=0; corrupt=false;
offset=-1; locs=[DatanodeInfoWithStorage[<DN1>:50010,null,null], DatanodeInfoWithStorage[<DN2>:50010,null,null],
DatanodeInfoWithStorage[<DN3>:50010,null,null]]}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize
INodeFile <filename> since blocks[0] is non-complete, where blocks=[blk_1094097355_20552508{UCState=COMMITTED,
truncateBlock=null, primaryNodeIndex=0, replicas=[ReplicaUC[[DISK]DS-03bed13e-5cdd-4207-91b6-abd83f9eb7d3:NORMAL:<DN1>:50010|RBW]]}].
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:222)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.toCompleteFile(INodeFile.java:209)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.finalizeINodeFileUnderConstruction(FSNamesystem.java:4218)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4457)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4419)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:837)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:291)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28768)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy16.commitBlockSynchronization(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolClientSideTranslatorPB.java:312)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2780)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2642)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:243)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2519)
>         at java.lang.Thread.run(Thread.java:744){code}
> On the namenode side:
> {code}
> 2016-09-19 09:02:25,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(oldBlock=BP-<someid>:blk_1094097355_20357090,
newgenerationstamp=20552508, newlength=18642324, newtargets=[<DN1>:50010], closeFile=true,
deleteBlock=false){code}
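
To make the failure mode in the traces above easier to follow, here is a minimal, self-contained sketch (not the actual HDFS source; the class and helper names are simplified) of the check that fires: commitBlockSynchronization() with closeFile=true tries to finalize the file, finalization requires every block to be COMPLETE, but blk_1094097355 is still only COMMITTED because its sole remaining replica sits on the decommissioning DN1, so the Preconditions-style state check throws the IllegalStateException that the datanode receives as a RemoteException.

{code}
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the state machine behind the stack trace above.
// Not the real org.apache.hadoop.hdfs.server.namenode.INodeFile code.
public class CommitBlockSyncSketch {

  // Under-construction block states as they appear in the NN log line:
  // COMMITTED = the client committed the block, but the NN has not yet seen
  // enough finalized replicas to promote it to COMPLETE.
  enum BlockUCState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMMITTED, COMPLETE }

  static class Block {
    final long id;
    final BlockUCState state;
    Block(long id, BlockUCState state) { this.id = id; this.state = state; }
  }

  // Analogue of INodeFile.assertAllBlocksComplete(): finalizing a file
  // requires every block to be COMPLETE, otherwise an IllegalStateException
  // is thrown and commitBlockSynchronization() fails on the NN side.
  static void assertAllBlocksComplete(List<Block> blocks) {
    for (int i = 0; i < blocks.size(); i++) {
      if (blocks.get(i).state != BlockUCState.COMPLETE) {
        throw new IllegalStateException("Failed to finalize INodeFile since blocks["
            + i + "] is non-complete, state=" + blocks.get(i).state);
      }
    }
  }

  public static void main(String[] args) {
    // blk_1094097355 is still COMMITTED because its only remaining replica is
    // on the decommissioning DN1, so closeFile=true cannot finalize the file.
    List<Block> blocks = Arrays.asList(new Block(1094097355L, BlockUCState.COMMITTED));
    assertAllBlocksComplete(blocks);   // throws, mirroring the trace above
  }
}
{code}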



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

