hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-11852) Under-replicated block never completes because of failure in commitBlockSynchronization()
Date Thu, 25 May 2017 15:23:04 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee resolved HDFS-11852.
-------------------------------
    Resolution: Duplicate

> Under-replicated block never completes because of failure in commitBlockSynchronization()
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-11852
>                 URL: https://issues.apache.org/jira/browse/HDFS-11852
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Ravi Prakash
>
> Credit goes to Charles Wimmer and Karthik Kumar for pointing me to this issue.
> We noticed that a block is holding up decommissioning because recovery failed. (The stack
> trace below is from when the cluster was on 2.7.2.) DN2 and DN3 are no longer part of the
> cluster; DN1 is the node held up for decommissioning. I checked that the block and meta
> file are indeed in the finalized directory.
> {code}2016-09-19 09:02:25,837 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: recoverBlocks FAILED: RecoveringBlock{BP-<someid>:blk_1094097355_20357090; getBlockSize()=0; corrupt=false; offset=-1; locs=[DatanodeInfoWithStorage[<DN1>:50010,null,null], DatanodeInfoWithStorage[<DN2>:50010,null,null], DatanodeInfoWithStorage[<DN3>:50010,null,null]]}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile <filename> since blocks[0] is non-complete, where blocks=[blk_1094097355_20552508{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=0, replicas=[ReplicaUC[[DISK]DS-03bed13e-5cdd-4207-91b6-abd83f9eb7d3:NORMAL:<DN1>:50010|RBW]]}].
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:222)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.toCompleteFile(INodeFile.java:209)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.finalizeINodeFileUnderConstruction(FSNamesystem.java:4218)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4457)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4419)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:837)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:291)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28768)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy16.commitBlockSynchronization(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolClientSideTranslatorPB.java:312)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2780)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2642)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:243)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2519)
>         at java.lang.Thread.run(Thread.java:744){code}
> On the NameNode side:
> {code}
> 2016-09-19 09:02:25,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(oldBlock=BP-<someid>:blk_1094097355_20357090, newgenerationstamp=20552508, newlength=18642324, newtargets=[<DN1>:50010], closeFile=true, deleteBlock=false){code}
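
Why the finalize step fails (this is a reading of the stack trace above, not code from the issue): INodeFile.toCompleteFile() asserts, via a Preconditions.checkState-style check, that every block of the file is COMPLETE, but the block being recovered is still COMMITTED because the NameNode has not yet received enough replica confirmations to complete it. Below is a minimal, self-contained Java sketch of that check; the class and enum names are hypothetical stand-ins for the real INodeFile/BlockInfo types.

{code}
// Minimal sketch (not actual HDFS code): models why finalizeINodeFileUnderConstruction
// aborts when the last block is still COMMITTED rather than COMPLETE.
import java.util.List;

public class CommitSyncSketch {

  // Simplified view of the block states involved in the failure.
  enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  // Mirrors the Preconditions.checkState(...) assertion in INodeFile.assertAllBlocksComplete:
  // every block must be COMPLETE before the file can be finalized.
  static void assertAllBlocksComplete(List<BlockState> blocks) {
    for (int i = 0; i < blocks.size(); i++) {
      if (blocks.get(i) != BlockState.COMPLETE) {
        throw new IllegalStateException(
            "Failed to finalize file since blocks[" + i + "] is non-complete: " + blocks.get(i));
      }
    }
  }

  public static void main(String[] args) {
    // The recovered block is only COMMITTED (not yet COMPLETE), so finalization throws,
    // the commitBlockSynchronization RPC fails, and the file is never closed.
    List<BlockState> blocks = List.of(BlockState.COMMITTED);
    try {
      assertAllBlocksComplete(blocks);
    } catch (IllegalStateException e) {
      System.out.println("commitBlockSynchronization would fail with: " + e.getMessage());
    }
  }
}
{code}

Because the exception is thrown on the NameNode side, commitBlockSynchronization never closes the file, the block stays under-replicated, and DN1 cannot finish decommissioning.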



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


