hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-668) TestFileAppend3#TC7 sometimes hangs
Date Mon, 05 Oct 2009 22:51:31 GMT

    [ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762439#action_12762439 ]

Konstantin Shvachko commented on HDFS-668:

When recovering from a pipeline close failure, the client:
# requests a new generation stamp (GS) from the NN via {{NameNode.updateBlockForPipeline()}};
# sends the new GS to the remaining DNs via {{DataStreamer.createBlockOutputStream()}};
# notifies the NN via {{NameNode.updatePipeline()}} that the new pipeline is established,
which updates the block's GS to the new one.
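
The three steps can be sketched with a toy model. This is only an illustration of the ordering, not HDFS code: the classes {{ToyNameNode}} and {{ToyDataNode}}, their fields, and the method {{acceptNewGs()}} are hypothetical. Only the names {{updateBlockForPipeline()}} and {{updatePipeline()}} are borrowed from the steps above, and their signatures here are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not the HDFS implementation.
class PipelineRecoverySketch {
    /** Hypothetical stand-in for the NameNode's view of one under-construction block. */
    static class ToyNameNode {
        long blockGs;         // GS currently recorded for the block
        long latestIssuedGs;  // highest GS handed out so far

        ToyNameNode(long gs) {
            blockGs = gs;
            latestIssuedGs = gs;
        }

        // Step (1): hand out a new GS; the recorded GS is not changed here.
        long updateBlockForPipeline() {
            return ++latestIssuedGs;
        }

        // Step (3): the client confirms the new pipeline; the block's GS is updated now.
        void updatePipeline(long newGs) {
            blockGs = newGs;
        }
    }

    /** Hypothetical stand-in for a DataNode holding one replica. */
    static class ToyDataNode {
        long replicaGs;

        ToyDataNode(long gs) {
            replicaGs = gs;
        }

        // Step (2): the client pushes the new GS while rebuilding the pipeline.
        void acceptNewGs(long newGs) {
            replicaGs = newGs;
        }
    }

    /** Runs the three steps in order and returns the GS used for the new pipeline. */
    static long recover(ToyNameNode nn, List<ToyDataNode> remaining) {
        long newGs = nn.updateBlockForPipeline();  // (1)
        for (ToyDataNode dn : remaining) {
            dn.acceptNewGs(newGs);                 // (2)
        }
        nn.updatePipeline(newGs);                  // (3)
        return newGs;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(1007);
        List<ToyDataNode> dns = new ArrayList<>();
        dns.add(new ToyDataNode(1007));
        long gs = recover(nn, dns);
        System.out.println("new GS=" + gs + ", NN GS=" + nn.blockGs
                + ", DN GS=" + dns.get(0).replicaGs);
    }
}
```

Between (2) and (3) the toy DataNode already holds the new GS while the toy NameNode still records the old one.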

During (2), DNs may send {{addBlock()}} to the NN, which can race with the client's
notification in (3): if a report carrying the new GS arrives before (3), the NN still holds
the old GS and cannot find the replica.
You are right that one solution is to ignore the GS when looking up the replica in
{{addBlock()}}. The best way to fix it that way is to implement HDFS-512.
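
A minimal sketch of the race, assuming the NN keeps a map from block ID to the recorded GS. Everything here is hypothetical ({{AddBlockRaceSketch}}, {{register()}}, the {{ignoreGs}} flag, and the simplified signatures); it is not the BlockManager code and not the HDFS-512 design, only an illustration of why an exact ID-plus-GS lookup fails when the DN report arrives first, and why a lookup by ID alone does not.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not the HDFS BlockManager.
class AddBlockRaceSketch {
    // Block ID -> GS currently recorded by the toy NameNode.
    private final Map<Long, Long> recordedGs = new HashMap<>();

    void register(long blockId, long gs) {
        recordedGs.put(blockId, gs);
    }

    // Client notification (3): record the new GS for the block.
    void updatePipeline(long blockId, long newGs) {
        recordedGs.put(blockId, newGs);
    }

    // DataNode report. With ignoreGs=false the replica must match both ID and GS;
    // with ignoreGs=true only the block ID is used for the lookup.
    boolean addBlock(long blockId, long reportedGs, boolean ignoreGs) {
        Long gs = recordedGs.get(blockId);
        if (gs == null) {
            return false;  // unknown block ID
        }
        return ignoreGs || gs == reportedGs;
    }

    /** Replays the bad interleaving: the DN report arrives before notification (3). */
    static boolean reportArrivesFirst(boolean ignoreGs) {
        AddBlockRaceSketch nn = new AddBlockRaceSketch();
        long blockId = -4098350497078465335L;  // block ID taken from the quoted log
        nn.register(blockId, 1007);
        boolean accepted = nn.addBlock(blockId, 1008, ignoreGs);  // DN report wins the race
        nn.updatePipeline(blockId, 1008);                         // (3) arrives too late
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("exact ID+GS lookup accepted: " + reportArrivesFirst(false));
        System.out.println("ID-only lookup accepted:     " + reportArrivesFirst(true));
    }
}
```

In this toy the exact-match lookup rejects the report and the ID-only lookup accepts it.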

Another solution would be to set the new GS on the block in (1). That is, {{NameNode.updateBlockForPipeline()}}
would not only return the new GS but also update the under-construction block with it.
I checked the code and do not see any problems with this approach so far.
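
This second approach can be sketched the same way. The class {{EagerGsUpdateSketch}} and its fields are hypothetical and the signatures are simplified assumptions; the sketch only shows that once step (1) stamps the new GS on the block, an exact ID-plus-GS lookup in {{addBlock()}} succeeds even if the DN report arrives before {{updatePipeline()}}. It does not model how reports that still carry the old GS should be handled.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not the HDFS implementation.
class EagerGsUpdateSketch {
    // Block ID -> GS currently recorded by the toy NameNode.
    private final Map<Long, Long> recordedGs = new HashMap<>();
    private long nextGs;

    EagerGsUpdateSketch(long blockId, long gs) {
        recordedGs.put(blockId, gs);
        nextGs = gs + 1;
    }

    // Step (1) as proposed: return the new GS and also stamp it on the block.
    long updateBlockForPipeline(long blockId) {
        long newGs = nextGs++;
        recordedGs.put(blockId, newGs);
        return newGs;
    }

    // DataNode report with an exact-match lookup on block ID and GS.
    boolean addBlock(long blockId, long reportedGs) {
        Long gs = recordedGs.get(blockId);
        return gs != null && gs == reportedGs;
    }

    public static void main(String[] args) {
        long blockId = 42L;  // arbitrary ID for the example
        EagerGsUpdateSketch nn = new EagerGsUpdateSketch(blockId, 1007);
        long newGs = nn.updateBlockForPipeline(blockId);  // (1) already updates the block
        // The DN report arrives before the client's notification (3).
        System.out.println("early report accepted: " + nn.addBlock(blockId, newGs));
    }
}
```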

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
> TestFileAppend3 hangs because it fails to close the file. The following log snippet shows the cause of the problem:
>     [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for block blk_-4098350497078465335_1007 in pipeline, bad datanode
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(224)) - Receiving block blk_-4098350497078465335_1007 src: / dest: /
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (FSDataset.java:recoverClose(1248)) - Recover failed close blk_-4098350497078465335_1007
>     [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(369)) - Received block blk_-4098350497078465335_1008 src: / dest: / of size 65536
>     [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_-4098350497078465335_1008 on size 65536 But it does not belong to any file.
>     [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem (FSNamesystem.java:updatePipeline(3946)) - updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, newLength=65536, newNodes=[], clientName=DFSClient_995688145)

