hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-668) TestFileAppend3#TC7 sometimes hangs
Date Thu, 01 Oct 2009 18:46:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761298#action_12761298 ]

Hairong Kuang commented on HDFS-668:
------------------------------------

In the case above, pipeline close recovery was performed on a block that had been opened for
append. The datanode bumped its replica's generation stamp (GS), finalized the replica, and then
notified the NN of the new replica. However, the datanode's notification reached the NN before
the client notified the NN of the replica's new GS and length. The NN mistakenly treated this
good replica as a bad one, that is, as a replica that does not belong to any file. As a result,
the file could not be closed.

The solution is to not match on GS when looking up the stored block info while adding a
finalized replica.
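
To make the race concrete, here is a minimal sketch of the two lookup strategies. It is
only an illustration: BlockMapSketch and all of its methods are hypothetical names invented
for this example, not the actual BlockManager code, and the block map is reduced to a plain
block ID to GS mapping. The block ID and GS values are the ones from the logs quoted below.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for the NN's block map (not real HDFS code). */
final class BlockMapSketch {
    // block ID -> generation stamp (GS) the NN currently has on record for that block
    private final Map<Long, Long> storedGs = new HashMap<>();

    /** Registers a block as belonging to some file, with the given GS. */
    void addBlockToFile(long blockId, long gs) {
        storedGs.put(blockId, gs);
    }

    /** Models the client telling the NN about the new GS after recovery. */
    void updatePipeline(long blockId, long newGs) {
        if (storedGs.containsKey(blockId)) {
            storedGs.put(blockId, newGs);
        }
    }

    /** Strict lookup: both block ID and GS must match what the NN has stored. */
    boolean belongsToFileStrict(long blockId, long reportedGs) {
        Long gs = storedGs.get(blockId);
        return gs != null && gs == reportedGs;
    }

    /** Relaxed lookup: match on block ID only and ignore the reported GS. */
    boolean belongsToFileById(long blockId) {
        return storedGs.containsKey(blockId);
    }
}

final class BlockMapSketchDemo {
    public static void main(String[] args) {
        long blockId = -4098350497078465335L;
        BlockMapSketch nn = new BlockMapSketch();
        nn.addBlockToFile(blockId, 1007L);

        // The datanode reports the finalized replica with GS 1008 before the
        // client's updatePipeline call has reached the NN.
        System.out.println("strict match: " + nn.belongsToFileStrict(blockId, 1008L));
        System.out.println("match by ID:  " + nn.belongsToFileById(blockId));

        // The client's updatePipeline arrives afterwards.
        nn.updatePipeline(blockId, 1008L);
        System.out.println("strict match after update: " + nn.belongsToFileStrict(blockId, 1008L));
    }
}
```

In this sketch the strict lookup rejects the replica only because the datanode's report
arrived before updatePipeline, while the lookup by block ID alone accepts it regardless of
the order in which the two messages arrive.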

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
>
>
> TestFileAppend3 hangs because it fails to close the file. The following snippet of logs shows the cause of the problem:
>     [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient (DFSClient.java:setupPipelineForAppendOrRecovery(3004))
- Error Recovery for block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375, 127.0.0.1:36982:
bad datanode 127.0.0.1:36982
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(224))
- Receiving block blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (FSDataset.java:recoverClose(1248))
- Recover failed close blk_-4098350497078465335_1007
>     [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(369))
- Received block blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
of size 65536
>     [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1006))
- BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_-4098350497078465335_1008
on 127.0.0.1:58375 size 65536 But it does not belong to any file.
>     [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem (FSNamesystem.java:updatePipeline(3946))
- updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, newLength=65536,
newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

