hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-668) TestFileAppend3#TC7 sometimes hangs
Date Thu, 15 Oct 2009 21:21:31 GMT

    [ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766261#action_12766261
] 

Hairong Kuang commented on HDFS-668:
------------------------------------

What triggered the file close problem is that the datanode's "blockReceived" notification
reached the NameNode earlier than the client's pipeline update. Each under-construction block
has two lists of locations: the first keeps track of the locations in the write pipeline, and
the second is a triplet list tracking the locations of finalized replicas. On receiving
blockReceived, the NameNode put the location into the block's triplet list. But when it later
received the pipeline update, in order to handle the newer generation stamp, it removed the old
under-construction block entity from the blocks map, constructed a new one, and added the new
one back into the blocks map. However, the new block entity reset the second location list.
That's why, when it was time to close the file, the NameNode checked the second list, complained
that there was no replica, and refused to close the file.
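To make the race concrete, here is a minimal, self-contained sketch of the two location lists
and of how rebuilding the block entity on a pipeline update drops the finalized-replica list.
The class and method names (UcBlock, blockReceived, updatePipeline, closeFile) are hypothetical
stand-ins for illustration only, not the actual BlockInfo/FSNamesystem code:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class PipelineRaceSketch {

        /** Stand-in for an under-construction block entry in the blocks map. */
        static class UcBlock {
            final long blockId;
            final long generationStamp;
            // First list: datanodes in the current write pipeline.
            final List<String> pipelineLocations = new ArrayList<>();
            // Second list: datanodes that reported a finalized replica (blockReceived).
            final List<String> finalizedLocations = new ArrayList<>();

            UcBlock(long blockId, long generationStamp, List<String> pipeline) {
                this.blockId = blockId;
                this.generationStamp = generationStamp;
                this.pipelineLocations.addAll(pipeline);
            }
        }

        // Stand-in for the NameNode's blocks map.
        private final Map<Long, UcBlock> blocksMap = new HashMap<>();

        void addBlock(UcBlock b) {
            blocksMap.put(b.blockId, b);
        }

        /** Datanode notification: a finalized replica now exists on this node. */
        void blockReceived(long blockId, String datanode) {
            UcBlock b = blocksMap.get(blockId);
            if (b != null) {
                b.finalizedLocations.add(datanode);
            }
        }

        /**
         * Client pipeline update after recovery. The buggy behaviour modelled here
         * replaces the old entry with a brand-new one to carry the new generation
         * stamp, which silently drops the finalized-replica list.
         */
        void updatePipeline(long blockId, long newGenStamp, List<String> newPipeline) {
            blocksMap.remove(blockId);
            // Bug: the replacement's finalizedLocations list starts empty, so the
            // earlier blockReceived report is lost.
            blocksMap.put(blockId, new UcBlock(blockId, newGenStamp, newPipeline));
        }

        /** File close succeeds only if at least one finalized replica is known. */
        boolean closeFile(long blockId) {
            UcBlock b = blocksMap.get(blockId);
            return b != null && !b.finalizedLocations.isEmpty();
        }

        public static void main(String[] args) {
            PipelineRaceSketch nn = new PipelineRaceSketch();
            nn.addBlock(new UcBlock(-4098350497078465335L, 1007,
                    List.of("127.0.0.1:58375", "127.0.0.1:36982")));

            // blockReceived wins the race and arrives before the pipeline update...
            nn.blockReceived(-4098350497078465335L, "127.0.0.1:58375");
            // ...then updatePipeline rebuilds the block entry for gen stamp 1008.
            nn.updatePipeline(-4098350497078465335L, 1008, List.of("127.0.0.1:58375"));

            // Close now fails: the finalized-replica list was reset, so the
            // NameNode believes no replica exists and refuses to close the file.
            System.out.println("close succeeded? " + nn.closeFile(-4098350497078465335L));
        }
    }

Running the sketch prints "close succeeded? false", mirroring why the NameNode refused to close
the file and why TC7 hangs waiting on close.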

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hdfs-668.patch
>
>
> TestFileAppend3 hangs because it fails to close the file. The following is the snippet
of logs that shows the cause of the problem:
>     [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient (DFSClient.java:setupPipelineForAppendOrRecovery(3004))
- Error Recovery for block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375, 127.0.0.1:36982:
bad datanode 127.0.0.1:36982
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(224))
- Receiving block blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (FSDataset.java:recoverClose(1248))
- Recover failed close blk_-4098350497078465335_1007
>     [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(369))
- Received block blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
of size 65536
>     [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1006))
- BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_-4098350497078465335_1008
on 127.0.0.1:58375 size 65536 But it does not belong to any file.
>     [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem (FSNamesystem.java:updatePipeline(3946))
- updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, newLength=65536,
newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

