hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
Date Fri, 01 Feb 2013 05:19:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568493#comment-13568493 ]

Konstantin Shvachko commented on HDFS-4452:
-------------------------------------------

I went back and forth on this, and finally realised that the block offset parameter does
not solve the problem.
Suppose we are creating the first block and two threads are executing getAdditionalBlock()
simultaneously: thread 1 is stuck in chooseTarget() while thread 2 is just starting
getAdditionalBlock(). Thread 2 has no way to determine whether it is a retry, because
thread 1 has not yet changed the file. The offset does not help either, because it is the
same in both threads.
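A minimal Java sketch of this race, under loud assumptions: NaiveAddBlock, its fields and the chooseTarget hook are hypothetical stand-ins, not the real FSNamesystem code, and the previous-block check in part 1 plays the role of the proposed offset parameter. A CyclicBarrier forces both calls to finish part 1 before either reaches part 2, so the check passes in both and the file ends up with two blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical, simplified model (NOT the real FSNamesystem code) of a two-part
 * getAdditionalBlock() that only checks the file in the first write-locked section.
 */
public class NaiveAddBlock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Long> blocks = new ArrayList<>(); // block IDs of the file under construction
    private long nextBlockId = 1;

    /** previous: last block the client knows about (null when requesting the first block). */
    public long getAdditionalBlock(Long previous, Runnable chooseTarget) {
        lock.writeLock().lock();
        try {
            // Part 1: like an offset check, this passes for a request AND its retry,
            // because neither call has modified the file yet.
            Long last = blocks.isEmpty() ? null : blocks.get(blocks.size() - 1);
            if (!Objects.equals(last, previous)) {
                throw new IllegalStateException("previous block does not match the file");
            }
        } finally {
            lock.writeLock().unlock();
        }
        chooseTarget.run(); // slow target selection, runs without the lock
        lock.writeLock().lock();
        try {
            // Part 2: allocates unconditionally, so every overlapping call adds a block.
            long id = nextBlockId++;
            blocks.add(id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int blockCount() {
        lock.readLock().lock();
        try {
            return blocks.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a request and its retry so both finish part 1 before either reaches part 2. */
    public static int raceOriginalAndRetry() {
        NaiveAddBlock ns = new NaiveAddBlock();
        CyclicBarrier bothPastPart1 = new CyclicBarrier(2);
        Runnable stall = () -> {
            try {
                bothPastPart1.await(); // neither thread continues until both passed part 1
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread original = new Thread(() -> ns.getAdditionalBlock(null, stall));
        Thread retry = new Thread(() -> ns.getAdditionalBlock(null, stall));
        original.start();
        retry.start();
        try {
            original.join();
            retry.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ns.blockCount();
    }
}
```

Because the part-1 check sees the same unchanged file in both threads, any argument derived from that state, an offset included, cannot tell the original request from its retry.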

The solution is to repeat the full file-state analysis in the second part, within the second
writeLock section. When thread 2 reaches that section, thread 1 is guaranteed to have already
created the block, so we can simply return that block.
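A minimal Java sketch of the proposed fix, again using a hypothetical simplified model rather than the actual FSNamesystem code: the class, the analyze() helper and the chooseTarget hook are illustrative names. The same analysis runs in both write-locked sections; a call that finds its requested block already appended (the file's penultimate block equals the client's previous block) returns that existing block instead of allocating another. Treating a matching penultimate block as proof of a retry is a simplifying assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.OptionalLong;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, simplified model (NOT the real FSNamesystem code) of the retry-aware fix. */
public class RetryAwareAddBlock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Long> blocks = new ArrayList<>(); // block IDs of the file under construction
    private long nextBlockId = 1;

    /**
     * Call with the write lock held. Empty means "allocate a new block"; a value is the block
     * an earlier call already created for the same request (assumed retry).
     */
    private OptionalLong analyze(Long previous) {
        int n = blocks.size();
        Long last = n == 0 ? null : blocks.get(n - 1);
        if (Objects.equals(last, previous)) {
            return OptionalLong.empty(); // file matches the client's view: allocate
        }
        Long penultimate = n < 2 ? null : blocks.get(n - 2);
        if (n >= 1 && Objects.equals(penultimate, previous)) {
            return OptionalLong.of(last); // same request already served: reuse its block
        }
        throw new IllegalStateException("previous block " + previous + " does not match the file");
    }

    public long getAdditionalBlock(Long previous, Runnable chooseTarget) {
        lock.writeLock().lock();
        try { // Part 1: analyze the file state.
            OptionalLong existing = analyze(previous);
            if (existing.isPresent()) {
                return existing.getAsLong();
            }
        } finally {
            lock.writeLock().unlock();
        }
        chooseTarget.run(); // slow, runs without the lock; other calls may proceed meanwhile
        lock.writeLock().lock();
        try { // Part 2: repeat the full analysis before allocating anything.
            OptionalLong existing = analyze(previous);
            if (existing.isPresent()) {
                return existing.getAsLong(); // the racing call created the block first
            }
            long id = nextBlockId++;
            blocks.add(id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int blockCount() {
        lock.readLock().lock();
        try {
            return blocks.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Forces a request and its retry to overlap; returns {blockCount, id1, id2}. */
    public static long[] raceOriginalAndRetry() {
        RetryAwareAddBlock ns = new RetryAwareAddBlock();
        CyclicBarrier bothPastPart1 = new CyclicBarrier(2);
        Runnable stall = () -> {
            try {
                bothPastPart1.await();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        AtomicLong id1 = new AtomicLong(), id2 = new AtomicLong();
        Thread original = new Thread(() -> id1.set(ns.getAdditionalBlock(null, stall)));
        Thread retry = new Thread(() -> id2.set(ns.getAdditionalBlock(null, stall)));
        original.start();
        retry.start();
        try {
            original.join();
            retry.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {ns.blockCount(), id1.get(), id2.get()};
    }
}
```

With the same forced overlap, the file ends up with one block and both calls return it. A retry that arrives after the original call has fully completed is caught by the analysis in part 1 and also gets the existing block, without running chooseTarget again.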
                
> getAdditionalBlock() can create multiple blocks if the client times out and retries.
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-4452
>                 URL: https://issues.apache.org/jira/browse/HDFS-4452
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Critical
>         Attachments: TestAddBlockRetry.java
>
>
> An HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out
> and reissue the same request. The two requests will race with each other in {{FSNamesystem.getAdditionalBlock()}},
> which can result in creating two new blocks on the NameNode while the client knows of
> only one of them. This eventually results in {{NotReplicatedYetException}} because the extra
> block is never reported by any DataNode, which stalls file creation and leaves the file in an
> invalid state with an empty block in the middle.

