hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
Date Fri, 01 Feb 2013 05:33:14 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-4452:

    Attachment: getAdditionalBlock.patch

Here is the patch.
I wrapped the analysis in a separate function, which is called in both parts of getAdditionalBlock(),
before and after chooseTarget().
- The patch returns the previously allocated block if the request is a retry, rather than removing
and then recreating it as the current code does.
- The patch moves all updates to the file into the second locking section, which makes it
possible to hold only the readLock for the first section.
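
To make the structure above concrete, here is a minimal sketch in Java. It is a toy model, not the code in getAdditionalBlock.patch: the names AddBlockRetrySketch, FileState, analyze() and placementHook are hypothetical, blocks are reduced to plain IDs, and the retry test (the client's "previous" block matches the second-to-last block of the file) is my assumption about how such an analysis could work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the two locking sections; NOT the real FSNamesystem code. */
class AddBlockRetrySketch {
    /** Hypothetical stand-in for a file under construction: its ordered block IDs. */
    static final class FileState {
        final List<Long> blocks = new ArrayList<>();
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Runnable placementHook; // stands in for the slow chooseTarget() step
    private long nextBlockId = 1;

    AddBlockRetrySketch(Runnable placementHook) {
        this.placementHook = placementHook;
    }

    /**
     * Shared analysis, called in both locking sections.
     * 'previous' is the last block the client knows about (null for the first block).
     * Returns the already allocated block if this request is a retry, or null if a
     * new block should be allocated. Assumed rule: a retry is a request whose
     * 'previous' matches the second-to-last block of the file.
     */
    private Long analyze(FileState file, Long previous) {
        int n = file.blocks.size();
        Long last = (n == 0) ? null : file.blocks.get(n - 1);
        Long penultimate = (n < 2) ? null : file.blocks.get(n - 2);
        if (Objects.equals(previous, last)) {
            return null; // normal request: the client is in sync with the file
        }
        if (n > 0 && Objects.equals(previous, penultimate)) {
            return last; // retry: hand back the block allocated by the earlier request
        }
        throw new IllegalStateException("previous block does not match file state");
    }

    long getAdditionalBlock(FileState file, Long previous) {
        // First section: analysis only, no updates, so the read lock is enough.
        lock.readLock().lock();
        try {
            Long existing = analyze(file, previous);
            if (existing != null) {
                return existing;
            }
        } finally {
            lock.readLock().unlock();
        }

        // Slow step runs with no lock held; a retried request may get in here.
        placementHook.run();

        // Second section: repeat the analysis, then apply all file updates.
        lock.writeLock().lock();
        try {
            Long existing = analyze(file, previous);
            if (existing != null) {
                return existing; // a concurrent request already allocated the block
            }
            long id = nextBlockId++;
            file.blocks.add(id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

In this sketch, two calls with the same 'previous' value return the same block ID and the file ends up with one block, whether the second call arrives after the first has finished or while the first is still in the unlocked placement step.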

If people could review this quickly, I think we can commit it in time for the upcoming release.
This is a long-standing and rather annoying bug,
and the patch does not introduce incompatible changes.

> getAdditionalBlock() can create multiple blocks if the client times out and retries.
> ------------------------------------------------------------------------------------
>                 Key: HDFS-4452
>                 URL: https://issues.apache.org/jira/browse/HDFS-4452
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Critical
>         Attachments: getAdditionalBlock.patch, TestAddBlockRetry.java
> HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out
> and reissue the same request. The two requests will race with each other in {{FSNamesystem.getAdditionalBlock()}},
> which can result in creating two new blocks on the NameNode while the client knows of
> only one of them. This eventually results in {{NotReplicatedYetException}} because the extra
> block is never reported by any DataNode, which stalls file creation and leaves the file in an
> invalid state with an empty block in the middle.

