hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
Date Tue, 29 Jan 2013 20:09:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565738#comment-13565738 ]

Konstantin Shvachko commented on HDFS-4452:

A few more details. {{FSNamesystem.getAdditionalBlock()}} consists of two parts, each surrounded
by {{writeLock}}. The first part validates various conditions on the file. The second actually
adds a new block. The two parts are separated by {{chooseTarget()}}, which is performed outside
the {{writeLock}}.
In the error scenario there are two threads trying to perform the same operation, {{getAdditionalBlock()}},
with the same input parameters.
- The first thread goes through the first {{writeLock}} section, checking the file state, and releases
control somewhere in {{chooseTarget()}}.
- Then the second thread enters the same first {{writeLock}} section and gets
the same results as the first thread, because the file state hasn't changed.
- Then both threads execute the second {{writeLock}} section in some order, which doesn't
matter, and create two different blocks with different ids and targets.
- The client will receive a response from only one of the competing threads and will have no
idea about the other block. The client proceeds with sending data to DataNodes.
- When the client tries to create yet another block, it will receive {{NotReplicatedYetException}},
because the penultimate block in the file has 0 replicas and will never have more.
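
The interleaving above can be reproduced with a small standalone sketch. This is not the actual {{FSNamesystem}} code: the class {{AddBlockRaceSketch}}, its fields, and the latch that stands in for the time spent in {{chooseTarget()}} are all hypothetical, chosen only to force both threads through the first locked section before either enters the second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the two-part locking pattern; not HDFS code.
public class AddBlockRaceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Long> blocks = new ArrayList<>(); // blocks of one file
    private long nextBlockId = 1;

    long getAdditionalBlock(CountDownLatch bothValidated) throws InterruptedException {
        // Part one: validate the file state under the write lock.
        lock.writeLock().lock();
        try {
            int observedBlockCount = blocks.size(); // both threads observe 0
        } finally {
            lock.writeLock().unlock();
        }

        // Stand-in for chooseTarget(), which runs outside the lock.
        // The latch forces the bad interleaving: neither thread proceeds
        // until both have finished part one.
        bothValidated.countDown();
        bothValidated.await();

        // Part two: add the block under the write lock, without
        // re-checking what part one observed.
        lock.writeLock().lock();
        try {
            long id = nextBlockId++;
            blocks.add(id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs the original request and its retry concurrently and returns
    // how many blocks the file ends up with.
    static int runRace() {
        AddBlockRaceSketch ns = new AddBlockRaceSketch();
        CountDownLatch latch = new CountDownLatch(2);
        Runnable request = () -> {
            try {
                ns.getAdditionalBlock(latch);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread first = new Thread(request);
        Thread retry = new Thread(request);
        first.start();
        retry.start();
        try {
            first.join();
            retry.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return ns.blocks.size();
    }

    public static void main(String[] args) {
        System.out.println("blocks created for one logical request: " + runRace());
    }
}
```

Each locked section is correct on its own. The problem in this model is that the state observed in the first section is stale by the time the second section runs, so one logical request leaves two blocks in the file.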
> getAdditionalBlock() can create multiple blocks if the client times out and retries.
> ------------------------------------------------------------------------------------
>                 Key: HDFS-4452
>                 URL: https://issues.apache.org/jira/browse/HDFS-4452
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Critical
> HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out
and will reissue the same request. The two requests will race with each other in {{FSNamesystem.getAdditionalBlock()}},
which can result in creating two new blocks on the NameNode while the client will know of
only one of them. This eventually results in {{NotReplicatedYetException}} because the extra
block is never reported by any DataNode, which stalls file creation and puts the file in an invalid
state with an empty block in the middle.

