hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline
Date Wed, 09 Feb 2011 06:59:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992347#comment-12992347 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1606:
----------------------------------------------

h5. When to add a datanode?
Since adding a datanode to an existing pipeline is an expensive operation (see [the previous
comment|https://issues.apache.org/jira/browse/HDFS-1606?focusedCommentId=12991839&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12991839]),
it should not be performed for every pipeline failure.  Suppose the replication factor of the
file is greater than or equal to 3.  When a pipeline fails, the operation will be invoked if
* the number of datanodes in the pipeline drops from 2 to 1; or
* the block is reopened for append; or
* it is specified by the user.

Note that when the replication factor is set to less than 3, the operation should not be
invoked by default, because performance is preferred over the data guarantee.  A sketch of this
policy is given below.
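
To make the policy concrete, here is a minimal sketch in Java.  The class, method, and
parameter names (PipelineRecoveryPolicy, shouldAddDatanode, etc.) are hypothetical and are
not DFSClient APIs; the method simply encodes the three trigger conditions and the
replication-less-than-3 default described above.

{code:java}
/**
 * Illustrative sketch only; the names here are hypothetical and are not
 * part of DFSClient.
 */
public final class PipelineRecoveryPolicy {

  /**
   * Decide whether a replacement datanode should be requested after a
   * pipeline failure.
   *
   * @param replication   the file's replication factor
   * @param nodesBefore   datanodes in the pipeline before the failure
   * @param nodesAfter    datanodes remaining after removing the failed one
   * @param appendReopen  true if the block was reopened for append
   * @param userRequested true if the user explicitly asked for a replacement
   */
  public static boolean shouldAddDatanode(int replication,
                                          int nodesBefore,
                                          int nodesAfter,
                                          boolean appendReopen,
                                          boolean userRequested) {
    if (replication < 3) {
      // Performance is preferred over the stronger data guarantee by default.
      return userRequested;
    }
    if (nodesBefore == 2 && nodesAfter == 1) {
      // The number of datanodes in the pipeline dropped from 2 to 1.
      return true;
    }
    // Block reopened for append, or explicitly requested by the user.
    return appendReopen || userRequested;
  }

  private PipelineRecoveryPolicy() {}
}
{code}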

> Provide a stronger data guarantee in the write pipeline
> -------------------------------------------------------
>
>                 Key: HDFS-1606
>                 URL: https://issues.apache.org/jira/browse/HDFS-1606
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> In the current design, if there is a datanode/network failure in the write pipeline,
> DFSClient will try to remove the failed datanode from the pipeline and then continue writing
> with the remaining datanodes.  As a result, the number of datanodes in the pipeline is decreased.
> Unfortunately, it is possible that DFSClient may incorrectly remove a healthy datanode but
> leave the failed datanode in the pipeline because failure detection may be inaccurate under
> erroneous conditions.
> We propose to have a new mechanism for adding new datanodes to the pipeline in order
> to provide a stronger data guarantee.
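
The following toy sketch (not DFSClient code; the class and method names are made up)
illustrates the current behavior described in the issue: the client drops the datanode it
believes has failed and keeps writing to whatever remains, so the pipeline only ever shrinks,
and a misdiagnosis can evict a healthy node.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of the current shrink-only recovery; not DFSClient code. */
class PipelineShrinkExample {

  /** Drop the datanode the client believes has failed and keep the rest. */
  static List<String> recoverPipeline(List<String> pipeline, String suspectedFailed) {
    List<String> remaining = new ArrayList<>(pipeline);
    remaining.remove(suspectedFailed);
    return remaining;  // the write continues with fewer datanodes
  }

  public static void main(String[] args) {
    List<String> pipeline = List.of("dn1", "dn2", "dn3");
    // If failure detection blames the wrong node, a healthy datanode is
    // removed while the actually failed one may stay in the pipeline.
    System.out.println(recoverPipeline(pipeline, "dn2"));  // [dn1, dn3]
  }
}
{code}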

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
