hadoop-hdfs-issues mailing list archives

From "Eric Sirianni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5434) Write resiliency for replica count 1
Date Fri, 20 Dec 2013 01:40:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853579#comment-13853579 ]

Eric Sirianni commented on HDFS-5434:
-------------------------------------

Good question!  To be frank, we haven't fully implemented append yet.  Our current design
approach relies on shared storage (see HDFS-5318) in our {{FsDatasetSpi}} plugin in order
to provide a multi-node pipeline in the append case for {{repcount=1}}.  With shared storage,
the single physical replica is reported via _multiple_ DataNodes to the NameNode.  For append,
the NameNode should include _all_ those DataNodes in the append pipeline (see caveat below).
 Note that this requires some _out-of-band_ coordination in our {{FsDatasetSpi}} plugin in
order to actually persist the appended data to the shared replica in a consistent manner.

So, to summarize, we would not rely on the {{BlockPlacementPolicy}} extension to enforce a
multi-node append pipeline with {{repcount=1}}.  Instead, we would rely on shared storage
and multiple replica reporting to accomplish this.  I realize that this asymmetry somewhat
invalidates my earlier assertion that a general solution for divorcing the repcount from the
pipeline length is achievable.

Let me know if this makes sense or if any clarifications are needed - I may be assuming too
much context here.

h5. Caveat:
Actually, only the DataNodes whose storages are reported as {{READ_WRITE}} should be included in the append
pipeline.  This may be a gap in the current NameNode append code - I'll follow up on this.
 This also illustrates why your suggestion on HDFS-5318 of reporting only a _single_ {{READ_WRITE}}
node for a given shared replica may be problematic - we wouldn't get a multi-node pipeline
for append in this case.
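
To make the caveat concrete, here is a minimal sketch of the selection rule: given every DataNode that reports the shared replica, only those reporting it as {{READ_WRITE}} go into the append pipeline. This is not NameNode code. The {{AppendPipelineSketch}} class, the {{ReplicaLocation}} type, the {{appendPipeline}} method, and the {{READ_ONLY}} state name are all hypothetical stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types exist in HDFS.
final class AppendPipelineSketch {
    private AppendPipelineSketch() {}

    // READ_WRITE comes from the discussion above; READ_ONLY is an assumed
    // name for the other state a shared storage might be reported with.
    enum StorageState { READ_WRITE, READ_ONLY }

    // One DataNode's report of the single shared physical replica.
    static final class ReplicaLocation {
        final String dataNode;
        final StorageState state;

        ReplicaLocation(String dataNode, StorageState state) {
            this.dataNode = dataNode;
            this.state = state;
        }
    }

    // Keeps only the DataNodes that report the replica as READ_WRITE,
    // preserving the order in which the locations were given.
    static List<String> appendPipeline(List<ReplicaLocation> locations) {
        List<String> pipeline = new ArrayList<>();
        for (ReplicaLocation location : locations) {
            if (location.state == StorageState.READ_WRITE) {
                pipeline.add(location.dataNode);
            }
        }
        return pipeline;
    }
}
```

If only one DataNode reported the shared replica as {{READ_WRITE}}, this filter would return a single-node pipeline, which is the problem described above.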


> Write resiliency for replica count 1
> ------------------------------------
>
>                 Key: HDFS-5434
>                 URL: https://issues.apache.org/jira/browse/HDFS-5434
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.2.0
>            Reporter: Buddy
>            Priority: Minor
>         Attachments: BlockPlacementPolicyMinPipelineSize.java, BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java,
HDFS_5434.patch
>
>
> If a file has a replica count of one, the HDFS client is exposed to write failures if
the data node fails during a write. With a pipeline of size one, no recovery is possible
if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the replication
count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data nodes instead
of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure
that the write succeeds to the surviving data node.
> The existing code in the name node will prune the extra replica when it receives the
block received reports for the finalized block from both data nodes. This results in the intended
replica count of one for the block.
> This behavior should be controlled by a configuration option such as {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNamesystem.getAdditionalBlock()}} by ensuring
that the pipeline size passed to {{BlockPlacementPolicy.chooseTarget()}} in the replication
parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}
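
As a rough Java sketch of the expression above: the pipeline size requested from block placement is the larger of the file's replication count and the configured minimum. The {{PipelineSizing}} class and {{effectivePipelineSize}} method are hypothetical names used for illustration; this is not taken from the attached patch, and reading {{dfs.namenode.minPipelineSize}} from the configuration is left out.

```java
// Hypothetical helper; not part of HDFS or of the attached HDFS_5434.patch.
final class PipelineSizing {
    private PipelineSizing() {}

    /**
     * Returns the number of DataNodes to request for the write pipeline.
     *
     * @param replication     the file's configured replica count
     * @param minPipelineSize the value of the proposed
     *                        dfs.namenode.minPipelineSize option
     */
    static int effectivePipelineSize(int replication, int minPipelineSize) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        // The replication count itself is unchanged; only the pipeline is widened.
        return Math.max(replication, minPipelineSize);
    }
}
```

With a replication count of 1 and a minimum pipeline size of 2, this yields a two-node pipeline, while files with a replication count of 3 are unaffected.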



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
