hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
Date Wed, 08 Jan 2014 02:48:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864999#comment-13864999

Arpit Agarwal commented on HDFS-5318:

Hi Eric, I took a look at pipeline recovery today. It looks like the following cases are of
# Block is finalized, r/w replica is lost, r/o replica is available. In this case the existing
NN replication mechanisms will cause an extra replica to be created (q. what happens if a
client attempts to append before the replication happens? Client probably needs to be fixed
to handle this).
# Block is RBW. r/w replica is lost, r/o replica is available. In the usual case DFSClientOutputStream
will recover the write pipeline by selecting another DN, transferring block contents to the
new DN and inserting it in the write pipeline. However pipeline recovery will not work when
the single replica in the pipeline is lost, as you guys already mentioned on HDFS-5318. I
think you can use either the client side setting or block placement policy option in that
case that is being discussed there.

Updating the suggested approach:
# Each DataNode presents a different StorageID for the same physical storage.
# Read-only replicas are not counted towards satisfying the replication factor. This assumes
that read-only replicas are 'shared' (i.e. what you called using "writability" of a replica
as a proxy for deducing whether or not that replica is shared).
# Read-only replicas cannot be pruned (follows from (2)).
# Client should be able to bootstrap a write pipeline with read-only replicas.
# Read-only storages will not be used for block placement.

I am not sure if there are any special conditions wrt lease recovery that also need to be

> Pluggable interface for replica counting
> ----------------------------------------
>                 Key: HDFS-5318
>                 URL: https://issues.apache.org/jira/browse/HDFS-5318
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.0
>            Reporter: Eric Sirianni
>         Attachments: HDFS-5318.patch, hdfs-5318.pdf
> There are several use cases for using shared-storage for datanode block storage in an
HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  However,
for most of the current uses of 'replication count' in the Namenode, the "number of physical
copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to accurately infer
distinct physical copies in a shared storage environment.  With HDFS-5115, a {{StorageID}}
is a UUID.  I propose associating some minor additional semantics to the {{StorageID}} - namely
that multiple datanodes attaching to the same physical shared storage pool should report the
same {{StorageID}} for that pool.  A minor modification would be required in the DataNode
to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface.
> With those semantics in place, the number of physical copies of a block in a shared storage
environment can be calculated as the number of _distinct_ {{StorageID}} s associated with
that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A,
S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical
replicas (i.e. the traditional HDFS case with local disks)
> ** &rarr; Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical
replica (e.g. HDFS datanodes mounting the same NAS share)
> ** &rarr; Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication factor in the
namenode as *2* instead of *4*.

This message was sent by Atlassian JIRA

View raw message