hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
Date Thu, 19 Dec 2013 22:15:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853383#comment-13853383 ]

Arpit Agarwal commented on HDFS-5318:

Eric, your patch implements existing NN functionality via SPIs and does not add the proposed
duplicate replica detection, correct?

I'll use logical replicas to refer to multiple views of the same physical replica on different
DNs. I have some questions about the mechanics of exposing one storage via multiple DataNodes.
# Is the Jira description still accurate, i.e. will a storage have the same {{StorageID}}
regardless of which DN exposes it? If so, there will be multiple {{DatanodeStorageInfo}}
objects with the same {{StorageID}} on the NameNode.
# Are you assuming that at most one DataNode will present the same physical storage as being
read-write? (It looks like you are)
# Which DataNode will be responsible for block reports/incremental block reports for a given
storage? I am not sure how block report processing will be affected.
# Will {{BlockInfo}} have one triplet for each logical replica? This means that the {{BlockInfo}}
will have multiple {{DatanodeStorage}} entries which have the same {{StorageID}}. I am not
sure this will work.
# What is your intended behavior of {{getLocatedBlocks}}? Should one located block be exposed
for each logical replica?

I am concerned that we have not caught all the locations in the NameNode code where you will
need to plug in, and that catching them all will increase code complexity for the common case.
As you mentioned in your doc, to get this working we'd ideally need to fix the BlockManager
to detect physical replicas by the target {{StorageID}}.

Suresh and I had a quick chat about this offline. If there is just one read-write logical
replica for a given physical replica at a time, we may be able to support this more easily
without needing a pluggable interface, with the following assumptions:
# Each DataNode presents a different {{StorageID}} for the same physical storage.
# Read-only replicas don't count towards excess replicas. In fact read-only replicas of incomplete
blocks should probably not be counted towards satisfying the target replication factor.
# Per (2), incomplete blocks with just read-only replicas are corrupt since there is no append
path to the block.
# Read-only replicas cannot be pruned.
# Read-only replicas are not included in the write pipeline.
# Read-only storages will not be used for block placement.
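
To make the assumptions above concrete, here is a minimal sketch in Java of how replica accounting might behave under them. It is not NameNode code: {{ReadOnlyReplicaRules}}, {{Replica}}, and all method names are hypothetical, and the rule that read-only replicas of complete blocks do count towards the target is only inferred from assumption (2).

```java
import java.util.List;

public class ReadOnlyReplicaRules {
    /** Hypothetical view of one replica: its storage ID and whether the storage is read-only. */
    public static final class Replica {
        final String storageId;
        final boolean readOnly;

        public Replica(String storageId, boolean readOnly) {
            this.storageId = storageId;
            this.readOnly = readOnly;
        }
    }

    /** Number of replicas that live on read-write storages. */
    static int countReadWrite(List<Replica> replicas) {
        int count = 0;
        for (Replica r : replicas) {
            if (!r.readOnly) {
                count++;
            }
        }
        return count;
    }

    /**
     * Replicas counted towards the target replication factor.
     * Assumption: read-only replicas count only when the block is complete.
     */
    public static int countTowardsTarget(List<Replica> replicas, boolean blockComplete) {
        int readWrite = countReadWrite(replicas);
        return blockComplete ? replicas.size() : readWrite;
    }

    /** Excess replicas are computed from read-write replicas only; read-only ones are never pruned. */
    public static int excessReplicas(List<Replica> replicas, int targetReplication) {
        return Math.max(0, countReadWrite(replicas) - targetReplication);
    }

    /** An incomplete block whose replicas are all read-only has no append path, so treat it as corrupt. */
    public static boolean isCorrupt(List<Replica> replicas, boolean blockComplete) {
        return !blockComplete && !replicas.isEmpty() && countReadWrite(replicas) == 0;
    }

    /** Read-only replicas are excluded from write pipelines and from block placement. */
    public static boolean eligibleForWrite(Replica replica) {
        return !replica.readOnly;
    }
}
```

In this sketch, a complete block with one read-write and two read-only replicas counts as three replicas towards the target but has zero excess at a target of one, because only the single read-write replica is a pruning candidate.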

Some of these conditions should be already enforced by the NameNode today, we can fix the

> Pluggable interface for replica counting
> ----------------------------------------
>                 Key: HDFS-5318
>                 URL: https://issues.apache.org/jira/browse/HDFS-5318
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.0
>            Reporter: Eric Sirianni
>         Attachments: HDFS-5318.patch, hdfs-5318.pdf
> There are several use cases for using shared-storage for datanode block storage in an
HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  However,
for most of the current uses of 'replication count' in the Namenode, the "number of physical
copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to accurately infer
distinct physical copies in a shared storage environment.  With HDFS-5115, a {{StorageID}}
is a UUID.  I propose associating some minor additional semantics to the {{StorageID}} - namely
that multiple datanodes attaching to the same physical shared storage pool should report the
same {{StorageID}} for that pool.  A minor modification would be required in the DataNode
to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface.
> With those semantics in place, the number of physical copies of a block in a shared storage
environment can be calculated as the number of _distinct_ {{StorageID}} s associated with
that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A,
S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical
replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical
replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication factor in the
namenode as *2* instead of *4*.
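
The counting rule proposed in the issue description (physical copies = number of distinct {{StorageID}} s for a block) can be sketched in Java as below. This is only an illustration under stated assumptions: {{PhysicalReplicaCounter}} and {{Location}} are hypothetical names, not HDFS classes, and IDs are modelled as plain strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PhysicalReplicaCounter {
    /** Hypothetical (DataNode ID, Storage ID) pair describing one access path to a block. */
    public static final class Location {
        final String datanodeId;
        final String storageId;

        public Location(String datanodeId, String storageId) {
            this.datanodeId = datanodeId;
            this.storageId = storageId;
        }
    }

    /**
     * Counts physical copies of a block as the number of distinct storage IDs
     * among its locations, ignoring how many DataNodes expose each storage.
     */
    public static int countPhysicalReplicas(List<Location> locations) {
        Set<String> distinctStorageIds = new HashSet<>();
        for (Location location : locations) {
            distinctStorageIds.add(location.storageId);
        }
        return distinctStorageIds.size();
    }
}
```

For the four location tuples in the description ({{DN_1}}/{{DN_2}} on {{STORAGE_A}}, {{DN_3}}/{{DN_4}} on {{STORAGE_B}}), {{countPhysicalReplicas}} returns 2, while the number of access paths is still 4.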
