hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
Date Fri, 20 Dec 2013 21:06:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854560#comment-13854560
] 

Arpit Agarwal commented on HDFS-5318:
-------------------------------------

{quote}
I'm not sure I understand the definition of an "incomplete" block. Does that mean a finalized
block whose length is less than the block size (i.e., a block with free capacity potential
for append)?
{quote}
That is what I meant, sorry about the confusion.

{quote}
Read-only replicas should never count towards the target replication factor regardless of
"complete" or "incomplete" right? In the following scenario, the block should be classified
as under replicated, right?
{quote}
It is fair to count read-only replicas of complete blocks towards the target replication factor
in general. However for your case since it is the same physical replica I guess it makes sense
not to count it. If no one else is using r/o storages we can do this. Suresh had suggested
the same yesterday.

{quote}
I don't fully understand the implications here. In a repcount=1 situation, when there is a
single R/W logical replica and the DataNode that's hosting it goes offline, what happens?
I would propose the following:
{quote}
I am not very familiar with how replica recovery works. Let me understand that and get back
to you by next week.

> Pluggable interface for replica counting
> ----------------------------------------
>
>                 Key: HDFS-5318
>                 URL: https://issues.apache.org/jira/browse/HDFS-5318
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.0
>            Reporter: Eric Sirianni
>         Attachments: HDFS-5318.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block storage in an
HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  However,
for most of the current uses of 'replication count' in the Namenode, the "number of physical
copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to accurately infer
distinct physical copies in a shared storage environment.  With HDFS-5115, a {{StorageID}}
is a UUID.  I propose associating some minor additional semantics to the {{StorageID}} - namely
that multiple datanodes attaching to the same physical shared storage pool should report the
same {{StorageID}} for that pool.  A minor modification would be required in the DataNode
to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface.
 
> With those semantics in place, the number of physical copies of a block in a shared storage
environment can be calculated as the number of _distinct_ {{StorageID}} s associated with
that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A,
S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical
replicas (i.e. the traditional HDFS case with local disks)
> ** &rarr; Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical
replica (e.g. HDFS datanodes mounting the same NAS share)
> ** &rarr; Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication factor in the
namenode as *2* instead of *4*.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Mime
View raw message