hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7955) Improve naming of classes, methods, and variables related to block replication and recovery
Date Wed, 17 Feb 2016 22:29:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151311#comment-15151311
] 

Zhe Zhang commented on HDFS-7955:
---------------------------------

Good points, Andrew.

I took a closer look at the related class and variable names. I think {{Replica}} is the hardest
to change, because 1) it's hard to come up with a good alternative; 2) DN concepts like {{RUR}}
and {{RBW}} are pretty deep-rooted. So here's my proposed plan:
# Enhance the documentation to state that an EC _internal block_ / _storage block_ is a special
kind of replica.
# Rename {{BlockManager}} internals in a consistent way. I see 2 main sources of replication-specific names:
{{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}. The former should be renamed to
{{LowRedundancyBlocks}} and the latter to {{PendingReconstructionBlocks}}.
# (beyond this JIRA's scope) Try to add public APIs to report low redundancy EC blocks.
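
To illustrate why a redundancy-based name covers both storage schemes, here is a minimal sketch. It is not the real {{BlockManager}} code: {{BlockInfoSketch}}, {{isLowRedundancy}}, and the {{neededReconstruction}} helper are hypothetical names used only for illustration, and the check (fewer live units than expected) is an assumed simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: none of these types are the real HDFS classes.
public class LowRedundancySketch {

    /** Hypothetical view of a block: a replicated block or an EC block group. */
    public static final class BlockInfoSketch {
        final String id;
        final int liveUnits;     // live replicas, or live internal blocks of an EC group
        final int expectedUnits; // replication factor, or data + parity units

        public BlockInfoSketch(String id, int liveUnits, int expectedUnits) {
            this.id = id;
            this.liveUnits = liveUnits;
            this.expectedUnits = expectedUnits;
        }
    }

    /** True when the block has fewer live units than its storage policy expects. */
    public static boolean isLowRedundancy(BlockInfoSketch b) {
        return b.liveUnits < b.expectedUnits;
    }

    /** Collects ids of blocks that need reconstruction (re-replication or EC decoding). */
    public static List<String> neededReconstruction(List<BlockInfoSketch> blocks) {
        List<String> result = new ArrayList<>();
        for (BlockInfoSketch b : blocks) {
            if (isLowRedundancy(b)) {
                result.add(b.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<BlockInfoSketch> blocks = new ArrayList<>();
        blocks.add(new BlockInfoSketch("replicated-3x", 2, 3)); // one replica lost
        blocks.add(new BlockInfoSketch("ec-rs-6-3", 8, 9));     // one internal block lost
        blocks.add(new BlockInfoSketch("healthy-3x", 3, 3));    // fully redundant
        System.out.println(neededReconstruction(blocks)); // prints [replicated-3x, ec-rs-6-3]
    }
}
```

The same predicate applies to a 3x replicated block and to an EC block group, which is the reason "under-replicated" reads as too narrow once both schemes exist.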

Thoughts?

> Improve naming of classes, methods, and variables related to block replication and recovery
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7955
>                 URL: https://issues.apache.org/jira/browse/HDFS-7955
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>            Reporter: Zhe Zhang
>            Assignee: Rakesh R
>         Attachments: HDFS-7955-001.patch, HDFS-7955-002.patch, HDFS-7955-003.patch, HDFS-7955-004.patch, HDFS-7955-5.patch
>
>
> Many existing names should be revised to avoid confusion when blocks can be both replicated
and erasure coded. This JIRA aims to solicit opinions on making those names more consistent
and intuitive.
> # In current HDFS, _block recovery_ refers to the process of finalizing the last block
of a file, triggered by _lease recovery_. It is different from the intuitive meaning of _recovering
a lost block_. To avoid confusion, I can think of 2 options:
> #* Rename this process as _block finalization_ or _block completion_. I prefer this option
because this process is literally not a recovery.
> #* If we want to keep existing terms unchanged, we can name all EC recovery and re-replication
logic as _reconstruction_.
> # As Kai [suggested | https://issues.apache.org/jira/browse/HDFS-7369?focusedCommentId=14361131&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14361131]
under HDFS-7369, several replication-based names should be made more generic:
> #* {{UnderReplicatedBlocks}} and {{neededReplications}}. E.g. we can use {{LowRedundancyBlocks}}/{{AtRiskBlocks}},
and {{neededRecovery}}/{{neededReconstruction}}.
> #* {{PendingReplicationBlocks}}
> #* {{ReplicationMonitor}}
> I'm sure the above list is incomplete; discussions and comments are very welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
