hadoop-hdfs-issues mailing list archives

From "Takanobu Asanuma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
Date Fri, 24 Feb 2017 14:43:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882803#comment-15882803 ]

Takanobu Asanuma commented on HDFS-10999:

Thanks for the good summary, [~manojg]! I agree with you for the most part. I want to share
my thoughts.

1. +1 for not changing {{fsck}}.


2, 3. I think changing {{dfsadmin -report}} and {{NN-WebUI}} is almost the same work, because
they refer to the same metrics of {{FSNamesystemMBean}}. So the key point is how to extend

– For backward compatibility reasons, let the current FSNamesystem#getStats() stay as is, so
that it continues to return cumulative stats for all blocks combined.
– Introduce FSNamesystem#getReplicatedBlockStats() and FSNamesystem#getECBlockStats() to
capture replicated and EC block stats separately.

I agree with that. I think it fits my suggestion of adding two new MBeans to {{FSNamesystem}},
one for replicated blocks and one for EC block groups.

*My proposal, based on yours*:
-- Since {{FSNamesystem#getStats}} refers to {{FSNamesystemMBean}}, leave both as they are.
It would be good to use the new generic terms there.
-- Add two new MBeans, {{ReplicatedBlockMBean}} and {{ECBlockGroupMBean}}, to {{FSNamesystem}}.
-- {{FSNamesystem#getReplicatedBlockStats}} refers to {{ReplicatedBlockMBean}}.
-- {{FSNamesystem#getECBlockGroupStats}} refers to {{ECBlockGroupMBean}}.
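
To make the proposal concrete, here is a minimal sketch in plain JMX. It is not Hadoop code: only the two MBean names come from the proposal, while the attribute names, the {{Sketch:}} object-name domain and the {{readAttribute}} helper are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustration only: hypothetical attribute names, not the actual Hadoop code. */
class BlockStatsMBeanSketch {

  /** Management interface for replicated blocks (getter names are assumed). */
  public interface ReplicatedBlockMBean {
    long getLowRedundancyBlocks();
    long getMissingBlocks();
  }

  /** Management interface for EC block groups (getter names are assumed). */
  public interface ECBlockGroupMBean {
    long getLowRedundancyBlockGroups();
    long getMissingBlockGroups();
  }

  // Standard MBean rule: a class named X is managed through an interface named XMBean.
  public static class ReplicatedBlock implements ReplicatedBlockMBean {
    private final long lowRedundancy;
    private final long missing;

    public ReplicatedBlock(long lowRedundancy, long missing) {
      this.lowRedundancy = lowRedundancy;
      this.missing = missing;
    }

    @Override public long getLowRedundancyBlocks() { return lowRedundancy; }
    @Override public long getMissingBlocks() { return missing; }
  }

  public static class ECBlockGroup implements ECBlockGroupMBean {
    private final long lowRedundancy;
    private final long missing;

    public ECBlockGroup(long lowRedundancy, long missing) {
      this.lowRedundancy = lowRedundancy;
      this.missing = missing;
    }

    @Override public long getLowRedundancyBlockGroups() { return lowRedundancy; }
    @Override public long getMissingBlockGroups() { return missing; }
  }

  /** Registers the bean, reads one attribute back through JMX, then unregisters it. */
  static Object readAttribute(Object bean, String objectName, String attribute) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      ObjectName name = new ObjectName(objectName);
      server.registerMBean(bean, name);
      try {
        return server.getAttribute(name, attribute);
      } finally {
        server.unregisterMBean(name);
      }
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Each block type is visible under its own object name, separate from the other.
    System.out.println(readAttribute(new ReplicatedBlock(3, 1),
        "Sketch:type=ReplicatedBlock", "LowRedundancyBlocks"));
    System.out.println(readAttribute(new ECBlockGroup(2, 0),
        "Sketch:type=ECBlockGroup", "LowRedundancyBlockGroups"));
  }
}
```

The point of the split is that a consumer such as {{dfsadmin -report}} or the web UI could read each set of counters on its own, while the existing cumulative bean stays untouched.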


Let's be careful with terminology to avoid confusion. It would be better to follow the terms that fsck uses.

|| replicated || erasure coded ||
| block(s) | block group(s) |
| replica(s) | internal block(s) |

So the output would look like this:
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Replicated Blocks:
  Under replicated blocks: 0
  Blocks with corrupt replicas: 0
  Missing blocks: 0
  Missing blocks (with replication factor 1): 0
  Pending deletion blocks: 0
Erasure Coded Block Groups:
  Under ec block groups: 0
  EC block groups with corrupt internal blocks: 0
  Missing ec block groups: 0
  Pending deletion ec block groups: 0
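
As a rough sketch of how the two sections above could be rendered from separate stats, assuming the counters arrive as ordered label/value pairs ({{GroupedReportSketch}} and {{section}} are hypothetical names, not the dfsadmin implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: formats one indented section of a grouped report. */
class GroupedReportSketch {

  /** Renders a title line followed by one indented "label: value" line per counter. */
  static String section(String title, Map<String, Long> counters) {
    StringBuilder sb = new StringBuilder(title).append(":\n");
    for (Map.Entry<String, Long> e : counters.entrySet()) {
      sb.append("  ").append(e.getKey()).append(": ").append(e.getValue()).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the lines print in a stable order.
    Map<String, Long> replicated = new LinkedHashMap<>();
    replicated.put("Under replicated blocks", 0L);
    replicated.put("Missing blocks", 0L);

    Map<String, Long> erasureCoded = new LinkedHashMap<>();
    erasureCoded.put("Under ec block groups", 0L);
    erasureCoded.put("Missing ec block groups", 0L);

    System.out.print(section("Replicated Blocks", replicated));
    System.out.print(section("Erasure Coded Block Groups", erasureCoded));
  }
}
```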

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have, supportability
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic term "low
redundancy" to the old-fashioned "under replicated". But this term is still being used in
messages in several places, such as web ui, dfsadmin and fsck. We should probably change them
to avoid confusion.
> File this jira to discuss it.

