hadoop-hdfs-issues mailing list archives
From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
Date Sat, 15 Oct 2016 18:53:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578574#comment-15578574 ]

Allen Wittenauer commented on HDFS-10999:
-----------------------------------------

bq. That's what I was getting at with the pendingReconstructionBlocksCount. If we fix it as
I talked about above, it'd actually tell you how much work is remaining, and how fast that
work is progressing.

That might work, but I just had a thought.  Are we exposing how many blocks are EC blocks
and how many blocks are normally replicated blocks?  (If not, I really hope the explanation
is a good one...) It seems that we should have symmetry here.  If we have N types of blocks,
I'm going to want to know NxM counts of information.  It's pretty much the only way that advanced
users will know whether certain types of blocks are actually working to their benefit.  As with
compression, space savings aren't the only consideration.
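The "N block types x M counts" symmetry could be sketched as a small metrics matrix; the type and count names below are illustrative, not actual HDFS metric names:

```python
# Hypothetical sketch of per-block-type metrics: every count tracked for
# replicated blocks is also tracked per EC policy, giving the full N x M view.
from collections import defaultdict

BLOCK_TYPES = ["replicated", "ec-rs-6-3", "ec-rs-10-4"]  # N types (made up)
COUNTS = ["total", "low_redundancy", "pending_reconstruction", "corrupt"]  # M counts

class BlockMetrics:
    def __init__(self):
        # One counter per (block type, count) pair -> N x M cells.
        self._cells = defaultdict(int)

    def incr(self, block_type, count, n=1):
        assert block_type in BLOCK_TYPES and count in COUNTS
        self._cells[(block_type, count)] += n

    def report(self):
        # Full N x M table, so an admin can compare EC vs. replicated directly.
        return {t: {c: self._cells[(t, c)] for c in COUNTS}
                for t in BLOCK_TYPES}

m = BlockMetrics()
m.incr("replicated", "total", 1000)
m.incr("ec-rs-6-3", "total", 200)
m.incr("ec-rs-6-3", "low_redundancy", 5)
```

With a report shaped like this, "is EC working to my benefit" becomes a direct comparison of rows rather than a guess from a single aggregate number.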

bq. I really, really hope that manually copying blocks around is not a normal part of operating
an HDFS cluster.
...
bq.  I recall seeing some customer issues where we temporarily bumped up these values to more
quickly recover from failures.

You've sort of answered your own question. ;)

Most of the advanced admins I know do it several times a year, either because the NN was too
stupid to fix its own replication problems and/or because it was simply faster for us to
do it rather than wait for the normal block replication process.

For example, as an admin, I might know that there is no YARN running on the source node or the
destination node, so it's totally OK to do a brute copy from one DN to another without
busting the network.  HDFS block deletes are significantly faster than replication, so just
do the copy, run the balancer, and let the NN remove the duplicates at its leisure.  All
without fumbling with the ever-growing and poorly documented HDFS settings.
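The copy/balance/cleanup workflow above can be sketched roughly as follows. Host names and the data directory are made up, and the script is a dry run: it only prints the planned commands, it does not touch a cluster. The rsync step illustrates the brute copy, not a supported procedure:

```shell
# Dry-run sketch of the manual copy-then-balance procedure. PLAN collects
# the commands instead of executing them.
PLAN=""

SRC_DN=dn01.example.com                 # hypothetical source DataNode
DST_DN=dn02.example.com                 # hypothetical destination DataNode
# Typical DataNode storage directory; check dfs.datanode.data.dir first.
BLOCK_DIR=/data/1/dfs/dn/current

plan() {
  echo "would run: $*"
  PLAN="$PLAN $*;"
}

# 1. Bulk-copy block files between DataNode disks -- only reasonable when,
#    as in the example above, neither node is running YARN containers.
plan rsync -a "$SRC_DN:$BLOCK_DIR/" "$DST_DN:$BLOCK_DIR/"

# 2. Run the balancer so usage evens out across the cluster.
plan hdfs balancer

# 3. No cleanup step: the NameNode notices the over-replicated blocks from
#    block reports and schedules the (fast) deletes itself.
```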

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yuanbo Liu
>              Labels: supportability
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic term "low
redundancy" to the old-fashioned "under replicated". But this term is still being used in
messages in several places, such as web ui, dfsadmin and fsck. We should probably change them
to avoid confusion.
> File this jira to discuss it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
