hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
Date Sat, 15 Oct 2016 00:02:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576938#comment-15576938 ]

Andrew Wang commented on HDFS-10999:
------------------------------------

Thanks for the insight, Allen.

{quote}
M: "How long for recovery?"
A: "No idea. The NN doesn't tell me if these are EC blocks or regular blocks that were lost
and one is faster to recover than the other."
{quote}

That's what I was getting at with pendingReconstructionBlocksCount. If we fix it as I
described above, it would actually tell you how much work remains and how fast that work
is progressing.
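To make that concrete: once a queue metric such as pendingReconstructionBlocksCount reliably tracks outstanding work, two samples of it are enough for a rough answer to "How long for recovery?". The sketch below is a hypothetical Python helper, not part of HDFS; the `Sample` type and `estimate_recovery_seconds` function are illustrative names, and it assumes the drain rate stays roughly constant between and after the samples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One observation of a queue-size metric (hypothetical sampling scheme)."""
    timestamp_s: float     # seconds since some fixed reference point
    blocks_remaining: int  # e.g. a pending-reconstruction block count


def estimate_recovery_seconds(earlier: Sample, later: Sample) -> Optional[float]:
    """Linear ETA: remaining work divided by the observed drain rate.

    Assumes the drain rate stays roughly constant. Returns None when the
    queue did not shrink between the samples, since no finite estimate
    can be derived from them.
    """
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    drained = earlier.blocks_remaining - later.blocks_remaining
    if drained <= 0:
        return None  # queue is flat or growing: recovery is not progressing
    rate = drained / elapsed  # blocks reconstructed per second
    return later.blocks_remaining / rate


# Usage: 12000 blocks pending, then 9000 pending ten minutes later.
eta = estimate_recovery_seconds(Sample(0, 12000), Sample(600, 9000))
print(eta)  # 3000 blocks / 600 s = 5 blocks/s, so 9000 / 5 = 1800.0 seconds
```

A linear estimate is deliberately crude; it cannot distinguish EC reconstruction from plain replication, which is why a metric that separates the two would make the answer more trustworthy.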

{quote}
...I've also used it during system recovery and migrations as a measurement of how many more
DNs I need to bring up such that I have more sources for block replication. 
{quote}

Would the "pending" queue metrics also work for this? We could also look at improving DN-side
metrics related to replication work.

{quote}
This number represents something that I as an admin have some semblance of control over: I
could always manually copy blocks from one node to another to speed things up.
Under EC, I don't know of anything manual I can do if it is missing chunks of blocks.
{quote}

I really, really hope that manually copying blocks around is not a normal part of operating
an HDFS cluster.

The point is still valid, though. Maybe we should take a harder look at the recovery work
throttles on the NN and DN, and make them dynamically reconfigurable if they aren't already.
I recall some customer issues where we temporarily bumped up these values to recover from
failures more quickly.

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yuanbo Liu
>              Labels: supportability
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic term "low
> redundancy" to the old-fashioned "under replicated". But the old term is still used in
> messages in several places, such as the web UI, dfsadmin and fsck. We should probably change
> them to avoid confusion.
> File this jira to discuss it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


