hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
Date Mon, 24 Oct 2016 17:33:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602650#comment-15602650 ]

Allen Wittenauer edited comment on HDFS-10999 at 10/24/16 5:33 PM:
-------------------------------------------------------------------

I've been out of town and I've had more time to think about this issue.

I'm pretty much convinced that tying what are effectively two metrics to a single value is
a bad idea.  I really want to see the two values separated, because the distinction directly
impacts how maintenance windows and recovery are performed.  More information is significantly
more valuable than less here.  The same goes for other metrics such as rates: I really do
want to know how long it takes for replicated blocks to re-replicate vs. EC blocks to recover.
They have slightly different performance characteristics at the node level, and advanced users
are going to want to know what the performance impact on any running jobs might be.
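
To make that concrete, here is a rough sketch of the kind of split monitoring I have in mind,
polling the NameNode's /jmx servlet from Python; the two pending-block metric names below are
illustrative placeholders, not necessarily what FSNamesystem actually publishes:

    # Rough sketch: poll the NameNode JMX servlet once a minute and derive
    # separate recovery rates for replicated vs. erasure-coded blocks.
    # REPLICATED_PENDING / EC_PENDING are placeholder metric names.
    import json
    import time
    import urllib.request

    JMX_URL = ("http://namenode.example.com:9870/jmx"
               "?qry=Hadoop:service=NameNode,name=FSNamesystem")
    REPLICATED_PENDING = "PendingReplicationBlocks"   # placeholder
    EC_PENDING = "PendingECRecoveryBlocks"            # placeholder

    def read_pending():
        with urllib.request.urlopen(JMX_URL) as resp:
            bean = json.load(resp)["beans"][0]
        return bean.get(REPLICATED_PENDING, 0), bean.get(EC_PENDING, 0)

    prev_rep, prev_ec = read_pending()
    while True:
        time.sleep(60)
        rep, ec = read_pending()
        # A shrinking queue means blocks were recovered during the interval.
        print("replicated recovered/min:", max(prev_rep - rep, 0),
              "EC recovered/min:", max(prev_ec - ec, 0))
        prev_rep, prev_ec = rep, ec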

For example, if I know my nodes take x% of the CPU for EC recovery during a node migration,
I'm going to want to set the CPU limits on the Docker cgroups I use to protect my cluster from
YARN's security issues differently during that migration, to make sure I have enough juice
compared to normal operation.
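
As a rough illustration only (the container id, cgroup path and quota numbers are all made up,
and the path assumes cgroup v1 with Docker's default hierarchy), bumping the CPU quota for the
migration window might look something like:

    # Illustrative only: temporarily raise the CPU quota on a Docker
    # container's cgroup for a migration window, then restore it.
    # Assumes cgroup v1 and Docker's default "docker" hierarchy; needs root.
    from pathlib import Path

    CONTAINER_ID = "abc123"  # hypothetical container id
    QUOTA = Path(f"/sys/fs/cgroup/cpu/docker/{CONTAINER_ID}/cpu.cfs_quota_us")

    def set_quota(quota_us: int) -> int:
        """Write a new CPU quota (in microseconds) and return the old one."""
        old = int(QUOTA.read_text())
        QUOTA.write_text(str(quota_us))
        return old

    # Leave headroom for EC recovery: cap the container at 4 CPUs instead
    # of, say, 6 (assuming the default 100000us period).
    previous = set_quota(400_000)
    # ... node migration runs here ...
    set_quota(previous)  # back to normal operation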

In other words, this is not a good place to 'dumb down' the metrics. 

With that in mind, we should expect users deploying 3.x to need to change their fsck monitoring
scripts.  If we have an appropriate release note and associated documentation describing what
the different text fields are, then this should be fine.
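
For instance, a monitoring script could accept both the 2.x and 3.x labels during the
transition; the "Low redundancy blocks" string below is a guess at what this jira ends up
printing, not the final wording:

    # Sketch of an fsck monitor that tolerates both the 2.x and 3.x labels.
    # "Low redundancy blocks" is a guess at the eventual 3.x wording.
    import re
    import subprocess

    LABEL = re.compile(r"(Under-replicated blocks|Low redundancy blocks)\s*:\s*(\d+)")

    out = subprocess.run(["hdfs", "fsck", "/"],
                         capture_output=True, text=True, check=False).stdout
    counts = {label: int(value) for label, value in LABEL.findall(out)}
    if sum(counts.values()) > 0:
        print("WARNING: blocks below target redundancy:", counts)

Once the 3.x wording is settled, the old pattern can be dropped.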


> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yuanbo Liu
>              Labels: supportability
>
> Per HDFS-9857, it seems that in the Hadoop 3 world people prefer the more generic term "low
redundancy" to the old-fashioned "under replicated".  But the old term is still being used in
messages in several places, such as the web UI, dfsadmin and fsck.  We should probably change
them to avoid confusion.
> File this jira to discuss it.




