hadoop-hdfs-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2554) Add separate metrics for missing blocks with desired replication level 1
Date Mon, 30 Jul 2012 21:55:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425290#comment-13425290 ]

Andy Isaacson commented on HDFS-2554:
-------------------------------------

Thanks for the comment. I need to think about most of it, but I have one immediate response
to the last paragraph:
bq. split {{CorruptBlocksRN}} into c<r and c==r
I don't buy it, because there's no fundamental difference between these two cases. In either
case, all of the replicas the NN knows about are corrupt.  The block may have been underreplicated
when we discovered all the existing replicas are corrupt (in which case r=3 c=2 but the block
should still semantically be counted in your CriticallyCorrupt bucket); the block may have
been overreplicated when we discovered the corruption (so r=3 c=4 is possible).  In all cases
the administrator action is the same.
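
A minimal sketch of this argument, assuming a hypothetical helper {{allKnownReplicasCorrupt}} (not an existing NameNode method) that takes the number of replicas the NN knows about and how many of them are corrupt. The desired replication r is deliberately not an input, so c<r, c==r and c>r all land in the same bucket:

```java
public class CorruptReplicaCheck {
    /**
     * Hypothetical helper, for illustration only: true when the NN knows of
     * at least one replica and every known replica is corrupt. The desired
     * replication r is intentionally not a parameter.
     */
    public static boolean allKnownReplicasCorrupt(int knownReplicas, int corruptReplicas) {
        return knownReplicas > 0 && corruptReplicas == knownReplicas;
    }

    public static void main(String[] args) {
        // Desired replication r=3 in every case below; it never enters the check.
        System.out.println(allKnownReplicasCorrupt(2, 2)); // underreplicated, c=2 < r: true
        System.out.println(allKnownReplicasCorrupt(3, 3)); // c == r: true
        System.out.println(allKnownReplicasCorrupt(4, 4)); // overreplicated, c=4 > r: true
        System.out.println(allKnownReplicasCorrupt(3, 1)); // two good replicas remain: false
    }
}
```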

Keep in mind that the driving idea behind this change is that there are different recommended
actions for an administrator responding to each of these 4 categories.  Simply multiplying
metrics because we are able to count them is not a benefit.
                
> Add separate metrics for missing blocks with desired replication level 1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-2554
>                 URL: https://issues.apache.org/jira/browse/HDFS-2554
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Andy Isaacson
>            Priority: Minor
>
> Some users set the replication level to 1 for datasets which are unimportant and can
> be lost with no worry (e.g., the output of terasort tests). But other data on the cluster is
> important and should not be lost. It would be useful to separate the metric for missing blocks
> by the desired replication level of those blocks, so that one could ignore missing blocks
> at repl 1 while still alerting on missing blocks with higher desired replication.
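
The requested split could be sketched roughly as follows. This is only an illustration under stated assumptions: {{MissingBlock}} and {{countByDesiredReplication}} are hypothetical names, not part of the HDFS code base or of any patch on this issue.

```java
import java.util.Arrays;
import java.util.List;

public class MissingBlockMetrics {
    /** Hypothetical record of a missing block; only the desired replication matters here. */
    public static final class MissingBlock {
        final int desiredReplication;

        public MissingBlock(int desiredReplication) {
            this.desiredReplication = desiredReplication;
        }
    }

    /**
     * Returns a two-element array:
     * [0] = missing blocks with desired replication 1,
     * [1] = missing blocks with desired replication greater than 1.
     */
    public static long[] countByDesiredReplication(List<MissingBlock> missing) {
        long replOne = 0;
        long replHigher = 0;
        for (MissingBlock b : missing) {
            if (b.desiredReplication == 1) {
                replOne++;
            } else {
                replHigher++;
            }
        }
        return new long[] {replOne, replHigher};
    }

    public static void main(String[] args) {
        List<MissingBlock> missing = Arrays.asList(
            new MissingBlock(1),  // e.g. terasort output
            new MissingBlock(1),
            new MissingBlock(3)); // important data
        long[] counts = countByDesiredReplication(missing);
        System.out.println("missing, repl 1:  " + counts[0]);
        System.out.println("missing, repl >1: " + counts[1]);
    }
}
```

An alert would then watch only the second counter, leaving the repl 1 counter as informational.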

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
