hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8869) Don't mark storages as failed before first block report
Date Thu, 06 Aug 2015 21:28:06 GMT
Rushabh S Shah created HDFS-8869:
------------------------------------

             Summary: Don't mark storages as failed before first block report
                 Key: HDFS-8869
                 URL: https://issues.apache.org/jira/browse/HDFS-8869
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.7.0
            Reporter: Rushabh S Shah
            Assignee: Daryn Sharp


Creating this ticket on behalf of [~daryn].

Heartbeat processing performs the failed storage check. The DN reports its storages, and any
previously known storages missing from that report (e.g. after a unique storage ID upgrade) are
marked failed. The heartbeat monitor then removes all blocks associated with the failed storage,
and a replication storm ensues for all blocks on the node.
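
A minimal sketch of the behavior the summary asks for, assuming the NN keeps a per-DN flag recording whether the first block report has arrived. The class, field, and method names here (StorageCheckSketch, firstBlockReportReceived, processHeartbeat) are hypothetical and are not the actual HDFS NameNode code; the sketch only illustrates deferring the failed storage check until that flag is set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not taken from the HDFS source tree.
public class StorageCheckSketch {

    /** Stand-in for the NN's record of one storage on a DN. */
    static final class Storage {
        final String id;
        boolean failed;
        Storage(String id) { this.id = id; }
    }

    /** Stand-in for the NN's view of one DN. */
    static final class DataNode {
        final Map<String, Storage> storages = new HashMap<>();
        // Assumed flag: set once the DN's first block report is processed.
        boolean firstBlockReportReceived;

        /**
         * Handles a heartbeat listing the storage IDs the DN currently has.
         * Returns the IDs newly marked failed by this heartbeat.
         */
        List<String> processHeartbeat(Set<String> reportedIds) {
            // Register any storage the NN has not seen before.
            for (String id : reportedIds) {
                storages.computeIfAbsent(id, Storage::new);
            }
            List<String> newlyFailed = new ArrayList<>();
            // Defer the failed storage check until the first block report,
            // so storages renamed by an upgrade are not failed prematurely.
            if (!firstBlockReportReceived) {
                return newlyFailed;
            }
            for (Storage s : storages.values()) {
                if (!s.failed && !reportedIds.contains(s.id)) {
                    s.failed = true;
                    newlyFailed.add(s.id);
                }
            }
            return newlyFailed;
        }
    }
}
```

In this sketch a heartbeat that omits a previously known storage marks nothing failed until firstBlockReportReceived is true, so no blocks would be removed and no re-replication scheduled in the window before the block report.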

Eventually the DN sends block reports for the new storages, up to 15 minutes later on large
clusters. Now the NN has many excess blocks to invalidate. If the cluster has failed over in the
past 24 hours (e.g. during a rolling upgrade), the standby that has become active will queue the
block invalidations, which triggers the severe performance degradation of HDFS-8674. That
degradation has been greatly lessened but is still an issue.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
