hadoop-hdfs-dev mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8776) Decom manager should not be active on standby
Date Tue, 14 Jul 2015 20:16:05 GMT
Daryn Sharp created HDFS-8776:
---------------------------------

             Summary: Decom manager should not be active on standby
                 Key: HDFS-8776
                 URL: https://issues.apache.org/jira/browse/HDFS-8776
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.6.0
            Reporter: Daryn Sharp
            Assignee: Daryn Sharp


The decommission manager should not be actively processing on the standby.

The decomm manager goes through the costly computation of determining that every block on the
node requires replication, yet doesn't queue them for replication because it's in standby.
While doing so, the decomm manager holds the namesystem write lock, causing DNs to time out on
heartbeats or IBRs. The NN purges the call queue of timed-out clients, then processes some
heartbeats/IBRs before the decomm manager locks up the namesystem again. Nodes attempting to
register will be sending full BRs, which are more costly to send and discard than a heartbeat.

If a failover is required, the standby will likely have to struggle very hard not to GC while
"catching up" on its queued IBRs, all while DNs continue to fill the call queue and time out.
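
To illustrate the intended behavior, here is a minimal sketch of a monitor tick that
skips the scan unless the node is active. It is not the actual NameNode code: the class
DecomMonitorSketch, the HaState enum, and the tick() method are hypothetical names, and a
plain ReentrantReadWriteLock stands in for the namesystem lock. The point is only that the
HA state is checked before the write lock is taken, so a standby never pays for the scan.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Hadoop's DecommissionManager.
public class DecomMonitorSketch {
    /** Hypothetical stand-in for the NameNode HA state. */
    public enum HaState { ACTIVE, STANDBY }

    // Stand-in for the namesystem lock (assumption for illustration).
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    private volatile HaState state;
    private int scansRun = 0;

    public DecomMonitorSketch(HaState initial) {
        this.state = initial;
    }

    public void setState(HaState newState) {
        this.state = newState;
    }

    public int scansRun() {
        return scansRun;
    }

    /** Runs one monitor tick; returns true only if the block scan ran. */
    public boolean tick() {
        // Cheap check first: a standby returns without touching the write lock.
        if (state != HaState.ACTIVE) {
            return false;
        }
        namesystemLock.writeLock().lock();
        try {
            // Re-check under the lock in case the state changed while waiting.
            if (state != HaState.ACTIVE) {
                return false;
            }
            scansRun++; // placeholder for the costly per-block scan
            return true;
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        DecomMonitorSketch monitor = new DecomMonitorSketch(HaState.STANDBY);
        System.out.println("standby tick ran scan: " + monitor.tick());
        monitor.setState(HaState.ACTIVE);
        System.out.println("active tick ran scan: " + monitor.tick());
    }
}
```

With this shape, a standby's ticks cost one volatile read each, leaving the write lock
free for heartbeats and IBRs. The scan starts on the first tick after a transition to active.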



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
