hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1348) DecommissionManager holds fsnamesystem lock during the whole process of checking if decomissioning DataNodes are finished or not
Date Fri, 20 Aug 2010 22:06:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900885#action_12900885 ]

Hairong Kuang commented on HDFS-1348:
-------------------------------------

Here is the proposed change.

Currently the check logic is as follows:
{code}
synchronized (fsnamesystem) {
  for up to five decommission-in-progress datanodes
     check whether all blocks of each datanode have been replicated;
}
{code}
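To make the current behavior concrete, here is a minimal Java sketch of the coarse-grained check. It is not the actual DecommissionManager code: the class name, the map from block ID to live replica count standing in for the namesystem's block state, and the simplified "live replicas >= required replicas" test are all assumptions for illustration. The counter maxOpsPerLockHold records how many block checks run during a single hold of the lock.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the CURRENT check: one lock hold covers the whole pass.
// Names and the replication test are illustrative, not the real HDFS code.
public class CoarseDecommissionCheck {
    static final int MAX_NODES = 5; // default number of nodes checked per pass

    private final Object fsnamesystem = new Object(); // stand-in for the namesystem lock
    private final Map<Long, Integer> liveReplicas;     // block ID -> live replica count (assumed)
    private final int requiredReplicas;
    int maxOpsPerLockHold = 0; // block checks performed during a single lock hold

    CoarseDecommissionCheck(Map<Long, Integer> liveReplicas, int requiredReplicas) {
        this.liveReplicas = liveReplicas;
        this.requiredReplicas = requiredReplicas;
    }

    /** Returns the indexes of nodes whose blocks all have enough live replicas. */
    List<Integer> check(List<List<Long>> decommissioningNodes) {
        List<Integer> finished = new ArrayList<>();
        synchronized (fsnamesystem) { // held until every block of every node is checked
            int ops = 0;
            int n = Math.min(MAX_NODES, decommissioningNodes.size());
            for (int i = 0; i < n; i++) {
                boolean allReplicated = true;
                for (long b : decommissioningNodes.get(i)) {
                    ops++;
                    if (liveReplicas.getOrDefault(b, 0) < requiredReplicas) {
                        allReplicated = false;
                    }
                }
                if (allReplicated) {
                    finished.add(i);
                }
            }
            maxOpsPerLockHold = Math.max(maxOpsPerLockHold, ops);
        }
        return finished;
    }

    public static void main(String[] args) {
        // Three nodes with 5,000 blocks each; block 7 (on node 0) is under-replicated.
        Map<Long, Integer> replicas = new HashMap<>();
        List<List<Long>> nodes = new ArrayList<>();
        for (int n = 0; n < 3; n++) {
            List<Long> blocks = new ArrayList<>();
            for (long b = n * 5000L; b < (n + 1) * 5000L; b++) {
                blocks.add(b);
                replicas.put(b, 3);
            }
            nodes.add(blocks);
        }
        replicas.put(7L, 1);
        CoarseDecommissionCheck check = new CoarseDecommissionCheck(replicas, 3);
        System.out.println("finished nodes: " + check.check(nodes));
        System.out.println("max block checks per lock hold: " + check.maxOpsPerLockHold);
    }
}
{code}

In this model, every block of every checked node is examined inside one synchronized block, so the lock hold grows linearly with the total number of blocks on the decommissioning nodes.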

I plan to restructure it as follows:
{code}
for up to five iterations {
  synchronized (fsnamesystem) {
    node = get next decommission-in-progress node;
  }

  do {
    synchronized (fsnamesystem) {
      fetch up to 2000 unchecked blocks from node;
    }
    for each fetched block b {
      synchronized (fsnamesystem) {
        check whether block b has been replicated;
      }
    }
  } until all blocks of node have been checked;
}
{code}
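A Java sketch of the proposed structure follows, under hypothetical assumptions: illustrative names, a map from block ID to live replica count standing in for the namesystem's block state, and a simplified "live replicas >= required replicas" test. It also assumes the list of decommissioning nodes does not change between lock acquisitions, which real code would have to handle. Here maxOpsPerLockHold counts block IDs fetched or blocks checked during a single lock hold, so it is bounded by the 2000-block batch size.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the PROPOSED check: many short lock holds instead of one long one.
// Names and the replication test are illustrative, not the committed HDFS-1348 patch.
public class FineGrainedDecommissionCheck {
    static final int MAX_NODES = 5;     // "for up to five iterations"
    static final int BATCH_SIZE = 2000; // "fetch up to 2000 unchecked blocks"

    private final Object fsnamesystem = new Object(); // stand-in for the namesystem lock
    private final Map<Long, Integer> liveReplicas;     // block ID -> live replica count (assumed)
    private final int requiredReplicas;
    int maxOpsPerLockHold = 0; // block IDs fetched or blocks checked in one lock hold

    FineGrainedDecommissionCheck(Map<Long, Integer> liveReplicas, int requiredReplicas) {
        this.liveReplicas = liveReplicas;
        this.requiredReplicas = requiredReplicas;
    }

    /** Returns the indexes of nodes whose blocks all have enough live replicas. */
    List<Integer> check(List<List<Long>> decommissioningNodes) {
        List<Integer> finished = new ArrayList<>();
        for (int i = 0; i < MAX_NODES; i++) {
            List<Long> nodeBlocks;
            synchronized (fsnamesystem) { // short hold: pick the next node
                if (i >= decommissioningNodes.size()) break;
                nodeBlocks = decommissioningNodes.get(i);
            }
            boolean allReplicated = true;
            int next = 0; // index of the first unchecked block of this node
            boolean more;
            do {
                List<Long> batch;
                synchronized (fsnamesystem) { // short hold: copy up to BATCH_SIZE block IDs
                    int end = Math.min(next + BATCH_SIZE, nodeBlocks.size());
                    batch = new ArrayList<>(nodeBlocks.subList(next, end));
                    next = end;
                    more = next < nodeBlocks.size();
                    record(batch.size());
                }
                for (long b : batch) {
                    synchronized (fsnamesystem) { // shortest hold: check one block
                        record(1);
                        if (liveReplicas.getOrDefault(b, 0) < requiredReplicas) {
                            allReplicated = false;
                        }
                    }
                }
            } while (more);
            if (allReplicated) {
                finished.add(i);
            }
        }
        return finished;
    }

    private void record(int ops) {
        maxOpsPerLockHold = Math.max(maxOpsPerLockHold, ops);
    }

    public static void main(String[] args) {
        // Three nodes with 5,000 blocks each; block 7 (on node 0) is under-replicated.
        Map<Long, Integer> replicas = new HashMap<>();
        List<List<Long>> nodes = new ArrayList<>();
        for (int n = 0; n < 3; n++) {
            List<Long> blocks = new ArrayList<>();
            for (long b = n * 5000L; b < (n + 1) * 5000L; b++) {
                blocks.add(b);
                replicas.put(b, 3);
            }
            nodes.add(blocks);
        }
        replicas.put(7L, 1);
        FineGrainedDecommissionCheck check = new FineGrainedDecommissionCheck(replicas, 3);
        System.out.println("finished nodes: " + check.check(nodes));
        System.out.println("max work per lock hold: " + check.maxOpsPerLockHold);
    }
}
{code}

With three nodes of 5,000 blocks each, this version never does more than 2,000 units of work per lock hold, whereas holding the lock for the whole pass would cover all 15,000 block checks. Between holds, other threads waiting on the lock get a chance to run.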

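The proposed restructuring makes the locking granularity much finer: the lock is held only to pick the next node, to fetch a batch of at most 2000 blocks, or to check a single block. This should improve the NameNode's responsiveness while the decommissioning check runs.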
      

> DecommissionManager holds fsnamesystem lock during the whole process of checking if decomissioning DataNodes are finished or not
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1348
>                 URL: https://issues.apache.org/jira/browse/HDFS-1348
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>
> The NameNode is normally busy all the time; its log shows activity every second. But once in a while, the NameNode seems to pause for more than 10 seconds without doing anything, leaving a gap in its log even though no garbage collection is happening.
> One culprit is the DecommissionManager. Its monitor holds the fsnamesystem lock during the whole process of checking whether decommissioning DataNodes are finished, during which it checks every block of up to 5 DataNodes (the default).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

