hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-814) Increase dfs scalability by optimizing locking on namenode.
Date Tue, 12 Dec 2006 05:08:21 GMT
Increase dfs scalability by optimizing locking on namenode.
-----------------------------------------------------------

                 Key: HADOOP-814
                 URL: http://issues.apache.org/jira/browse/HADOOP-814
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur


The current dfs namenode encounters locking bottlenecks when the number of datanodes is large.
The namenode uses a single global lock to protect access to its data structures. One key area
is heartbeat processing: the lower the cost of processing a heartbeat, the more nodes HDFS
can support. A simple change to the current locking model can increase scalability.
Here are the details:

Case 1: Currently there are three locks: the global lock (on FSNamesystem), the heartbeat lock
and the datanodeMap lock. The following function is called when the Namenode receives a
heartbeat:

public synchronized FSNamesystem.gotHeartbeat() {   // (A) global FSNamesystem lock
    synchronized (heartbeats) {                     // (B) heartbeat lock
        synchronized (datanodeMap) {                // (C) datanodeMap lock
            ...
        }
    }
}

In the above piece of code, statement (A) acquires the global FSNamesystem lock. This synchronization
can be safely removed (the same applies to updateStats). A heartbeat from a datanode can then
be processed without holding the global FSNamesystem lock.
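A minimal Java sketch of Case 1, under simplifying assumptions: NodeState, HeartbeatNamesystem, registerDatanode, totalCapacity/totalRemaining and the capacity/remaining fields are hypothetical stand-ins, not the real FSNamesystem code. Once the global lock is dropped, the sketch assumes the cluster totals maintained by updateStats are guarded by the heartbeats lock instead. It shows gotHeartbeat taking only locks (B) and (C), in the same order as before, while registration keeps the global lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified per-datanode state (not Hadoop's real class).
class NodeState {
    final String name;
    long capacity;
    long remaining;
    long lastUpdate; // time of the most recent heartbeat, in ms

    NodeState(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for FSNamesystem; lock order is this -> heartbeats -> datanodeMap.
class HeartbeatNamesystem {
    private final List<NodeState> heartbeats = new ArrayList<>();
    private final Map<String, NodeState> datanodeMap = new TreeMap<>();
    // Cluster-wide totals; assumed to be guarded by the heartbeats lock.
    private long totalCapacity;
    private long totalRemaining;

    // Registration keeps the global lock in this sketch; only the heartbeat path drops it.
    synchronized void registerDatanode(String name, long capacity, long remaining, long now) {
        synchronized (heartbeats) {
            synchronized (datanodeMap) {
                if (datanodeMap.containsKey(name)) {
                    return; // already registered; keep the sketch simple
                }
                NodeState node = new NodeState(name);
                node.capacity = capacity;
                node.remaining = remaining;
                node.lastUpdate = now;
                datanodeMap.put(name, node);
                heartbeats.add(node);
                updateStats(node, true);
            }
        }
    }

    // Case 1: the method is no longer synchronized, so lock (A) is gone.
    boolean gotHeartbeat(String name, long capacity, long remaining, long now) {
        synchronized (heartbeats) {          // (B)
            synchronized (datanodeMap) {     // (C)
                NodeState node = datanodeMap.get(name);
                if (node == null) {
                    return false; // unknown node; it would need to re-register
                }
                updateStats(node, false);    // subtract the old figures
                node.capacity = capacity;
                node.remaining = remaining;
                node.lastUpdate = now;
                updateStats(node, true);     // add the new figures
                return true;
            }
        }
    }

    // Caller must hold the heartbeats lock; the global lock is not needed.
    private void updateStats(NodeState node, boolean isAdded) {
        long sign = isAdded ? 1 : -1;
        totalCapacity += sign * node.capacity;
        totalRemaining += sign * node.remaining;
    }

    long totalCapacity() {
        synchronized (heartbeats) {
            return totalCapacity;
        }
    }

    long totalRemaining() {
        synchronized (heartbeats) {
            return totalRemaining;
        }
    }
}
```

Because gotHeartbeat acquires heartbeats before datanodeMap, exactly as the synchronized registration path does after taking the global lock, the two paths cannot deadlock each other.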

Case 2: A thread called the heartbeatCheck thread periodically traverses all known
datanodes to determine whether any of them has timed out. It has the following form:

void FSNamesystem.heartbeatCheck() {
    synchronized (this) {                 // (D) global FSNamesystem lock
        synchronized (heartbeats) {       // (E) heartbeat lock
            ...
        }
    }
}

This thread acquires the global FSNamesystem lock in statement (D). Statement (D) can
be removed. Instead, the loop checks whether any nodes are dead, and acquires the global
FSNamesystem lock only if a dead node is found.
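A minimal Java sketch of the Case 2 check-then-lock pattern. DnRecord, HeartbeatChecker, addNode, liveCount and expireIntervalMs are hypothetical. Two details are design choices of this sketch rather than statements from the issue: the scan releases the heartbeats lock before taking the global lock, so locks are always acquired in the order FSNamesystem, heartbeats, datanodeMap (taking the global lock while still holding heartbeats would invert that order and could deadlock), and the candidate node is re-checked under the full set of locks because a heartbeat may have arrived in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified per-datanode record (not Hadoop's real class).
class DnRecord {
    final String name;
    long lastUpdate; // time of the most recent heartbeat, in ms

    DnRecord(String name, long lastUpdate) {
        this.name = name;
        this.lastUpdate = lastUpdate;
    }
}

// Hypothetical stand-in for FSNamesystem; lock order is this -> heartbeats -> datanodeMap.
class HeartbeatChecker {
    private final long expireIntervalMs; // hypothetical heartbeat timeout
    private final List<DnRecord> heartbeats = new ArrayList<>();
    private final Map<String, DnRecord> datanodeMap = new TreeMap<>();

    HeartbeatChecker(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    synchronized void addNode(String name, long now) {
        synchronized (heartbeats) {
            synchronized (datanodeMap) {
                DnRecord node = new DnRecord(name, now);
                heartbeats.add(node);
                datanodeMap.put(name, node);
            }
        }
    }

    private boolean isDead(DnRecord node, long now) {
        return now - node.lastUpdate > expireIntervalMs;
    }

    // Case 2: returns how many dead nodes were removed.
    int heartbeatCheck(long now) {
        int removed = 0;
        while (true) {
            DnRecord dead = null;
            synchronized (heartbeats) {              // (E) only; (D) is gone
                for (DnRecord node : heartbeats) {
                    if (isDead(node, now)) {
                        dead = node;
                        break;
                    }
                }
            }
            if (dead == null) {
                return removed; // common case: every node is alive
            }
            // Rare case: (E) has been released; now lock in the usual order.
            synchronized (this) {
                synchronized (heartbeats) {
                    synchronized (datanodeMap) {
                        // Re-check: a heartbeat may have arrived in between.
                        if (heartbeats.contains(dead) && isDead(dead, now)) {
                            heartbeats.remove(dead);
                            datanodeMap.remove(dead.name, dead);
                            removed++;
                        }
                    }
                }
            }
        }
    }

    int liveCount() {
        synchronized (heartbeats) {
            return heartbeats.size();
        }
    }
}
```

In the common case where every node is alive, heartbeatCheck takes only the heartbeats lock and does not contend with other namenode operations for the global FSNamesystem lock.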

Fixing the above two cases may allow HDFS to scale to a larger number of nodes.

 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
