hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-814) Increase dfs scalability by optimizing locking on namenode.
Date Tue, 12 Dec 2006 05:08:21 GMT
Increase dfs scalability by optimizing locking on namenode.

                 Key: HADOOP-814
                 URL: http://issues.apache.org/jira/browse/HADOOP-814
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur

The current dfs namenode encounters locking bottlenecks when the number of datanodes is large.
The namenode uses a single global lock to protect access to its data structures. One key area
is heartbeat processing: the lower the cost of processing a heartbeat, the more nodes HDFS
can support. A simple change to the current locking model can increase scalability.
Here are the details:

Case 1: Currently we have three locks: the global lock (on FSNamesystem), the heartbeat lock,
and the datanodeMap lock. The following function is called when the namenode receives a
heartbeat:

public synchronized FSNamesystem.gotHeartbeat() {    ........ (A)
    synchronized (heartbeat) {                       ........ (B)
      synchronized (datanodeMap) {                   ........ (C)

In the above code, statement (A) acquires the global FSNamesystem lock. This synchronization
can be safely removed (remove updateStats too). A heartbeat from a datanode can then be
processed without holding the global FSNamesystem lock.
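
A minimal sketch of what Case 1 could look like after the change, assuming a much simplified
namenode. Only the lock nesting (B) and (C) comes from the code above; the class
HeartbeatSketch, the NodeInfo record, the lastUpdate field and the method parameters are
hypothetical names used for illustration and are not the actual FSNamesystem code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for FSNamesystem; not the real Hadoop class.
class HeartbeatSketch {
    // Hypothetical per-datanode record; only the last heartbeat time is tracked.
    static final class NodeInfo {
        final String id;
        long lastUpdate;
        NodeInfo(String id, long lastUpdate) {
            this.id = id;
            this.lastUpdate = lastUpdate;
        }
    }

    private final List<NodeInfo> heartbeats = new ArrayList<>();
    private final Map<String, NodeInfo> datanodeMap = new HashMap<>();

    // No "synchronized" on the method, so the global lock (A) is never taken.
    // Returns true if this heartbeat registered a previously unknown node.
    boolean gotHeartbeat(String nodeId, long now) {
        synchronized (heartbeats) {            // (B)
            synchronized (datanodeMap) {       // (C)
                NodeInfo node = datanodeMap.get(nodeId);
                if (node == null) {
                    node = new NodeInfo(nodeId, now);
                    datanodeMap.put(nodeId, node);
                    heartbeats.add(node);
                    return true;
                }
                node.lastUpdate = now;
                return false;
            }
        }
    }

    // Last heartbeat time for a node, or -1 if the node is unknown.
    long lastUpdate(String nodeId) {
        synchronized (datanodeMap) {
            NodeInfo node = datanodeMap.get(nodeId);
            return node == null ? -1L : node.lastUpdate;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatSketch ns = new HeartbeatSketch();
        ns.gotHeartbeat("dn1", 100L);
        // Hold the global lock here; the heartbeat thread still completes
        // because gotHeartbeat never asks for it.
        synchronized (ns) {
            Thread t = new Thread(() -> ns.gotHeartbeat("dn1", 200L));
            t.start();
            t.join();
        }
        System.out.println(ns.lastUpdate("dn1")); // prints 200
    }
}
```

If gotHeartbeat were still declared synchronized, the join() in main would never return,
because the heartbeat thread would wait for the monitor that main is holding.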

Case 2: A thread called the heartbeatCheck thread periodically traverses all known
datanodes to determine whether any of them has timed out. It has the following form:

void FSNamesystem.heartbeatCheck() {
    synchronized (this) {                            ........ (D)
      synchronized (heartbeats) {                    ........ (E)

This thread acquires the global FSNamesystem lock in statement (D). Statement (D) can
be removed. Instead, the loop can first check whether any nodes are dead, and acquire the
global FSNamesystem lock only if a dead node is found.
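
The two-phase check described for Case 2 might be sketched as follows. This is an
illustration under assumptions, not the actual patch: the class HeartbeatCheckSketch, the
NodeInfo record, the expiryMillis timeout, the register helper and the
globalLockAcquisitions counter are all hypothetical. The scan holds only lock (E); lock (D)
is taken afterwards, and only when the scan saw a dead node.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for FSNamesystem; not the real Hadoop class.
class HeartbeatCheckSketch {
    static final class NodeInfo {
        final String id;
        long lastUpdate;
        NodeInfo(String id, long lastUpdate) {
            this.id = id;
            this.lastUpdate = lastUpdate;
        }
    }

    private final List<NodeInfo> heartbeats = new ArrayList<>();
    private final long expiryMillis;       // assumed timeout, not from the source
    int globalLockAcquisitions = 0;        // instrumentation for this sketch only

    HeartbeatCheckSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void register(String id, long now) {
        synchronized (heartbeats) {
            heartbeats.add(new NodeInfo(id, now));
        }
    }

    private boolean isDead(NodeInfo node, long now) {
        return now - node.lastUpdate > expiryMillis;
    }

    // Returns the ids of the nodes removed as dead.
    List<String> heartbeatCheck(long now) {
        List<String> removed = new ArrayList<>();
        boolean foundDead = false;
        synchronized (heartbeats) {                  // (E) only, no global lock
            for (NodeInfo node : heartbeats) {
                if (isDead(node, now)) {
                    foundDead = true;
                    break;
                }
            }
        }
        if (!foundDead) {
            return removed;                          // common case: (D) never taken
        }
        synchronized (this) {                        // (D), only when needed
            globalLockAcquisitions++;
            synchronized (heartbeats) {
                // Re-check under both locks: a node may have sent a
                // heartbeat between the scan above and this point.
                Iterator<NodeInfo> it = heartbeats.iterator();
                while (it.hasNext()) {
                    NodeInfo node = it.next();
                    if (isDead(node, now)) {
                        it.remove();
                        removed.add(node.id);
                    }
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        HeartbeatCheckSketch ns = new HeartbeatCheckSketch(1000L);
        ns.register("dn1", 0L);
        ns.register("dn2", 900L);
        System.out.println(ns.heartbeatCheck(1500L)); // prints [dn1]
    }
}
```

The heartbeats lock is released before the global lock is requested, so the lock order
(D) then (E) of the original code is preserved and the scan cannot deadlock against a
thread that already holds the global lock.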

It is possible that fixing the above two cases will cause HDFS to scale to a higher number of



