hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-814) Increase dfs scalability by optimizing locking on namenode.
Date Fri, 15 Dec 2006 19:02:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-814?page=comments#action_12458885 ] 
Raghu Angadi commented on HADOOP-814:

The locking overhead won't be much. Since the heartbeat lock is also one of the 'hot' locks, most
of the time it might be waiting for the current heartbeat processing to complete.

> Increase dfs scalability by optimizing locking on namenode.
> -----------------------------------------------------------
>                 Key: HADOOP-814
>                 URL: http://issues.apache.org/jira/browse/HADOOP-814
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: heartbeatlock3.patch
> The current dfs namenode encounters locking bottlenecks when the number of datanodes
> is large. The namenode uses a single global lock to protect access to its data structures.
> One key area is heartbeat processing: the lower the cost of processing a heartbeat, the
> more nodes HDFS can support. A simple change to the current locking model can increase
> scalability. Here are the details:
> Case 1: Currently we have three locks: the global lock (on FSNamesystem), the heartbeat
> lock, and the datanodeMap lock. The following function is called when the Namenode
> receives a heartbeat:
> public synchronized FSNamesystem.gotHeartbeat() {   // (A)
>   synchronized (heartbeats) {                       // (B)
>     synchronized (datanodeMap) {                    // (C)
>       ...
>     }
>   }
> }
> In the above code, statement (A) acquires the global FSNamesystem lock. This
> synchronization can be safely removed (remove it from updateStats too). A heartbeat
> from a datanode can then be processed without holding the FSNamesystem global lock.
> Case 2: A separate thread, the heartbeatCheck thread, periodically traverses all known
> Datanodes to determine whether any of them has timed out. It has the following form:
> void FSNamesystem.heartbeatCheck() {
>   synchronized (this) {                 // (D)
>     synchronized (heartbeats) {         // (E)
>       ...
>     }
>   }
> }
> This thread acquires the global FSNamesystem lock in statement (D), which can be
> removed. Instead, the loop can check whether any nodes are dead and acquire the
> FSNamesystem global lock only when a dead node is found.
> Fixing the above two cases may allow HDFS to scale to a larger number of nodes.
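A minimal Java sketch of the Case 1 change, assuming (as the proposal states) that nothing else relies on the global lock to see consistent heartbeat state. The class, its fields, and the register() helper are hypothetical simplifications, not Hadoop's actual FSNamesystem code; only the lock structure mirrors statements (B) and (C) in the issue description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the Case 1 locking change. HeartbeatCase1Sketch,
// DatanodeInfo, register() and totalRemaining are hypothetical stand-ins,
// not Hadoop's real FSNamesystem classes or fields.
class HeartbeatCase1Sketch {
  static class DatanodeInfo {
    final String storageId;
    long remaining;   // free space last reported by this datanode
    long lastUpdate;  // time of the last heartbeat (ms)

    DatanodeInfo(String storageId) {
      this.storageId = storageId;
    }
  }

  final List<DatanodeInfo> heartbeats = new ArrayList<>();
  final Map<String, DatanodeInfo> datanodeMap = new HashMap<>();
  long totalRemaining;  // aggregate statistic, guarded by the heartbeats lock

  // Hypothetical registration path, using the same lock order as gotHeartbeat.
  void register(String storageId, long now) {
    synchronized (heartbeats) {
      synchronized (datanodeMap) {
        DatanodeInfo node = new DatanodeInfo(storageId);
        node.lastUpdate = now;
        datanodeMap.put(storageId, node);
        heartbeats.add(node);
        updateStats(node, true);
      }
    }
  }

  // Case 1: the method is no longer "synchronized", so processing a heartbeat
  // never takes the global lock (the monitor on this object). Only the
  // heartbeats (B) and datanodeMap (C) locks are held, always in that order.
  boolean gotHeartbeat(String storageId, long remaining, long now) {
    synchronized (heartbeats) {        // (B)
      synchronized (datanodeMap) {     // (C)
        DatanodeInfo node = datanodeMap.get(storageId);
        if (node == null) {
          return false;  // unknown datanode; in this sketch it must register first
        }
        updateStats(node, false);      // remove the old contribution
        node.remaining = remaining;
        node.lastUpdate = now;
        updateStats(node, true);       // add the new contribution
        return true;
      }
    }
  }

  // Also not synchronized on the global lock; callers hold the heartbeats lock.
  private void updateStats(DatanodeInfo node, boolean isAdded) {
    totalRemaining += isAdded ? node.remaining : -node.remaining;
  }
}
```

Because gotHeartbeat() no longer synchronizes on the object itself, a thread holding the global lock for a slow namespace operation does not delay heartbeat processing; heartbeats contend only with each other (and with heartbeatCheck) on the heartbeats lock.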
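The Case 2 change can be sketched in the same spirit: scan under the heartbeats lock only, and take the global lock only when a dead node is found. The names (HeartbeatCheckSketch, addNode(), removed, slowPathCount, expireIntervalMs) are hypothetical, and passing `now` as a parameter instead of reading the clock is a simplification for testing; this is not Hadoop's actual heartbeatCheck().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal sketch of the Case 2 change. HeartbeatCheckSketch, DatanodeInfo,
// addNode(), removed and slowPathCount are hypothetical stand-ins, not
// Hadoop's real FSNamesystem API. "now" is passed in to keep it testable.
class HeartbeatCheckSketch {
  static class DatanodeInfo {
    final String name;
    long lastUpdate;  // time of the last heartbeat (ms), guarded by heartbeats

    DatanodeInfo(String name, long lastUpdate) {
      this.name = name;
      this.lastUpdate = lastUpdate;
    }
  }

  final List<DatanodeInfo> heartbeats = new ArrayList<>();
  final List<String> removed = new ArrayList<>();  // guarded by the global lock
  final long expireIntervalMs;
  int slowPathCount;  // how often the global lock was taken; guarded by it

  HeartbeatCheckSketch(long expireIntervalMs) {
    this.expireIntervalMs = expireIntervalMs;
  }

  void addNode(String name, long now) {
    synchronized (heartbeats) {
      heartbeats.add(new DatanodeInfo(name, now));
    }
  }

  private boolean isDead(DatanodeInfo node, long now) {
    return now - node.lastUpdate > expireIntervalMs;
  }

  void heartbeatCheck(long now) {
    // Fast path: scan under the heartbeats lock (E) only, without (D).
    boolean foundDead = false;
    synchronized (heartbeats) {
      for (DatanodeInfo node : heartbeats) {
        if (isDead(node, now)) {
          foundDead = true;
          break;
        }
      }
    }
    if (!foundDead) {
      return;  // common case: the global lock is never touched
    }
    // Slow path: the heartbeats lock was released above, so acquire the locks
    // again in the usual global -> heartbeats order (taking the global lock
    // while holding heartbeats would invert that order and risk deadlock),
    // then re-check, since a heartbeat may have arrived in between.
    synchronized (this) {              // (D), now only when a dead node exists
      slowPathCount++;
      synchronized (heartbeats) {      // (E)
        Iterator<DatanodeInfo> it = heartbeats.iterator();
        while (it.hasNext()) {
          DatanodeInfo node = it.next();
          if (isDead(node, now)) {
            it.remove();
            removed.add(node.name);    // stand-in for real datanode removal
          }
        }
      }
    }
  }
}
```

When every node is alive, a check costs one pass under the heartbeats lock and never competes with namespace operations for the global lock; the two-phase structure only pays for the global lock when there is actually a node to remove.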


