hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-814) Increase dfs scalability by optimizing locking on namenode.
Date Mon, 18 Dec 2006 19:45:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-814?page=comments#action_12459439 ] 
            
Hadoop QA commented on HADOOP-814:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12347142/heartbeatlock3.patch
applied and successfully tested against trunk revision r487715.

> Increase dfs scalability by optimizing locking on namenode.
> -----------------------------------------------------------
>
>                 Key: HADOOP-814
>                 URL: http://issues.apache.org/jira/browse/HADOOP-814
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: heartbeatlock3.patch
>
>
> The current dfs namenode encounters locking bottlenecks when the number of datanodes
> is large. The namenode uses a single global lock to protect access to its data structures.
> One key area is heartbeat processing: the lower the cost of processing a heartbeat, the
> more nodes HDFS can support. A simple change to the current locking model can increase
> scalability. Here are the details:
> Case 1: Currently we have three locks: the global lock (on FSNamesystem), the heartbeat
> lock, and the datanodeMap lock. The following function is called when a heartbeat is
> received by the namenode:
> public synchronized FSNamesystem.gotHeartbeat() {   ........ (A)
>     synchronized (heartbeats) {                      ........ (B)
>         synchronized (datanodeMap) {                 ........ (C)
>             ...
>         }
>     }
> }
> In the above piece of code, statement (A) acquires the global FSNamesystem lock. This
> synchronization can be safely removed (remove updateStats too). This means that a heartbeat
> from a datanode can be processed without holding the global FSNamesystem lock.
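
To make Case 1 concrete, here is a minimal sketch of a heartbeat handler that takes only the heartbeat and datanodeMap locks, never the global one. It is not the code from the attached patch: the class HeartbeatSketch, the NodeInfo record, the method parameters, and lastUpdateOf are hypothetical names chosen for illustration, and the real FSNamesystem does more work per heartbeat.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatSketch {
    // Hypothetical stand-in for the per-datanode record kept by the namenode.
    static final class NodeInfo {
        final String name;
        long lastUpdate;
        NodeInfo(String name) { this.name = name; }
    }

    // The two fine-grained locks named in the description.
    private final List<NodeInfo> heartbeats = new ArrayList<>();
    private final Map<String, NodeInfo> datanodeMap = new HashMap<>();

    // Not declared synchronized, so the global lock (this) is never taken here.
    // Lock order is heartbeats first, then datanodeMap, as in (B) and (C).
    public void gotHeartbeat(String nodeName, long now) {
        synchronized (heartbeats) {
            synchronized (datanodeMap) {
                NodeInfo node = datanodeMap.get(nodeName);
                if (node == null) {
                    // First heartbeat from this node: register it in both structures.
                    node = new NodeInfo(nodeName);
                    datanodeMap.put(nodeName, node);
                    heartbeats.add(node);
                }
                node.lastUpdate = now;
            }
        }
    }

    // Helper for inspection: timestamp of the last heartbeat, or -1 if unknown.
    public long lastUpdateOf(String nodeName) {
        synchronized (datanodeMap) {
            NodeInfo node = datanodeMap.get(nodeName);
            return node == null ? -1L : node.lastUpdate;
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch ns = new HeartbeatSketch();
        ns.gotHeartbeat("dn1", 100L);
        ns.gotHeartbeat("dn1", 250L);
        System.out.println(ns.lastUpdateOf("dn1"));
    }
}
```

Because gotHeartbeat never synchronizes on the enclosing object, a thread that holds the global lock for some long operation does not delay heartbeat processing in this sketch.
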
> Case 2: A separate thread, the heartbeatCheck thread, periodically traverses all
> known datanodes to determine whether any of them has timed out. It is of the following form:
> void FSNamesystem.heartbeatCheck() {
>     synchronized (this) {                 ........ (D)
>         synchronized (heartbeats) {       ........ (E)
>             ...
>         }
>     }
> }
> This thread acquires the global FSNamesystem lock in statement (D). Statement (D)
> can be removed. Instead, the loop can check whether any nodes are dead, and acquire the
> global FSNamesystem lock only if a dead node is found.
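
One way the Case 2 idea could look is sketched below. It is an illustration under assumptions, not the patch itself: HeartbeatCheckSketch, NodeInfo, register, liveCount, and the expireIntervalMs timeout are hypothetical names. The sketch also assumes the existing lock order (global lock first, then heartbeats) must be preserved, so it releases the heartbeats lock before taking the global lock and then re-checks the node, since a heartbeat may have arrived in between.

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatCheckSketch {
    // Hypothetical stand-in for the per-datanode record kept by the namenode.
    static final class NodeInfo {
        final String name;
        long lastUpdate;
        NodeInfo(String name, long lastUpdate) {
            this.name = name;
            this.lastUpdate = lastUpdate;
        }
    }

    private final List<NodeInfo> heartbeats = new ArrayList<>();
    private final long expireIntervalMs; // hypothetical timeout value

    public HeartbeatCheckSketch(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    public void register(String name, long now) {
        synchronized (heartbeats) {
            heartbeats.add(new NodeInfo(name, now));
        }
    }

    public int liveCount() {
        synchronized (heartbeats) {
            return heartbeats.size();
        }
    }

    // Caller must hold the heartbeats lock.
    private boolean isDead(NodeInfo node, long now) {
        return now - node.lastUpdate > expireIntervalMs;
    }

    // Scans without the global lock; returns the number of nodes removed.
    public int heartbeatCheck(long now) {
        int removed = 0;
        while (true) {
            NodeInfo dead = null;
            // Common case: only the heartbeats lock is held during the scan.
            synchronized (heartbeats) {
                for (NodeInfo node : heartbeats) {
                    if (isDead(node, now)) {
                        dead = node;
                        break;
                    }
                }
            }
            if (dead == null) {
                return removed;
            }
            // Rare case: a dead node was found, so take the global lock now.
            synchronized (this) {
                synchronized (heartbeats) {
                    // Re-check: the node may have sent a heartbeat or been removed meanwhile.
                    if (isDead(dead, now) && heartbeats.remove(dead)) {
                        removed++;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        HeartbeatCheckSketch ns = new HeartbeatCheckSketch(500L);
        ns.register("dn1", 0L);
        ns.register("dn2", 900L);
        System.out.println(ns.heartbeatCheck(1000L) + " removed, " + ns.liveCount() + " live");
    }
}
```

When no datanode has timed out, which should be the usual situation, the check thread in this sketch never touches the global lock at all.
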
> It is possible that fixing the above two cases will allow HDFS to scale to a larger number
> of nodes.
>  

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
