hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-814) Increase dfs scalability by optimizing locking on namenode.
Date Tue, 12 Dec 2006 23:29:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-814?page=comments#action_12457939 ] 
            
Konstantin Shvachko commented on HADOOP-814:
--------------------------------------------

= I like the idea of checking whether the data-node is alive before locking the whole namespace.

= I do not understand whether the removal of updateStats() is driven by the locking changes.
= As a result, you calculate totalCapacity and totalRemaining every time a client requests them.
   I'm not worried that it will take longer to sum the fields for all nodes;
   I'm worried that it will lock the namespace for a longer period of time.
= NameNode.getStats() calls totalCapacity() twice, which gets expensive in your implementation:
        results[0] = namesystem.totalCapacity();
        results[1] = namesystem.totalCapacity() - namesystem.totalRemaining();
= Another bad result is that you are no longer updating the totalLoad field,
   which disables load balancing on the cluster.
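
One possible way to address the last three points is to keep running totals that are adjusted on each heartbeat, so a stats request reads a few fields instead of summing over all nodes. The sketch below is only an illustration under assumptions: ClusterStats, NodeInfo, onHeartbeat() and removeNode() are hypothetical names, not part of the patch or of FSNamesystem, and the lock used here is the object's own monitor rather than any of the namenode locks.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the namenode's per-datanode statistics. */
class ClusterStats {
    private static final class NodeInfo {
        long capacity;
        long remaining;
        int load;
    }

    private final Map<String, NodeInfo> nodes = new HashMap<>();
    private long totalCapacity;
    private long totalRemaining;
    private long totalLoad;

    /** Per heartbeat: replace the node's old contribution with the new one. */
    synchronized void onHeartbeat(String nodeId, long capacity, long remaining, int load) {
        NodeInfo info = nodes.computeIfAbsent(nodeId, k -> new NodeInfo());
        totalCapacity += capacity - info.capacity;
        totalRemaining += remaining - info.remaining;
        totalLoad += load - info.load;
        info.capacity = capacity;
        info.remaining = remaining;
        info.load = load;
    }

    /** When a node is declared dead: subtract its contribution. */
    synchronized void removeNode(String nodeId) {
        NodeInfo info = nodes.remove(nodeId);
        if (info != null) {
            totalCapacity -= info.capacity;
            totalRemaining -= info.remaining;
            totalLoad -= info.load;
        }
    }

    /** Returns {capacity, used}; the capacity total is read only once. */
    synchronized long[] getStats() {
        long capacity = totalCapacity;
        return new long[] { capacity, capacity - totalRemaining };
    }

    /** Stays current, so load balancing can keep using it. */
    synchronized long getTotalLoad() {
        return totalLoad;
    }
}
```

With this shape, the time a stats request holds the lock does not grow with the number of datanodes, and totalLoad stays current.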


> Increase dfs scalability by optimizing locking on namenode.
> -----------------------------------------------------------
>
>                 Key: HADOOP-814
>                 URL: http://issues.apache.org/jira/browse/HADOOP-814
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: heartbeatlock.patch
>
>
> The current dfs namenode encounters locking bottlenecks when the number of datanodes
> is large. The namenode uses a single global lock to protect access to data structures. One
> key area is heartbeat processing. The lower the cost of processing a heartbeat, the more
> nodes HDFS can support. A simple change to the current locking model can increase
> scalability. Here are the details:
> Case 1: Currently we have three locks: the global lock (on FSNamesystem), the heartbeat
> lock, and the datanodeMap lock. The following function is called when a heartbeat is received
> by the namenode:
> public synchronized FSNamesystem.gotHeartbeat() {   // (A)
>     synchronized (heartbeat) {                      // (B)
>         synchronized (datanodeMap) {                // (C)
>             ...
>         }
>     }
> }
> In the above code, statement (A) acquires the global FSNamesystem lock. This
> synchronization can be safely removed (remove updateStats too). This means that a heartbeat
> from a datanode can be processed without holding the global FSNamesystem lock.
> Case 2: A thread called the heartbeatCheck thread periodically traverses all
> known datanodes to determine whether any of them has timed out. It has the following form:
> void FSNamesystem.heartbeatCheck() {
>     synchronized (this) {                 // (D)
>         synchronized (heartbeats) {       // (E)
>             ...
>         }
>     }
> }
> This thread acquires the global FSNamesystem lock in statement (D). Statement (D)
> can be removed. Instead, the loop can check whether any nodes are dead, and acquire the
> global FSNamesystem lock only if a dead node is found.
> It is possible that fixing the above two cases will allow HDFS to scale to a higher number
> of nodes.
>  
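
The two cases in the quoted description can be sketched roughly as follows. This is a simplified model, not the FSNamesystem code or the attached patch: NamesystemSketch, isExpired(), aliveCount(), the expiryMs parameter and the globalLockAcquisitions counter are all assumptions made for illustration. The dead-node check is repeated after the global lock is taken, because a heartbeat may arrive between the scan and the lock acquisition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the proposed locking scheme. */
class NamesystemSketch {
    private final Map<String, Long> datanodeMap = new HashMap<>(); // node id -> last heartbeat (ms)
    private final List<String> heartbeats = new ArrayList<>();     // nodes currently considered alive
    private final long expiryMs;
    int globalLockAcquisitions = 0; // instrumentation for this example only

    NamesystemSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Case 1: the method is not synchronized, so the global lock is never taken. */
    void gotHeartbeat(String nodeId, long nowMs) {
        synchronized (heartbeats) {
            synchronized (datanodeMap) {
                if (datanodeMap.put(nodeId, nowMs) == null) {
                    heartbeats.add(nodeId); // first heartbeat from this node
                }
            }
        }
    }

    /** Case 2: scan without the global lock; take it only when a dead node is found. */
    void heartbeatCheck(long nowMs) {
        while (true) {
            String dead = null;
            synchronized (heartbeats) {
                synchronized (datanodeMap) {
                    for (String id : heartbeats) {
                        if (isExpired(id, nowMs)) {
                            dead = id;
                            break;
                        }
                    }
                }
            }
            if (dead == null) {
                return; // common case: nothing expired, global lock untouched
            }
            synchronized (this) { // global lock, same order as elsewhere: this, heartbeats, datanodeMap
                globalLockAcquisitions++;
                synchronized (heartbeats) {
                    synchronized (datanodeMap) {
                        // Re-check: a heartbeat may have arrived since the scan.
                        if (isExpired(dead, nowMs)) {
                            heartbeats.remove(dead);
                            datanodeMap.remove(dead);
                        }
                    }
                }
            }
        }
    }

    /** Caller must hold the datanodeMap lock. */
    private boolean isExpired(String id, long nowMs) {
        Long last = datanodeMap.get(id);
        return last != null && nowMs - last > expiryMs;
    }

    int aliveCount() {
        synchronized (heartbeats) {
            return heartbeats.size();
        }
    }
}
```

All paths acquire the locks in the same order (global, then heartbeats, then datanodeMap), which avoids lock-order deadlocks between the heartbeat handler and the checker thread.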

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira