hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-814) Increase dfs scalability by optimizing locking on namenode.
Date Tue, 12 Dec 2006 23:29:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-814?page=comments#action_12457939 ] 
Konstantin Shvachko commented on HADOOP-814:

= I like the idea of checking whether the data-node is alive before locking the whole namespace.

= Is the removal of updateStats() driven by the locking changes? I do not see why it would be.
= As a result you calculate totalCapacity and totalRemaining every time a client requests
   I'm not worried that it'll take longer to sum fields for all nodes,
   I'm worried that it will lock the namespace for a longer period of time.
= NameNode.getStats() calculates totalCapacity twice, which gets expensive in your implementation.
        results[0] = namesystem.totalCapacity();
        results[1] = namesystem.totalCapacity() - namesystem.totalRemaining();
= Another bad result is that you are not updating the totalLoad field,
   which disables load balancing on the cluster.
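
To illustrate the concern about getStats() and totalLoad, here is a minimal sketch of keeping running totals that are adjusted on each heartbeat, so that a stats request reads them once instead of summing over all nodes while holding a lock. The class name ClusterStats, the method signatures, and the node-id strings are hypothetical and are not taken from the patch; this only shows the general technique.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the namenode's per-datanode bookkeeping.
class ClusterStats {
    // Small dedicated lock, playing the role of the heartbeat lock.
    private final Object heartbeatLock = new Object();
    // node id -> {capacity, remaining, load} as last reported.
    private final Map<String, long[]> nodes = new HashMap<>();
    private long totalCapacity;
    private long totalRemaining;
    private long totalLoad;

    // Called for every heartbeat: subtract the node's old report, add the new one.
    void gotHeartbeat(String nodeId, long capacity, long remaining, long load) {
        synchronized (heartbeatLock) {
            long[] old = nodes.put(nodeId, new long[] {capacity, remaining, load});
            if (old != null) {
                totalCapacity -= old[0];
                totalRemaining -= old[1];
                totalLoad -= old[2];
            }
            totalCapacity += capacity;
            totalRemaining += remaining;
            totalLoad += load;
        }
    }

    // Returns {capacity, used, load}; each total is read once, in constant time.
    long[] getStats() {
        synchronized (heartbeatLock) {
            long capacity = totalCapacity;
            return new long[] {capacity, capacity - totalRemaining, totalLoad};
        }
    }
}
```

With this shape, the cost of a stats request does not grow with the number of datanodes, and totalLoad stays current for load balancing.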

> Increase dfs scalability by optimizing locking on namenode.
> -----------------------------------------------------------
>                 Key: HADOOP-814
>                 URL: http://issues.apache.org/jira/browse/HADOOP-814
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: heartbeatlock.patch
> The current dfs namenode encounters locking bottlenecks when the number of datanodes
is large. The namenode uses a single global lock to protect access to data structures. One
key area is heartbeat processing. The lower the cost of processing a heartbeat, the more
nodes HDFS can support. A simple change to the current locking model can increase the
scalability. Here are the details:
> Case 1: Currently we have three locks, the global lock (on FSNamesystem), the heartbeat
lock and the datanodeMap lock. The following function is called when a heartbeat is received
by the Namenode:
> public synchronized FSNamesystem.gotHeartbeat() {     ........ (A)
>     synchronized (heartbeat) {                        ........ (B)
>         synchronized (datanodeMap) {                  ........ (C)
>             ...
>         }
>     }
> }
> In the above piece of code, statement (A) acquires the global-FSNamesystem-lock. This
synchronization can be safely removed (remove updateStats too). This means that a heartbeat
from the datanode can be processed without holding the FSnamesystem-global-lock.
> Case 2: A thread called the heartbeatCheck thread periodically traverses all
known Datanodes to determine if any of them has timed out. It is of the following form:
> void FSNamesystem.heartbeatCheck() {
>     synchronized (this) {                             ........ (D)
>         synchronized (heartbeats) {                   ........ (E)
>             ...
>         }
>     }
> }
> This thread acquires the global-FSNamesystem lock in Statement (D). This statement (D)
can be removed. Instead, the loop can check whether any nodes are dead. Only if a dead node is
found does it acquire the FSNamesystem-global-lock.
> It is possible that fixing the above two cases will allow HDFS to scale to a higher number
of nodes.
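
The locking change described in Case 2 can be sketched roughly as follows: scan for expired nodes while holding only the heartbeats lock, and take the global lock only when at least one suspect is found, re-checking each suspect after the lock is acquired. This is a minimal illustration, not the code in heartbeatlock.patch; HeartbeatMonitor, globalLock, expiryMs, and the counter field are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the heartbeat handling in FSNamesystem.
class HeartbeatMonitor {
    // Stands in for the global FSNamesystem lock.
    private final Object globalLock = new Object();
    // node id -> time of last heartbeat in milliseconds; also used as its own lock.
    private final Map<String, Long> heartbeats = new HashMap<>();
    private final long expiryMs;
    // Counts how often the global lock was taken, for illustration only.
    int globalLockAcquisitions = 0;

    HeartbeatMonitor(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // Case 1: a heartbeat is recorded without touching the global lock.
    void gotHeartbeat(String nodeId, long nowMs) {
        synchronized (heartbeats) {
            heartbeats.put(nodeId, nowMs);
        }
    }

    // Case 2: scan cheaply first, then lock globally only if a node looks dead.
    List<String> heartbeatCheck(long nowMs) {
        List<String> suspects = new ArrayList<>();
        synchronized (heartbeats) {
            for (Map.Entry<String, Long> e : heartbeats.entrySet()) {
                if (nowMs - e.getValue() > expiryMs) {
                    suspects.add(e.getKey());
                }
            }
        }
        List<String> removed = new ArrayList<>();
        if (suspects.isEmpty()) {
            return removed; // common case: global lock never taken
        }
        // Lock order is global lock first, then heartbeats, as in the original nesting.
        synchronized (globalLock) {
            globalLockAcquisitions++;
            synchronized (heartbeats) {
                for (String id : suspects) {
                    Long last = heartbeats.get(id);
                    // Re-check: a heartbeat may have arrived between the scan and the lock.
                    if (last != null && nowMs - last > expiryMs) {
                        heartbeats.remove(id);
                        removed.add(id);
                    }
                }
            }
        }
        return removed;
    }
}
```

The re-check under the global lock matters because the first scan is only a hint; without it, a node that reported in just after the scan could be removed by mistake.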

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

