hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-599) Improve Namenode robustness by prioritizing datanode heartbeats over client requests
Date Tue, 18 May 2010 22:47:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868881#action_12868881 ]

Hairong Kuang commented on HDFS-599:
------------------------------------

I agree with Sanjay and Dhruba that breaking the client protocol into two parts is not architecturally
clean. I assume that we will take the ACL solution to restrict access to a port.

Here are more code review comments:
1. Please remove all the unnecessary indentation and whitespace changes;
2. Please rename DFS_NAMENODE_DN_RPC_ADDRESS_KEY to DFS_NAMENODE_SERVICE_KEY;
3. Rename NameNode#dnServer to serviceRpcServer. Provide comments explaining what server and dnServer are;
4. Provide javadoc for get/setServiceRpcServerAddress.
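
As an illustration of points 2 to 4, here is a minimal sketch of what a documented getter/setter pair could look like. It is not the code from HDFS-599.patch: the class is a stand-in for NameNode, a plain Map stands in for the Hadoop Configuration object, and the string value assigned to DFS_NAMENODE_SERVICE_KEY is a hypothetical placeholder.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for NameNode; not the class from the patch. */
final class ServiceAddressSketch {
  /** Hypothetical key string; the real value is whatever the patch defines. */
  static final String DFS_NAMENODE_SERVICE_KEY = "dfs.namenode.service.rpc-address";

  /** A plain map standing in for the Hadoop Configuration object. */
  private final Map<String, String> conf = new HashMap<>();

  /**
   * Returns the address of the service RPC server, i.e. the server meant for
   * datanode traffic rather than client requests.
   *
   * @return the configured host and port, or null if no service address is set
   */
  InetSocketAddress getServiceRpcServerAddress() {
    String value = conf.get(DFS_NAMENODE_SERVICE_KEY);
    if (value == null) {
      return null;
    }
    int colon = value.lastIndexOf(':');
    String host = value.substring(0, colon);
    int port = Integer.parseInt(value.substring(colon + 1));
    // Unresolved on purpose: the sketch should not perform DNS lookups.
    return InetSocketAddress.createUnresolved(host, port);
  }

  /**
   * Records the address of the service RPC server in the configuration.
   *
   * @param address host and port the service RPC server listens on
   */
  void setServiceRpcServerAddress(InetSocketAddress address) {
    conf.put(DFS_NAMENODE_SERVICE_KEY,
        address.getHostString() + ":" + address.getPort());
  }

  public static void main(String[] args) {
    ServiceAddressSketch nn = new ServiceAddressSketch();
    nn.setServiceRpcServerAddress(
        InetSocketAddress.createUnresolved("nn.example.com", 8021));
    System.out.println(nn.getServiceRpcServerAddress());
  }
}
```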

For the tests:
1. TestRestartFS: could you reorganize your code to reduce the duplication between the two tests?
2. TestHDFSServerPorts: Does the test always start the service port in the NN? It would be nice if we
also tested both.
3. TestDistributedFileSystem: could you please explain the changes you made to testFileChecksum?

> Improve Namenode robustness by prioritizing datanode heartbeats over client requests
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-599
>                 URL: https://issues.apache.org/jira/browse/HDFS-599
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-599.patch
>
>
> The namenode processes RPC requests from clients that are reading/writing files, as
> well as heartbeats/block reports from datanodes.
> Sometimes, for various reasons (Java GC runs, inconsistent performance of the NFS filer
> that stores HDFS transaction logs, etc.), the namenode encounters transient slowness. For
> example, if the device that stores the HDFS transaction logs becomes sluggish, the Namenode's
> ability to process RPCs slows down to a certain extent. During this time, the RPCs from clients
> as well as the RPCs from datanodes suffer in similar fashion. If the underlying problem becomes
> worse, the NN's ability to process a heartbeat from a DN is severely impacted, thus causing
> the NN to declare that the DN is dead. Then the NN starts replicating blocks that used to
> reside on the now-declared-dead datanode. This adds extra load to the NN. Then the now-declared-dead
> datanode finally re-establishes contact with the NN, and sends a block report. The block report
> processing on the NN is another heavyweight activity, thus causing more load on the already
> overloaded namenode.
> My proposal is that the NN should try its best to continue processing RPCs from datanodes
> and give lower priority to serving client requests. The Datanode RPCs are integral to the
> consistency and performance of the Hadoop file system, and it is better to protect them at all
> costs. This will ensure that the NN recovers from the hiccup much faster than it does now.
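
One way to picture the proposal, in the spirit of the separate service RPC server discussed in the review comments, is to give datanode calls their own handler pool so that a backlog of client calls cannot starve heartbeats. The sketch below is only an illustration under that assumption: the class, method names, and pool sizes are hypothetical, and it uses plain java.util.concurrent executors rather than the Hadoop RPC server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a namenode with separate client and datanode handlers. */
final class SeparateRpcPoolsSketch {
  private final ExecutorService clientHandlers;   // serves client requests
  private final ExecutorService serviceHandlers;  // serves heartbeats/block reports

  SeparateRpcPoolsSketch(int clientThreads, int serviceThreads) {
    this.clientHandlers = Executors.newFixedThreadPool(clientThreads);
    this.serviceHandlers = Executors.newFixedThreadPool(serviceThreads);
  }

  /** Queues a client call; it only competes with other client calls. */
  Future<String> submitClientRequest(Callable<String> call) {
    return clientHandlers.submit(call);
  }

  /** Queues a datanode call; it is never stuck behind client calls. */
  Future<String> submitDatanodeRequest(Callable<String> call) {
    return serviceHandlers.submit(call);
  }

  void shutdown() {
    clientHandlers.shutdownNow();
    serviceHandlers.shutdownNow();
  }

  public static void main(String[] args) throws Exception {
    SeparateRpcPoolsSketch nn = new SeparateRpcPoolsSketch(1, 1);
    CountDownLatch release = new CountDownLatch(1);

    // Occupy the only client handler with a call that blocks until released.
    nn.submitClientRequest(() -> {
      release.await();
      return "slow client call done";
    });

    // The heartbeat is handled by the other pool, so it completes promptly.
    String reply = nn.submitDatanodeRequest(() -> "heartbeat ack")
        .get(2, TimeUnit.SECONDS);
    System.out.println(reply);

    release.countDown();
    nn.shutdown();
  }
}
```

Note that this isolates the handler threads only. A slowdown in shared state, such as the sluggish transaction log device described above, would still affect both pools.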

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

