hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
Date Thu, 12 Nov 2015 00:02:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001367#comment-15001367
] 

Anu Engineer commented on HDFS-9239:
------------------------------------

[~cnauroth] Without alluding to anything that [~daryn] brought up, Thanks for the patch. It
looks very good, some minor comments / questions below. 

1.  nit : {{BPofferService#Constructor}} 
Perhaps add a Precondition to make this relationship explicit.  something like Preconditions.checkState(lifelineNnAddrs.size()
== nnAddrs.size()), since we access both lists with the same index ? 

2. More of a question for my own understanding :
In  {{BPOfferServiceActor#run}} we retry await operation on being interrupted. My question
is when would it be safe to retry ? Wanted to understand if there were any scenarios where
this can happen. Btw, I do see this as a good coding pattern.

{code}
while (shouldRun()) {
        try {
          initialRegistrationComplete.await();
          break;
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
{code}


3. nit: DataNodeManager.java - Javadoc - xmitsInProgress replaced by maxTransfers

4. In DataNodeManager.java : 
{code}
  synchronized (datanodeMap) {
        DatanodeDescriptor nodeinfo = getDatanode(nodeReg);
{code}
Two comments here :
  a. Just wondering if it makes sense to move synchronized(datanodeMap) into getDataNode.
 b. Did you intend to call {{heartbeatManager.updateLifeline}} inside the {{synchronized(datanodeMap)}}
or just inside {{synchronized (heartbeatManager)}}. Do we need to keep the lock on datanodeMap
while updating stats ? 

5. hdfs-default.xml : nit  : Comment :  since we rely on ratio as the default you might want
to fix the comment which says default is 1.
  <property>
  <name>dfs.namenode.lifeline.handler.count</name>
  <value></value>
  <description>
    Sets number of RPC server threads the NameNode runs for handling the
    lifeline RPC server.  The default value is 1, because this RPC server
    handles only HA health check requests from ZKFC.  These are lightweight


> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-9239
>                 URL: https://issues.apache.org/jira/browse/HDFS-9239
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch
>
>
> This issue proposes introduction of a new feature: the DataNode Lifeline Protocol.  This
is an RPC protocol that is responsible for reporting liveness and basic health information
about a DataNode to a NameNode.  Compared to the existing heartbeat messages, it is lightweight
and not prone to resource contention problems that can harm accurate tracking of DataNode
liveness currently.  The attached design document contains more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message