hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11447) Ozone: SCM: Send node report along with heartbeat to SCM
Date Wed, 01 Mar 2017 01:47:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889298#comment-15889298
] 

Anu Engineer commented on HDFS-11447:
-------------------------------------

[~xyao] The changes look excellent. Thank you for the patch. Some minor comments.

* nit: Storage Location Report -- Rename ?? -- But I don't have a better suggestion.

* ContainerLocationManager.java
Line 109: I see that we have todo to handle failed volume.
Just a suggestion:  Do you want to put the code in the getScmUsed() and getAvailable() in
a try catch so if it throws, you still can get info about the other locations.

* We already have proper shutdown call in StateMachine, would it makes sense 
to save the value in that code insted of adding a shutdown hook. It will simpler and all shutdown
code will be in one place.
{code}
    // Make the scm usage to be saved during shutdown.
    ShutdownHookManager.get().addShutdownHook(
        new Runnable() {
          @Override
          public void run() {
            if (!scmUsedSaved) {
              saveScmUsed();
            }
          }
        }, SHUTDOWN_HOOK_PRIORITY);
{code}

* {{moveStaleNodeToDead()}} -- We update ScmStats in this function, but not in healthyToStaleNodes.
Just to make sure that I understand this clearly, does this mean that we send traffic to stale
nodes ?

* {{handleHeartbeat()}} 
In the earlier code  {{monotonicNow()}} was used since the recvTime could be different from
when the HB processing thread actually ran. So instead of penalizing HB wait time in the queue,
we just update the last HB time as the time we saw it. That was a conscious decision, if we
start updating  the recvTime, then we also need to make sure that our response time is within
a reasonable time frame. That is we need to guarantee the HB's will get processed within a
window of time.
*Line 522:*  {{long lastTimestamp = hbItem.getRecvTimestamp();}}
*Line 528:*  {{healthyNodes.put(datanodeID, lastTimestamp);}}
if we update the code to use monotonicNow() we avoid this problem of packet starvation due
to queue and HB processing thread. See the comments in *line 431* about how HB thread assumes
that no issues comes from not being able to run in real time.
In fact, I suggest that we use the {{hbItem.getRecvTimestamp()}} and current time difference
as a metric which lets us know the average time we queue a HB packet. It is very useful metric
to have.

* {{handleHeartbeat()}} -- Do we need to call this function 3 times inside 3 separate if statements
or can we move it outside once ? {{updateNodeStat(datanodeID, nodeReport);}}

> Ozone: SCM: Send node report along with heartbeat to SCM
> --------------------------------------------------------
>
>                 Key: HDFS-11447
>                 URL: https://issues.apache.org/jira/browse/HDFS-11447
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-11447-HDFS-7240.001.patch
>
>
> The storage utilization information on datanode should be reported to SCM to help decide
container allocation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message