hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiaoyu Yao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11194) Maintain aggregated peer performance metrics on NameNode
Date Wed, 04 Jan 2017 00:19:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796663#comment-15796663

Xiaoyu Yao commented on HDFS-11194:

Thanks [~arpitagarwal] for working on this and all for the discussion. I have the following
comments on the production side changes. Still reviewing the unit test changes and will post
my comments on that soon.

1. BlockReceiver.java
NIT: Line 848: "&& mirrorAddr != null" can be removed.
Line 849: can be simplified with "peerMetrics.addSendPacketDownstream"

2. BPServiceActor.java
Line 1146: NIT: heatbeatTime can be changed to slowPeerReportTime or remove the 
parameter by hiding the montonicNow() call inside scheduleNextSlowPeerReport().

3. DatanodeManager.java
Line 52-53: NIT: avoid import *
import org.apache.hadoop.util.*;
import org.apache.hadoop.util.Timer; 

Line 180. The comments seems incomplete.

Line 212. we should instantiate slowPeerTracker only if dataNodePeerStatsEnabled
is true.

Line 1653-1660: NIT: can we tweak the code to avoid calling slowPeers.getSlowPeers()
multiple times in the worst case and maybe avoid the if (LOG.isDebugEnabled()) with
parameterized logging?

Line 1659: can we use nodeinfo.getIpcAddr() sicne the datanode
has registered? 

4. DataNodePeerMetrics.java
Line 142-143: Correct me if I'm wrong, looks like the comments is for stats Map 
in Line 137.

5. DatanodeProtocol.proto
Line 398-405. This is a very good document. Can we add a field indicating the 
DN aggregate mechanism? This way the NN can enforce consistent aggregation 
across all the datanodes. This can be done in a separate ticket. 

6. DFSConfigKeys.java
Line 677: document for dfs.datanode.slow.peers.report.interval? We can open 
separate ticket for it.

7. RollingAverage.java
Great catch on some missing synchronized on rollOverAvgs. 
NIT: Line 264: missing @param for minSamples

8. SlowNodeDetector.java 
Line 99-108: We can make this an interface to allow different aggregation methods 
(median, 90th percentile) for outlier detection. This can be done in a separate ticket.
We can also use Median/Percentile class from apache common to implement
different aggregation.

Line 127: we need to guard the tracing with if (LOG.isTraceEnabled()) to 
avoid the implicit sorted.toString() overhead.

9. SlowPeerReports.java
Line 44: NIT: typo consistenly -> consistently
Line 144: NIT: the document needs to update to match the code which returns
a map -> sortedset of string.
Line 190: Can we make MAX_NODES_TO_REPORT configurable? This can be fixed in 
a separate ticket.

> Maintain aggregated peer performance metrics on NameNode
> --------------------------------------------------------
>                 Key: HDFS-11194
>                 URL: https://issues.apache.org/jira/browse/HDFS-11194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Xiaobing Zhou
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-11194.01.patch, HDFS-11194.02.patch, HDFS-11194.03.patch
> The metrics collected in HDFS-10917 should be reported to and aggregated on NameNode
as part of heart beat messages. This will make is easy to expose it through JMX to users who
are interested in them.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message