hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11194) Maintain aggregated peer performance metrics on NameNode
Date Fri, 06 Jan 2017 00:22:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Arpit Agarwal updated HDFS-11194:
---------------------------------
    Attachment: HDFS-11194.04.patch

Thanks for the review [~xyao]. The v4 patch addresses your feedback. A few comments below:

bq. Line 212. we should instantiate slowPeerTracker only if dataNodePeerStatsEnabled is true.
We only instantiate it when dataNodePeerStatsEnabled is true. Let me know if I misunderstood
your comment.

bq. Line 1653-1660: NIT: can we tweak the code to avoid calling slowPeers.getSlowPeers() multiple
times in the worst case and maybe avoid the if (LOG.isDebugEnabled()) with parameterized logging?
Fixed. Can't change the logging since DataNodeManager has not been converted to slf4j yet.

bq. Line 1659: can we use nodeinfo.getIpcAddr() sicne the datanode has registered?
That overload is private. Did you mean use nodeInfo.getIpcAddr(true)? I used IP addresses
everywhere since that's what the DataNodes report on their peers.

bq.  Great catch on some missing synchronized on rollOverAvgs.
The synchronization was fine since the callers always get the lock but it was triggering a
findbugs false positive.

bq. We can make this an interface to allow different aggregation methods (median, 90th percentile)
for outlier detection.
Thanks. We shouldn't need an interface here as the SlowNodeDetector is agnostic to which aggregation
is used.

bq. we need to guard the tracing with if (LOG.isTraceEnabled()) to avoid the implicit sorted.toString()
overhead.
I think the sorted.toString method will not be called unless the trace level is enabled. Here's
the implementation of Logger#trace from Log4jLoggerAdapter:
{code}
public void trace(String format, Object... arguments) {
  if (isTraceEnabled()) {
    FormattingTuple ft = MessageFormatter.arrayFormat(format, arguments);
    logger.log(FQCN, traceCapable ? Level.TRACE : Level.DEBUG, ft
        .getMessage(), ft.getThrowable());
  }
}
{code}

bq.  Line 60: is the 300_000 a typo or special usage of timeout rule?
That's [for readability|https://docs.oracle.com/javase/8/docs/technotes/guides/language/underscores-literals.html]
to help count zeros.

> Maintain aggregated peer performance metrics on NameNode
> --------------------------------------------------------
>
>                 Key: HDFS-11194
>                 URL: https://issues.apache.org/jira/browse/HDFS-11194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Xiaobing Zhou
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-11194-03-04.delta, HDFS-11194.01.patch, HDFS-11194.02.patch,
HDFS-11194.03.patch, HDFS-11194.04.patch
>
>
> The metrics collected in HDFS-10917 should be reported to and aggregated on NameNode
as part of heart beat messages. This will make is easy to expose it through JMX to users who
are interested in them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message