hadoop-hdfs-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10475) Adding metrics for long FSNamesystem read and write locks
Date Mon, 12 Sep 2016 23:43:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485649#comment-15485649 ]

Erik Krogen commented on HDFS-10475:

To get a mapping of operation -> lock time metrics we propose the following:
1. Move the logging/metrics logic into FSNamesystemLock rather than FSNamesystem to centralize
logic and tracking. 
2. Add new methods, {{(read|write)Unlock(operation)}}, which take a name for the
current operation at unlock time (for metrics collection, the name is only needed
on unlock). If no operation is specified, a catch-all 'default' or 'other' operation would
be used. We would manually add the operation name to the unlock calls for those operations
that we think are likely to contribute significantly to the overall lock hold time. This
is a manual process because otherwise we would need to capture a stack trace (to find the method
name) on each call to {{unlock}}, which may be prohibitively expensive.
3. FSNamesystemLock contains a map of OperationName -> MutableRate metrics, all of which
are also registered in a MetricsRegistry. Each time a lock is released, we look up the
corresponding MutableRate and add a value for the lock hold time. We do not use the map within
MetricsRegistry because it is synchronized, and we do not want contention on that map to cause
slowness around the FSNamesystem lock.
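
The three steps above could look roughly like the following. This is only a sketch under stated assumptions, not the actual patch: {{TimedNamesystemLock}} and {{HoldTimeRate}} are hypothetical names, {{HoldTimeRate}} is a simplified stand-in for Hadoop's MutableRate, and operation names are plain strings for brevity. Hold time is recorded only when the outermost hold is released, so reentrant acquisitions are not double counted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for Hadoop's MutableRate: a sample count plus a running total. */
class HoldTimeRate {
  private final AtomicLong samples = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();

  void add(long nanos) {
    samples.incrementAndGet();
    totalNanos.addAndGet(nanos);
  }

  long numSamples() { return samples.get(); }
  long totalNanos() { return totalNanos.get(); }
}

/** Hypothetical lock wrapper that attributes hold time to the operation named at unlock. */
class TimedNamesystemLock {
  static final String OTHER = "other"; // catch-all for unnamed or unknown operations

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  // Populated only in the constructor and never modified afterwards.
  private final Map<String, HoldTimeRate> rates = new HashMap<>();
  private final ThreadLocal<Long> readLockedAt = new ThreadLocal<>();
  private long writeLockedAt; // only touched while holding the write lock

  TimedNamesystemLock(String... operations) {
    rates.put(OTHER, new HoldTimeRate());
    for (String op : operations) {
      rates.put(op, new HoldTimeRate());
    }
  }

  void readLock() {
    lock.readLock().lock();
    if (lock.getReadHoldCount() == 1) {
      readLockedAt.set(System.nanoTime());
    }
  }

  void readUnlock() { readUnlock(OTHER); }

  void readUnlock(String operation) {
    boolean outermost = lock.getReadHoldCount() == 1;
    long held = outermost ? System.nanoTime() - readLockedAt.get() : 0L;
    lock.readLock().unlock();
    if (outermost) {
      record(operation, held); // recorded after release to keep the critical section short
    }
  }

  void writeLock() {
    lock.writeLock().lock();
    if (lock.getWriteHoldCount() == 1) {
      writeLockedAt = System.nanoTime();
    }
  }

  void writeUnlock() { writeUnlock(OTHER); }

  void writeUnlock(String operation) {
    boolean outermost = lock.getWriteHoldCount() == 1;
    long held = outermost ? System.nanoTime() - writeLockedAt : 0L;
    lock.writeLock().unlock();
    if (outermost) {
      record(operation, held);
    }
  }

  private void record(String operation, long heldNanos) {
    HoldTimeRate rate = rates.get(operation);
    if (rate == null) {
      rate = rates.get(OTHER);
    }
    rate.add(heldNanos);
  }

  HoldTimeRate rate(String operation) { return rates.get(operation); }
}
```

A caller would pair {{writeLock()}} with, for example, {{writeUnlock("delete")}} in a {{finally}} block; call sites that are not annotated keep using the no-argument unlock and fall into the catch-all bucket.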

Choosing the best type of map to hold the MutableRate metrics within FSNamesystemLock is tricky.
Ideally we would use a Java 8 ConcurrentHashMap, using {{computeIfAbsent}} to create new MutableRate
metrics objects and insert them into the registry whenever a new operation is encountered.
However, {{computeIfAbsent}} is not available in Java 7, and we would like to support older versions.
Thus we propose using a regular HashMap (wrapped within a call to {{Collections.unmodifiableMap}})
that is initialized with all of the different operations at the time the FSNamesystemLock
is created. This allows for lock-free access, but requires that we have a list of all the
possible operations. So we suggest an Enum, e.g. FSNamesystemLockMetricOp, which lists all
of the operations of interest to be supplied to the {{(read|write)Unlock}} calls. This would
likely be a list of a few dozen operations that are likely to be relatively expensive
lock holders. Operations not listed within this Enum would be regarded as "other"/"default".
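
As a rough illustration of the enum-keyed map, here is a minimal Java 7 compatible sketch. The enum name FSNamesystemLockMetricOp comes from the proposal above, but its constants are purely illustrative, and {{RateStub}} and {{LockMetricsTable}} are hypothetical names; {{RateStub}} stands in for Hadoop's MutableRate so the example is self-contained.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Enum name from the proposal; the constants listed here are illustrative only. */
enum FSNamesystemLockMetricOp {
  GET_BLOCK_LOCATIONS, CREATE, DELETE, RENAME, OTHER
}

/** Hypothetical stand-in for Hadoop's MutableRate. */
class RateStub {
  private final AtomicLong samples = new AtomicLong();
  private final AtomicLong total = new AtomicLong();

  void add(long value) {
    samples.incrementAndGet();
    total.addAndGet(value);
  }

  long numSamples() { return samples.get(); }
  long total() { return total.get(); }
}

/** Hypothetical holder for the per-operation rates. */
class LockMetricsTable {
  private final Map<FSNamesystemLockMetricOp, RateStub> rates;

  LockMetricsTable() {
    // Create every entry up front so the map is never written to after construction.
    Map<FSNamesystemLockMetricOp, RateStub> m = new HashMap<>();
    for (FSNamesystemLockMetricOp op : FSNamesystemLockMetricOp.values()) {
      m.put(op, new RateStub());
    }
    // The unmodifiable wrapper turns any accidental later write into an exception.
    rates = Collections.unmodifiableMap(m);
  }

  /** Lock-free lookup; a null operation is counted under OTHER. */
  void addHoldTime(FSNamesystemLockMetricOp op, long nanos) {
    rates.get(op == null ? FSNamesystemLockMetricOp.OTHER : op).add(nanos);
  }

  RateStub rate(FSNamesystemLockMetricOp op) { return rates.get(op); }

  Map<FSNamesystemLockMetricOp, RateStub> view() { return rates; }
}
```

Because the map is fully populated in the constructor and stored in a final field, concurrent readers need no synchronization on the map itself; only the per-operation counters are updated, and those use atomics in this sketch.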

We believe this is the right tradeoff between granularity of metrics, performance, and developer
effort, but it is certainly not ideal in terms of manual effort required. We would be interested
to hear any other ideas about how to make the metrics collection require less manual intervention.

> Adding metrics for long FSNamesystem read and write locks
> ---------------------------------------------------------
>                 Key: HDFS-10475
>                 URL: https://issues.apache.org/jira/browse/HDFS-10475
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Xiaoyu Yao
>            Assignee: Erik Krogen
> This is a follow-up to the comment on HADOOP-12916 and [here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837]:
> add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on the namenode, similar
> to what we have for slow write/network WARN/metrics on the datanode.

