hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10872) Add MutableRate metrics for FSNamesystemLock operations
Date Mon, 17 Oct 2016 23:13:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583802#comment-15583802 ]

Zhe Zhang commented on HDFS-10872:

Thanks Erik for the updated patch. I think we are pretty close.

About the main change in {{FSNamesystemLock}}:
# If a thread lives for less time than the configured {{metricAggregationInterval}}, will we
lose its locking metrics? In practice it's probably not a big concern, since RPC handler threads
are reused. But I'm leaving a comment here for further thought.
# Selecting a _real_ default aggregation interval is not easy. Maybe we should document it
in {{hdfs-default.xml}}. Alternatively, we can have two config knobs, one boolean and one integer.
# Agreed that the overhead of the current implementation is pretty small (say, with a 10-second
interval). As a follow-on optimization (depending on experience from production deployments),
maybe we can consider a combination of a hard deadline and opportunistic try-and-backoff. E.g.,
at a higher frequency than {{metricAggregationInterval}} we can try the lock for {{opHoldtimeMetrics}};
if the lock is free, dump the metrics; otherwise retry the lock a little later.
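
The hard-deadline plus try-and-backoff idea in the last point could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the class {{OpportunisticMetricsFlusher}}, its fields, and the {{tryIntervalNanos}} parameter are hypothetical names, {{hardDeadlineNanos}} stands in for {{metricAggregationInterval}}, and a plain map stands in for {{opHoldtimeMetrics}}. The caller passes the current time in explicitly so the behavior is easy to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: thread-local buffering with opportunistic flushes. */
class OpportunisticMetricsFlusher {
  /** Per-thread buffer of hold-time totals that have not been flushed yet. */
  private static final class LocalBuffer {
    final Map<String, Long> totals = new HashMap<>();
    boolean started;        // false until the first record() on this thread
    long lastFlushNanos;    // time of the last successful flush
    long lastAttemptNanos;  // time of the last tryLock attempt
  }

  // Stands in for the lock guarding opHoldtimeMetrics (assumption).
  private final ReentrantLock sharedLock = new ReentrantLock();
  private final Map<String, Long> sharedTotals = new HashMap<>();
  private final ThreadLocal<LocalBuffer> local =
      ThreadLocal.withInitial(LocalBuffer::new);
  private final long tryIntervalNanos;   // how often to attempt a cheap flush
  private final long hardDeadlineNanos;  // role of metricAggregationInterval

  OpportunisticMetricsFlusher(long tryIntervalNanos, long hardDeadlineNanos) {
    this.tryIntervalNanos = tryIntervalNanos;
    this.hardDeadlineNanos = hardDeadlineNanos;
  }

  /** Buffers one measurement and flushes if one of the two triggers fires. */
  void record(String opName, long heldNanos, long nowNanos) {
    LocalBuffer buf = local.get();
    if (!buf.started) {
      buf.started = true;
      buf.lastFlushNanos = nowNanos;
      buf.lastAttemptNanos = nowNanos;
    }
    buf.totals.merge(opName, heldNanos, Long::sum);

    if (nowNanos - buf.lastFlushNanos >= hardDeadlineNanos) {
      // Hard deadline reached: block until the shared lock is available.
      sharedLock.lock();
      try {
        drain(buf, nowNanos);
      } finally {
        sharedLock.unlock();
      }
    } else if (nowNanos - buf.lastAttemptNanos >= tryIntervalNanos) {
      // Opportunistic attempt: flush only if the lock is free right now.
      buf.lastAttemptNanos = nowNanos;
      if (sharedLock.tryLock()) {
        try {
          drain(buf, nowNanos);
        } finally {
          sharedLock.unlock();
        }
      }
      // Otherwise back off; the next attempt happens after tryIntervalNanos.
    }
  }

  /** Moves the buffered totals into the shared map. Caller holds sharedLock. */
  private void drain(LocalBuffer buf, long nowNanos) {
    buf.totals.forEach((op, nanos) -> sharedTotals.merge(op, nanos, Long::sum));
    buf.totals.clear();
    buf.lastFlushNanos = nowNanos;
    buf.lastAttemptNanos = nowNanos;
  }

  /** Returns a copy of the shared totals. */
  Map<String, Long> snapshot() {
    sharedLock.lock();
    try {
      return new HashMap<>(sharedTotals);
    } finally {
      sharedLock.unlock();
    }
  }
}
```

Note that this sketch shares the limitation raised in the first point: a thread that exits before either trigger fires never flushes its buffer.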

A few minor points about naming:
# The mapping below merges the {{yield()}} locking time into the {{contentSummary}} category.
It looks like a reasonable approximation to me, but more opinions would be helpful.
@@ -115,7 +115,7 @@ public boolean yield() {
     // unlock
-    fsn.readUnlock();
+    fsn.readUnlock("contentSummary");
# {{getBlockLocations}} currently appears as {{open}} in audit logging. It could be an existing
bug, but since it has been audit logged that way, it doesn't make sense to change it here. I'll
create a separate JIRA to discuss.
# {{writeUnlock("clearCorruptLazyPersistFile");}} should be "clearCorruptLazyPersistFiles"
# Maybe the {{checkLease}} category should be {{leasesMonitor}}? There's a one-off {{checkLease()}}
method in FSN.
# For the one in {{NamenodeFsck#getBlockLocations}}, maybe we should use "fsckGetBlockLocations"
to differentiate it from the regular {{getBlockLocations}} RPC call?
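
To make the effect of these category names concrete, here is a minimal sketch of a lock wrapper whose unlock call takes an operation name, in the spirit of {{readUnlock("contentSummary")}} above. It is an assumption-laden illustration, not the {{FSNamesystemLock}} implementation: {{NamedOpLock}} and its methods are hypothetical, and plain JDK {{LongAdder}} counters stand in for the {{MutableRate}} metrics. Two call sites that pass the same name are aggregated together, and a distinct name keeps a call site separate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: read lock whose hold time is recorded per operation name. */
class NamedOpLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // When the current thread took its outermost read lock.
  private final ThreadLocal<Long> readAcquiredNanos = new ThreadLocal<>();
  private final Map<String, LongAdder> holdNanosByOp = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> countByOp = new ConcurrentHashMap<>();

  void readLock() {
    lock.readLock().lock();
    // Start timing only on the outermost (non-reentrant) acquisition.
    if (lock.getReadHoldCount() == 1) {
      readAcquiredNanos.set(System.nanoTime());
    }
  }

  /** Releases the read lock and attributes the hold time to opName. */
  void readUnlock(String opName) {
    boolean outermost = lock.getReadHoldCount() == 1;
    long heldNanos = outermost ? System.nanoTime() - readAcquiredNanos.get() : 0L;
    lock.readLock().unlock();
    if (outermost) {
      holdNanosByOp.computeIfAbsent(opName, k -> new LongAdder()).add(heldNanos);
      countByOp.computeIfAbsent(opName, k -> new LongAdder()).increment();
    }
  }

  /** Number of completed lock holds recorded under opName. */
  long count(String opName) {
    LongAdder c = countByOp.get(opName);
    return c == null ? 0L : c.sum();
  }

  /** Total nanoseconds recorded under opName. */
  long totalHoldNanos(String opName) {
    LongAdder n = holdNanosByOp.get(opName);
    return n == null ? 0L : n.sum();
  }
}
```

With this shape, a {{yield()}} call site that unlocks with "contentSummary" adds to the same counters as the main {{contentSummary}} path, while an fsck call site that unlocks with "fsckGetBlockLocations" is reported on its own.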

> Add MutableRate metrics for FSNamesystemLock operations
> -------------------------------------------------------
>                 Key: HDFS-10872
>                 URL: https://issues.apache.org/jira/browse/HDFS-10872
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>         Attachments: FSLockPerf.java, HDFS-10872.000.patch, HDFS-10872.001.patch, HDFS-10872.002.patch
> Add metrics for FSNamesystemLock operations to see, overall, how long each operation
> is holding the lock for. Use MutableRate metrics for now.
