hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10872) Add MutableRate metrics for FSNamesystemLock operations
Date Mon, 17 Oct 2016 23:13:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583802#comment-15583802 ]

Zhe Zhang commented on HDFS-10872:
----------------------------------

Thanks Erik for the updated patch. I think we are pretty close.

About the main change in {{FSNamesystemLock}}:
# If a thread lives for less time than the configured {{metricAggregationInterval}}, will we
lose its locking metrics? In practice it's probably not a big concern since RPC handler threads
are reused, but I'm leaving a comment here for further thought.
# Selecting a _real_ default aggregation interval is not easy. Maybe we should document it
in {{hdfs-default.xml}}. Alternatively we can have 2 config knobs, one binary and one integer.
# Agreed that the overhead of the current implementation is pretty small (say, with a 10 sec
interval). As a follow-on optimization (depending on experience from production deployments),
maybe we can consider a combination of a hard deadline and opportunistic try-and-backoff. E.g.,
at a higher frequency than {{metricAggregationInterval}} we can try the lock for {{opHoldtimeMetrics}};
if the lock is free, dump the metrics, otherwise try the lock again after some more time.
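
To make the third point concrete, here is a minimal sketch of the hard-deadline plus try-and-backoff idea. It is not taken from the patch: the class {{LockHoldTimeAggregator}}, its fields, and the two interval parameters are hypothetical names chosen for illustration, and the clock reading is passed in by the caller only to keep the logic easy to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of hard-deadline plus try-and-backoff flushing. Not the HDFS-10872 patch. */
class LockHoldTimeAggregator {
  /** Total hold time and sample count for one operation name. */
  static final class Stat {
    long totalNanos;
    long samples;
  }

  /** Per-thread buffer; samples held here are lost if the thread exits before a flush. */
  private static final class Local {
    final Map<String, Stat> pending = new HashMap<>();
    boolean started;
    long lastFlushNanos;
    long lastAttemptNanos;
  }

  private final long tryIntervalNanos;   // how often to attempt a non-blocking flush
  private final long hardDeadlineNanos;  // after this long, block until the flush succeeds
  private final ReentrantLock sharedLock = new ReentrantLock();
  private final Map<String, Stat> shared = new HashMap<>();
  private final ThreadLocal<Local> local = ThreadLocal.withInitial(Local::new);

  LockHoldTimeAggregator(long tryIntervalNanos, long hardDeadlineNanos) {
    this.tryIntervalNanos = tryIntervalNanos;
    this.hardDeadlineNanos = hardDeadlineNanos;
  }

  /** Buffers one sample in the calling thread and flushes it if a flush is due. */
  void record(String op, long heldNanos, long nowNanos) {
    Local l = local.get();
    if (!l.started) {
      l.started = true;
      l.lastFlushNanos = nowNanos;
      l.lastAttemptNanos = nowNanos;
    }
    Stat s = l.pending.computeIfAbsent(op, k -> new Stat());
    s.totalNanos += heldNanos;
    s.samples++;

    if (nowNanos - l.lastFlushNanos >= hardDeadlineNanos) {
      sharedLock.lock();                 // hard deadline: wait for the metrics lock
      try {
        drain(l, nowNanos);
      } finally {
        sharedLock.unlock();
      }
    } else if (nowNanos - l.lastAttemptNanos >= tryIntervalNanos) {
      l.lastAttemptNanos = nowNanos;
      if (sharedLock.tryLock()) {        // opportunistic: flush only if the lock is free
        try {
          drain(l, nowNanos);
        } finally {
          sharedLock.unlock();
        }
      }                                  // otherwise back off until the next try interval
    }
  }

  /** Moves the thread-local samples into the shared map; caller must hold sharedLock. */
  private void drain(Local l, long nowNanos) {
    for (Map.Entry<String, Stat> e : l.pending.entrySet()) {
      Stat dst = shared.computeIfAbsent(e.getKey(), k -> new Stat());
      dst.totalNanos += e.getValue().totalNanos;
      dst.samples += e.getValue().samples;
    }
    l.pending.clear();
    l.lastFlushNanos = nowNanos;
    l.lastAttemptNanos = nowNanos;
  }

  long samples(String op) {
    sharedLock.lock();
    try {
      Stat s = shared.get(op);
      return s == null ? 0L : s.samples;
    } finally {
      sharedLock.unlock();
    }
  }

  long totalNanos(String op) {
    sharedLock.lock();
    try {
      Stat s = shared.get(op);
      return s == null ? 0L : s.totalNanos;
    } finally {
      sharedLock.unlock();
    }
  }
}
```

Note that this sketch still has the limitation raised in the first point: samples buffered by a thread that exits before its next flush are never merged.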

A few minor points about naming:
# The below mapping merges the {{yield()}} locking time into the {{contentSummary}} category.
It looks like a reasonable approximation to me, but more opinions would be helpful.
{code}
@@ -115,7 +115,7 @@ public boolean yield() {
     // unlock
     dir.readUnlock();
-    fsn.readUnlock();
+    fsn.readUnlock("contentSummary");
{code}
# {{getBlockLocations}} currently appears as {{open}} in audit logging. That could be an existing
bug, but since it has been audit logged that way, it doesn't make sense to change it here. I'll
create a separate JIRA to discuss.
# {{writeUnlock("clearCorruptLazyPersistFile");}} should be "clearCorruptLazyPersistFiles"
# Maybe the {{checkLease}} category should be {{leasesMonitor}}? There's a one-off {{checkLease()}}
method in FSN
# The one in {{NamenodeFsck#getBlockLocations}}, maybe we should use "fsckGetBlockLocations"
to differentiate from regular {{getBlockLocations}} RPC call?
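
For readers following the naming discussion, here is a minimal sketch of why the string passed at unlock time matters: it is the key under which the hold time is recorded, so two call sites that pass the same name are indistinguishable in the metrics. The class {{NamedOpReadLock}} and its methods are hypothetical and are not the {{FSNamesystemLock}} implementation; only the operation name strings come from the comments above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a read lock whose unlock call names the operation. Not FSNamesystemLock. */
class NamedOpReadLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final ThreadLocal<Long> acquiredAtNanos = new ThreadLocal<>();
  private final Map<String, LongAdder> holdNanosByOp = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> countByOp = new ConcurrentHashMap<>();

  void readLock() {
    lock.readLock().lock();
    // Start timing only on the outermost acquisition by this thread.
    if (lock.getReadHoldCount() == 1) {
      acquiredAtNanos.set(System.nanoTime());
    }
  }

  void readUnlock(String opName) {
    boolean outermost = lock.getReadHoldCount() == 1;
    long heldNanos = outermost ? System.nanoTime() - acquiredAtNanos.get() : 0L;
    lock.readLock().unlock();
    // Record after releasing, so bookkeeping does not extend the hold time.
    if (outermost) {
      holdNanosByOp.computeIfAbsent(opName, k -> new LongAdder()).add(heldNanos);
      countByOp.computeIfAbsent(opName, k -> new LongAdder()).increment();
    }
  }

  long count(String opName) {
    LongAdder c = countByOp.get(opName);
    return c == null ? 0L : c.sum();
  }

  long holdNanos(String opName) {
    LongAdder n = holdNanosByOp.get(opName);
    return n == null ? 0L : n.sum();
  }
}
```

Under this sketch, a call site that unlocks with "fsckGetBlockLocations" would be counted separately from one that unlocks with "getBlockLocations", which is the distinction suggested in the last item.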

> Add MutableRate metrics for FSNamesystemLock operations
> -------------------------------------------------------
>
>                 Key: HDFS-10872
>                 URL: https://issues.apache.org/jira/browse/HDFS-10872
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>         Attachments: FSLockPerf.java, HDFS-10872.000.patch, HDFS-10872.001.patch, HDFS-10872.002.patch
>
>
> Add metrics for FSNamesystemLock operations to see, overall, how long each operation
> is holding the lock for. Use MutableRate metrics for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

