hbase-issues mailing list archives

From "Shaneal Manek (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5533) Add more metrics to HBase
Date Thu, 22 Mar 2012 23:18:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236158#comment-13236158 ]

Shaneal Manek commented on HBASE-5533:

AFAIK, it's simply a bug to ever use System.currentTimeMillis() for timing things. Its output
will jump around wildly with NTP updates, leap seconds, daylight saving time, timezone changes,
and many other reasons (see: https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks).
I've been bitten by DST causing a ~10ms op to look like it took ~1hr, which spiked our mean
time high enough to set off an alerting system (another reason I like 'median' so much better
than 'mean' too ;-)). System.nanoTime() doesn't have these problems.
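
As a minimal sketch of the difference (not code from the patch; the class and method names below are hypothetical), an elapsed-time measurement based on System.nanoTime() could look like this. System.nanoTime() is intended for measuring elapsed time and is not tied to the wall clock, so only the difference between two readings is meaningful.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of HBase.
final class ElapsedTimer {
    private ElapsedTimer() {}

    /** Runs the task and returns how long it took, in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();   // arbitrary origin, unrelated to wall-clock time
        task.run();
        return System.nanoTime() - start; // only the difference is meaningful
    }

    /** Same measurement, converted to whole milliseconds for reporting. */
    static long timeMillis(Runnable task) {
        return TimeUnit.NANOSECONDS.toMillis(timeNanos(task));
    }
}
```

A caller would wrap the operation being measured, e.g. ElapsedTimer.timeMillis(() -> doRead()), where doRead() stands in for whatever operation is being timed.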

There is definitely overlap between the new and old metrics (the old ones are, in fact, an
exact subset of the new ones, since the new 'histogram' keeps track of the mean too). I didn't
want to remove the old ones because it would have broken backwards compatibility with existing
monitoring/alerting tools. Is there a method for marking metrics deprecated?
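
To illustrate how one set of samples can serve both the old mean metric and the new percentile view, here is a rough sketch. It is not the histogram implementation from the patch: the LatencySample class is hypothetical, it keeps every sample in memory, and it uses the simple nearest-rank definition of a percentile.

```java
import java.util.Arrays;

// Hypothetical illustration only; a production histogram would bound its memory use.
final class LatencySample {
    private final long[] sorted;

    LatencySample(long[] latenciesMs) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        sorted = latenciesMs.clone();
        Arrays.sort(sorted);
    }

    /** Arithmetic mean of all samples (what the old metrics report). */
    double mean() {
        double sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        return sum / sorted.length;
    }

    /** Nearest-rank percentile for p in (0, 100]; p = 50 gives the median. */
    long percentile(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With nine 10ms samples and one bogus 1hr (3,600,000ms) sample, percentile(50) stays at 10 while mean() jumps to 360009.0, which is the kind of spike that would trip a mean-based alert.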

I'm in the process of porting my patch to TRUNK and consolidating the measurements for both
sets of metrics. I'll upload it later tonight, but I just wanted to give a status update.

> Add more metrics to HBase
> -------------------------
>                 Key: HBASE-5533
>                 URL: https://issues.apache.org/jira/browse/HBASE-5533
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.92.2, 0.94.0
>            Reporter: Shaneal Manek
>            Assignee: Shaneal Manek
>            Priority: Minor
>         Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, TimingOverhead.java,
> hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch,
> To debug/monitor production clusters, there are some more metrics I wish I had available.
> In particular:
> - Although the average FS latencies are useful, a 'histogram' of recent latencies (90%
> of reads completed in under 100ms, 99% in under 200ms, etc.) would be more useful
> - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful
> - Counting the number of accesses to each region to detect hotspotting
> - Exposing the current number of HLog files
