hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
Date Thu, 27 Jun 2013 00:49:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694377#comment-13694377 ]

Elliott Clark commented on HBASE-8370:

bq.Having a cache hit ratio of 80 % means that at least 80 % of my requests are fast
I would disagree. A cache hit does not guarantee a fast request; any of these can still slow it down:

* Full handlers
* Giant gets of large amounts of data
* Gets without a proper bloom filter
* Things that skip past lots of (cached) blocks
* Slow data block encoding
* Slow filters
* Slow network
* Lock contention
* GC

There are TONS of other reasons that your requests can be slow.  And without knowing the
workload you can't tell whether a cache miss is more or less likely than any other explanation.
I've seen workloads where the cache hit percent was in the low teens and I've seen workloads
where it was really 100%.  There's no way to know a priori whether a number is good or
bad.  So again you are back to comparing the metrics against a baseline.  For
that, the absolute numbers are less important.

bq.As far as derivatives go, Miss count derivative can go up with other things like read request
Yep, and that makes things harder, but the only metrics that aren't susceptible to that are gauges.
And like I said before, I'm trying to move us off of gauges.
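
To illustrate the point about derivatives, here is a minimal Python sketch that divides the change in a miss counter by the change in a read-request counter over the same interval, so that a jump in traffic alone does not look like a caching problem. The `Sample` class, its field names, and the numbers are made up for illustration; they are not HBase metric names or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scrape of two monotonically increasing counters (hypothetical names)."""
    read_requests: int
    cache_misses: int


def misses_per_read(prev: Sample, curr: Sample) -> float:
    """Cache misses per read request over the interval between two scrapes."""
    reads = curr.read_requests - prev.read_requests
    misses = curr.cache_misses - prev.cache_misses
    # A non-positive or negative delta means no traffic or a counter reset
    # (for example after a restart); report 0.0 rather than a bogus ratio.
    if reads <= 0 or misses < 0:
        return 0.0
    return misses / reads


# Quiet interval: 1,000 reads and 100 misses.
quiet = misses_per_read(Sample(1_000, 100), Sample(2_000, 200))
# Busy interval: 10x the reads and 10x the misses.
busy = misses_per_read(Sample(2_000, 200), Sample(12_000, 1_200))
print(quiet, busy)
```

In this made-up example the raw miss delta is ten times larger in the busy interval, yet the misses per read are the same in both, which is why the raw derivative has to be read alongside the request rate.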

bq.I don't know the number of cache misses for Index block vs Data block vs Bloom block. I
would no longer know how many Data blocks are being accessed and how many Index blocks etc.
But those aren't actionable metrics.  

* If your bloom block cache hit count goes down you can do... not much. It's not worth counting
if you can't take action on it.
* With the way the index blocks work you can't cache miss them after the first time, unless
we're OOM (they aren't ever evicted, even if you turn off caching for the CF).  So you'll see
some misses on region open and any time there's a new flush or compaction.
Apart from that the hit rate will be 100%.  Compaction and flush metrics are much more useful
for determining this kind of thing, so there's no need to add more metrics for something
that's better covered somewhere else.
* So data blocks are the only useful ones, and they dominate the number of blocks requested.
So this can pretty well be covered by the following:
** blockCacheExpressHitPercent
** blockCountHitPercent
** blockCacheHitCount
** blockCacheMissCount
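
As a sketch of how hit and miss counters like these can be compared against a baseline, the Python below computes the hit percent for one interval from two scrapes and flags it when it drifts from a previously observed value. The function names, the sample values, and the 5-point tolerance are assumptions for illustration, not anything HBase ships.

```python
from typing import Optional


def interval_hit_percent(hits_prev: int, misses_prev: int,
                         hits_curr: int, misses_curr: int) -> Optional[float]:
    """Hit percent for the interval between two scrapes of cumulative counters."""
    hits = hits_curr - hits_prev
    misses = misses_curr - misses_prev
    total = hits + misses
    # No block requests in the interval, or a counter reset: nothing to report.
    if hits < 0 or misses < 0 or total == 0:
        return None
    return 100.0 * hits / total


def deviates(current: float, baseline: float, tolerance_points: float = 5.0) -> bool:
    """True when the current value is more than tolerance_points away from baseline."""
    return abs(current - baseline) > tolerance_points


# Cumulative counters at two scrapes (made-up numbers): 800 hits and 200 misses
# happened in between.
current = interval_hit_percent(9_000, 1_000, 9_800, 1_200)
print(current, deviates(current, baseline=90.0))
```

The same 80% would be unremarkable for a workload whose baseline is in the low teens; only the comparison against that workload's own baseline says anything.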

I'm -1 on adding any more metrics on the read path unless there's something that's totally missed
(Jeremy brought up a couple the last time I met with him).  That code is just too important
to be instrumented any more for things that can be figured out in other ways (and I would argue
better ways, but that's less important).

I'm +1 on making that cache hit percent a double so there's more precision.
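
A small Python sketch of what the extra precision buys, assuming the integer version truncates (how the existing metric rounds is not stated here). The function names and counts are made up for illustration.

```python
def hit_percent_int(hits: int, misses: int) -> int:
    """Hit percent as a truncated integer (assumed behavior for illustration)."""
    total = hits + misses
    return 0 if total == 0 else (100 * hits) // total


def hit_percent_float(hits: int, misses: int) -> float:
    """Hit percent as a floating-point value."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total


# Two made-up workloads with 10,000 block requests each:
# one has 99 misses, the other has a single miss.
for hits, misses in [(9_901, 99), (9_999, 1)]:
    print(hit_percent_int(hits, misses), hit_percent_float(hits, misses))
```

Both workloads report 99 as an integer even though one misses 99 times as often; the floating-point values (about 99.01 and 99.99) keep them apart.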

> Report data block cache hit rates apart from aggregate cache hit rates
> ----------------------------------------------------------------------
>
>                 Key: HBASE-8370
>                 URL: https://issues.apache.org/jira/browse/HBASE-8370
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>            Priority: Minor
>
> Attaching from mail to dev@hbase.apache.org
> I am wondering whether the HBase cachingHitRatio metrics that the region server UI shows
> can get me a breakdown by data blocks. I always see this number to be very high and that
> could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks
> in the block cache before retrieving the data block. This could be artificially bloating up
> the cache hit ratio.
> Assuming the above is correct, do we already have a cache hit ratio for data blocks alone
> which is more obscure? If not, my sense is that it would be pretty valuable to add one.

