hbase-issues mailing list archives

From "Ben Manes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
Date Mon, 27 Mar 2017 16:31:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943573#comment-15943573 ]

Ben Manes commented on HBASE-17339:

I think it's really difficult to tell, but I'd guess that there might be a small gain.

Those 30M misses sound compulsory, meaning that they would occur regardless of the cache size.
Therefore we'd expect an unbounded cache to have 87% hit rate at 400M accesses or 90% at 300M.
If you're observing 80%, then at best there is a 10% boost. If Bélády's optimal is lower, then
there is even less of a difference to boost by. It could be that SLRU captures frequency well
enough that both policies are equivalent.

The [MultiQueue paper|https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf]
argues that second-level cache access patterns are frequency-skewed. The LruBlockCache only records
whether there were multiple accesses, not the counts, and tries to evict fairly across the buckets.
Since TinyLFU captures a longer tail (the frequency of items outside of the cache), there is a chance
that it can make a better prediction. But we wouldn't know without an access trace to simulate with.
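
To make the "longer tail" point concrete, here is a minimal sketch of a frequency-based admission check in the spirit of TinyLFU. It is an illustration only, not Caffeine's or HBase's code: the class and function names are hypothetical, the sketch is a plain count-min structure, and the periodic aging that TinyLFU applies to its counters is left out.

```python
import hashlib


class FrequencySketch:
    """Small count-min sketch (hypothetical helper): approximate access
    counts for every key seen, including keys not currently cached."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, key):
        # One hash per row, derived by salting the key with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                f"{row}:{key}".encode(), digest_size=8
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def record(self, key):
        # Called on every access, hit or miss.
        for row, col in self._slots(key):
            self.rows[row][col] += 1

    def estimate(self, key):
        # The smallest counter is the one least inflated by collisions.
        return min(self.rows[row][col] for row, col in self._slots(key))


def admit(sketch, candidate, victim):
    """Admit the candidate only if it was seen more often than the
    entry the eviction policy would otherwise throw out."""
    return sketch.estimate(candidate) > sketch.estimate(victim)
```

Because the sketch also counts keys that were never admitted, a block that keeps missing can eventually displace a resident block that was touched only a couple of times, which is the kind of decision a "multiple accesses or not" flag cannot make.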

I suspect that the high hit rate means there isn't much cache pollution to lower the hit rate,
so a good enough victim is chosen. At the tail most of the entries have a relatively similar
frequency, too. It would be fun to find out, but you probably won't think it was worth the effort.

> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, HBASE-17339-V03.patch,
HBASE-17339-V03.patch, HBASE-17339-V04.patch, HBASE-17339-V05.patch, HBASE-17339-V06.patch,
> The current implementation of a get operation (to retrieve values for a specific key)
scans through all relevant stores of the region; for each store both memory components (memstore
segments) and disk components (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only components first
and only if the result is incomplete scans both memory and disk.
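
The control flow described above can be sketched roughly as follows. This is a toy model, not the attached patch: the `get` function and its dict-based `memstore` and `hfiles` arguments are hypothetical stand-ins, and it assumes that any value found in memory is newer than what is on disk, ignoring timestamps, versions, and delete markers.

```python
def get(columns, memstore, hfiles):
    """Return the requested columns for one row, trying memory first.

    memstore: dict column -> value (stand-in for the memstore segments)
    hfiles:   list of dicts column -> value, newest first (stand-in for hfiles)
    """
    # Speculative pass: look at memory-only components.
    result = {c: memstore[c] for c in columns if c in memstore}
    if len(result) == len(columns):
        return result, "memory-only"

    # Result incomplete: fall back to scanning both memory and disk.
    result = {}
    for source in [memstore] + hfiles:
        for c in columns:
            if c in source and c not in result:
                result[c] = source[c]
    return result, "memory+disk"
```

The second element of the returned tuple only reports which path was taken, so the fallback is easy to observe when experimenting.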

This message was sent by Atlassian JIRA
