hbase-issues mailing list archives

From "Edward Bortnikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
Date Sun, 04 Jun 2017 09:12:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036219#comment-16036219 ]

Edward Bortnikov commented on HBASE-17339:

Thanks [~eshcar]. It may make sense to describe the experiment that led us to the current
implementation, to give the community the full picture (smile).

We looked at a workload with temporal (rather than spatial) locality, namely writes closely
followed by reads, a pattern that is quite common in pub-sub scenarios. We expected reading from
the MemStore first to yield a performance benefit; instead, we saw a nearly 100% cache hit rate
and could not explain it for a while. The lazy evaluation procedure described by [~eshcar]
sheds light on this.

Explicitly prioritizing reads from the MemStore, rather than simply deferring the data fetch
from disk, could avoid some Bloom filter (BF) accesses whose only purpose is to determine
whether the key has earlier versions on disk. These savings matter in practice mainly when the
BF itself is not in memory, so that accessing it triggers I/O. Is that a realistic scenario? We
assume that BFs are normally cached permanently for all HFiles managed by the region server (RS).
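To make the Bloom filter argument concrete, here is a minimal sketch that counts BF probes under the two strategies. It assumes a single-row get that needs only the newest value, so a MemStore hit is already a complete answer, and it assumes, as the comment suggests, that the deferred strategy still probes every HFile's BF while setting up its scanners. The `HFile` class, `get_deferred`, `get_memory_first` and the exact-set "Bloom filter" are hypothetical illustrations, not HBase APIs. Lazy seeks and the block cache are not modeled.

```python
from dataclasses import dataclass


@dataclass
class HFile:
    """Toy HFile (hypothetical model): `data` maps row key -> value."""
    data: dict
    bloom_probes: int = 0

    def bloom_might_contain(self, key):
        # An exact membership test stands in for a real Bloom filter,
        # so this toy version never returns a false positive.
        self.bloom_probes += 1
        return key in self.data


def get_deferred(key, memstore, hfiles):
    """Deferred fetch: every HFile's filter is probed while scanners are set up."""
    candidates = [f for f in hfiles if f.bloom_might_contain(key)]
    if key in memstore:
        return memstore[key]  # the MemStore value is newest; disk data is never read
    # hfiles are ordered oldest to newest, so the last candidate holds the newest value
    return candidates[-1].data[key] if candidates else None


def get_memory_first(key, memstore, hfiles):
    """Memory-first: filters are probed only on a MemStore miss."""
    if key in memstore:
        return memstore[key]
    candidates = [f for f in hfiles if f.bloom_might_contain(key)]
    return candidates[-1].data[key] if candidates else None


def total_probes(hfiles):
    """Sum of BF probes across all toy HFiles."""
    return sum(f.bloom_probes for f in hfiles)
```

With a write-then-read workload, a key that is still in the MemStore costs the deferred variant one probe per HFile and the memory-first variant none. Whether that saves any I/O depends on whether the BFs are already cached, which is exactly the open question above.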

Dear community - please speak up. Thanks. 

> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, HBASE-17339-V03.patch,
>                      HBASE-17339-V03.patch, HBASE-17339-V04.patch, HBASE-17339-V05.patch, HBASE-17339-V06.patch,
> The current implementation of a get operation (which retrieves the values for a specific key)
> scans through all relevant stores of the region; for each store, both the memory components
> (MemStore segments) and the disk components (HFiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans the memory-only components first,
> and scans both memory and disk only if the result is incomplete.
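The two-pass get described in the issue can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not code from the HBASE-17339 patches: each store component is a dict mapping a row to a list of (column, timestamp, value) cells; every cell in memory is assumed newer than every cell on disk for the same column (roughly true when the server assigns timestamps); and a result counts as "complete" when each requested column already has `max_versions` cells. Deletes, TTLs and filters are ignored, and all function names are hypothetical.

```python
def scan(components, row, columns):
    """Collect the cells of `row` whose column was requested."""
    cells = []
    for component in components:
        cells.extend(c for c in component.get(row, []) if c[0] in columns)
    return cells


def newest_per_column(cells, max_versions):
    """Keep at most `max_versions` newest (timestamp, value) pairs per column."""
    result = {}
    for column, ts, value in sorted(cells, key=lambda c: c[1], reverse=True):
        versions = result.setdefault(column, [])
        if len(versions) < max_versions:
            versions.append((ts, value))
    return result


def is_complete(result, columns, max_versions):
    # Relies on the assumption that memory cells are newer than disk cells:
    # if memory already holds max_versions cells per requested column,
    # disk cannot contribute anything that would make it into the result.
    return all(len(result.get(col, [])) >= max_versions for col in columns)


def speculative_get(row, columns, max_versions, memstore_segments, hfiles):
    """Return (result, path), where path records which pass produced the answer."""
    # Pass 1: speculative scan of the memory-only components.
    result = newest_per_column(scan(memstore_segments, row, columns), max_versions)
    if is_complete(result, columns, max_versions):
        return result, "memory"
    # Pass 2: the memory-only result is incomplete, so scan memory and disk together.
    cells = scan(memstore_segments, row, columns) + scan(hfiles, row, columns)
    return newest_per_column(cells, max_versions), "memory+disk"
```

In this sketch the second pass rescans memory instead of merging with the first pass's output. That keeps the fallback identical to the non-optimized path, at the cost of repeating the (cheap) in-memory work whenever the speculation fails.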

This message was sent by Atlassian JIRA
