hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
Date Sun, 26 Mar 2017 10:24:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942225#comment-15942225 ]

Eshcar Hillel commented on HBASE-17339:

I am attaching the results of an experiment with a mixed workload, along with the most recent patch
in case anyone else wants to run their own experiments.
For the lower percentiles the optimization improves read latency by 8-9%; for the high percentiles
the change ranges from -5% to +5%.
The experiment ran 100M get operations. With no optimization this translates into 100M (full)
scans and ~400M cache accesses, of which ~30M are misses.
With the optimization we have only 62M (full) scans (the rest scan only the memory for results)
and only ~300M cache accesses, but the same number of misses, ~30M.
In another experiment I ran, the hit ratio dropped from 90% with no optimization to 80%
with the optimization.
If we can reduce the number of misses, we can also reduce the read latency in the high percentiles.
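
To make the numbers above concrete, here is a small sketch of the arithmetic. It uses only the approximate counts quoted in this comment; the class and method names are made up for illustration and are not part of the patch. With ~30M misses in both runs, the hit ratio works out to roughly 92.5% over ~400M accesses and roughly 90% over ~300M accesses, so the ratio drops even though the absolute number of misses stays the same. The counts also imply that about 38M of the 100M gets were answered from memory alone.

```java
// Back-of-the-envelope check of the cache figures quoted in this comment.
// All inputs are the approximate counts from the experiment; names are illustrative.
final class CacheStats {

    // Fraction of cache accesses that were hits.
    static double hitRatio(long accesses, long misses) {
        return (accesses - misses) / (double) accesses;
    }

    // Gets that never fell back to a full (memory + disk) scan.
    static long memoryOnlyGets(long totalGets, long fullScans) {
        return totalGets - fullScans;
    }

    public static void main(String[] args) {
        long gets = 100_000_000L;
        long misses = 30_000_000L; // roughly the same with and without the optimization

        double baseline = hitRatio(400_000_000L, misses);  // no optimization
        double optimized = hitRatio(300_000_000L, misses); // with optimization

        System.out.println("baseline hit ratio:  " + baseline);
        System.out.println("optimized hit ratio: " + optimized);
        System.out.println("memory-only gets:    " + memoryOnlyGets(gets, 62_000_000L));
    }
}
```

The point of the sketch is only that a lower hit ratio here does not mean more misses: the denominator shrinks because fewer gets reach the cache at all.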

Can we have a different caching policy that reduces misses when reading less from the cache?
Perhaps TinyLFU (HBASE-15560) can help here [~ben.manes]?

> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, HBASE-17339-V03.patch,
HBASE-17339-V03.patch, HBASE-17339-V04.patch, HBASE-17339-V05.patch, HBASE-17339-V06.patch,
> The current implementation of a get operation (to retrieve values for a specific key)
> scans through all relevant stores of the region; for each store, both memory components (memstore
> segments) and disk components (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans the memory-only components first
> and scans both memory and disk only if the result is incomplete.
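
The idea in the description can be sketched roughly as follows. This is a toy model, not the code in the attached patches: the class `MemoryFirstGet`, its two maps standing in for the memory and disk components of a single row, and the rule that a result is "complete" when every requested column was found in memory are all assumptions made for illustration. The real completeness check in HBase has to account for versions, timestamps and deletes, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of a memory-first get for a single row (hypothetical, not HBase code).
final class MemoryFirstGet {
    // Stand-ins for a store's components: column -> value.
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> disk = new HashMap<>();
    private int fullScans = 0;

    void putMemory(String column, String value) { memory.put(column, value); }
    void putDisk(String column, String value) { disk.put(column, value); }
    int fullScans() { return fullScans; }

    Map<String, String> get(Set<String> columns) {
        // Speculative pass: look only at the memory component.
        Map<String, String> result = new HashMap<>();
        for (String column : columns) {
            String value = memory.get(column);
            if (value != null) {
                result.put(column, value);
            }
        }
        // Assumed completeness rule: every requested column was found in memory.
        if (result.size() == columns.size()) {
            return result;
        }
        // Fallback: full scan over memory and disk, with memory taking precedence.
        fullScans++;
        result.clear();
        for (String column : columns) {
            String value = memory.containsKey(column) ? memory.get(column) : disk.get(column);
            if (value != null) {
                result.put(column, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MemoryFirstGet store = new MemoryFirstGet();
        store.putDisk("a", "old");
        store.putMemory("a", "new");
        store.putDisk("b", "x");

        System.out.println(store.get(new HashSet<>(Arrays.asList("a"))));      // served from memory only
        System.out.println(store.get(new HashSet<>(Arrays.asList("a", "b")))); // falls back to a full scan
        System.out.println("full scans: " + store.fullScans());
    }
}
```

In this model a get that misses in memory pays for the speculative pass and then for the full scan, which is the trade-off behind the mixed results in the high percentiles reported above.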

This message was sent by Atlassian JIRA