hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization
Date Tue, 20 Dec 2016 08:13:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763596#comment-15763596 ]

Eshcar Hillel commented on HBASE-17339:

Thanks all for commenting.
Indeed this optimization would not work in the general case; this was briefly discussed in
However, we believe that quite often this optimization can yield a correct answer and therefore
should be applied.
In this Jira we would like to identify the use cases where the optimization can *not*
be applied and the user should be advised against it (for example, when the application
is manipulating versions), as well as the complete set of conditions under which the
optimization can safely be applied.
Hopefully this way we can allow applications to benefit from reduced latency when the results
are known to be correct, and to bypass this optimization when their correctness cannot
be ensured.

@ted_yu: there are multiple options for setting the mixed workload. We wanted to balance between
the amount of data written in the experiment and the time it takes to run it. 95-5 was the
optimal point for this. We can try different numbers as well.
The full details of the experiments can be found in the report in HBASE-16417. I will make
a clean report for the current Jira which includes only the relevant sections. 

> Scan-Memory-First Optimization
> ------------------------------
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch
> The current implementation of a get operation (to retrieve values for a specific key)
> scans through all relevant stores of the region; for each store, both memory components
> (memstore segments) and disk components (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only components first
> and scans both memory and disk only if the result is incomplete.
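
The memory-first idea in the description above can be illustrated with a small, self-contained Java sketch. This is not the code in HBASE-17339-V01.patch and it does not use HBase classes: `MemoryFirstGet`, `Cell`, `scan` and `get` are hypothetical names, each component is modelled as a plain map from row key to its newest cell, and "the result is complete" is assumed to mean "a cell for the key was found in memory", which is only one possible completeness condition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model of a store; not HBase's actual classes.
class MemoryFirstGet {

    /** A value with its timestamp (hypothetical stand-in for a stored cell). */
    public static final class Cell {
        public final long timestamp;
        public final String value;

        public Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Returns the cell with the highest timestamp for the key among the given components. */
    public static Optional<Cell> scan(List<Map<String, Cell>> components, String key) {
        Cell newest = null;
        for (Map<String, Cell> component : components) {
            Cell candidate = component.get(key);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        return Optional.ofNullable(newest);
    }

    /**
     * Speculative get: scan memory components only; fall back to a full scan
     * of memory and disk when the memory-only result is incomplete.
     * Assumption: "complete" means a cell for the key was found in memory.
     */
    public static Optional<Cell> get(List<Map<String, Cell>> memory,
                                     List<Map<String, Cell>> disk,
                                     String key) {
        Optional<Cell> fromMemory = scan(memory, key);
        if (fromMemory.isPresent()) {
            return fromMemory; // fast path: disk components are never touched
        }
        List<Map<String, Cell>> all = new ArrayList<>(memory);
        all.addAll(disk);
        return scan(all, key); // slow path: scan both memory and disk
    }
}
```

In this sketch the fast path is only correct when memory always holds the newest cell for a key. If a disk component holds a cell with a higher timestamp than the one in memory, `get` returns the older in-memory cell, which is the kind of case in which the optimization should not be applied.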
