hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
Date Thu, 26 Jan 2017 09:28:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839458#comment-15839458 ]

Eshcar Hillel commented on HBASE-17339:

The attached patch is incomplete and not properly tested, so it may contain bugs (but
it does compile :) ).
I'm posting it to get feedback on the core logic.
The main property needed for this optimization is monotonicity. A store preserves *monotonicity*
if all timestamps in its memstore are strictly greater than all timestamps in its store files.
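
To make the definition concrete, here is a minimal sketch of the monotonicity check for a single store. It is not taken from the patch: the class name MonotonicityCheck and the plain lists of timestamps are hypothetical stand-ins for the memstore and store file contents, and treating an empty side as trivially monotonic is an assumption.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of HBase or the attached patch.
final class MonotonicityCheck {

    /**
     * Returns true if every memstore timestamp is strictly greater than
     * every store file timestamp.
     */
    static boolean isMonotonic(List<Long> memstoreTimestamps, List<Long> storeFileTimestamps) {
        // Assumption: if either side is empty there is nothing to compare,
        // so the property holds vacuously.
        if (memstoreTimestamps.isEmpty() || storeFileTimestamps.isEmpty()) {
            return true;
        }
        long minInMemory = Collections.min(memstoreTimestamps);
        long maxFlushed = Collections.max(storeFileTimestamps);
        return minInMemory > maxFlushed;
    }
}
```

For example, memstore timestamps {10, 12} over store file timestamps {3, 9} are monotonic, while {9, 12} over {3, 9} are not, because the comparison is strict.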

The algorithm is as follows:
0. Decide whether to apply the optimization: (1) the flag is on, and (2) the get operation
   is over a specific set of columns.
If we decide to apply the optimization:
 1. Open all relevant *memory* scanners.
    While opening the scanners, collect the max flushed timestamps in all stores (first collect);
    a null timestamp indicates the store does not maintain monotonicity.
 2. If all stores are monotonic:
    2.1 Get results.
    2.2 Validate monotonicity: validate that the max flushed timestamps have not changed in any store
        (the double collect ensures results are taken from a consistent view).
If we decide not to apply the optimization
   *OR* the stores are not monotonic
   *OR* we decide to apply the optimization but the results do not satisfy the get operation (not enough
   versions per column):
 3. Open all scanners.
 4. Get results.
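
The control flow above can be sketched roughly as follows. This is a simplified illustration and not the code in the patch. The Store interface, its three methods, and the expectedCells parameter are hypothetical. "Enough versions per column" is reduced to a simple count of returned cells, and falling back to the full scan when the second collect differs from the first is an assumption about how a failed validation is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and types are hypothetical, not HBase APIs.
final class MemoryFirstGet {

    /** Hypothetical stand-in for a store of the region. */
    interface Store {
        Long maxFlushedTimestamp();   // null means the store is not monotonic
        List<String> scanMemory();    // cells found in memory components only
        List<String> scanAll();       // cells found in memory and disk components
    }

    static List<String> get(List<Store> stores, boolean optimizationEnabled, int expectedCells) {
        if (optimizationEnabled) {
            // Step 1: first collect of max flushed timestamps.
            List<Long> first = collect(stores);
            // Step 2: proceed only if every store is monotonic.
            if (!first.contains(null)) {
                // Step 2.1: get results from memory only.
                List<String> results = new ArrayList<>();
                for (Store store : stores) {
                    results.addAll(store.scanMemory());
                }
                // Step 2.2: second collect; unchanged timestamps mean no flush
                // happened in between, so the view was consistent.
                List<Long> second = collect(stores);
                if (first.equals(second) && results.size() >= expectedCells) {
                    return results;
                }
            }
        }
        // Steps 3 and 4: open all scanners and get results.
        List<String> results = new ArrayList<>();
        for (Store store : stores) {
            results.addAll(store.scanAll());
        }
        return results;
    }

    private static List<Long> collect(List<Store> stores) {
        List<Long> timestamps = new ArrayList<>();
        for (Store store : stores) {
            timestamps.add(store.maxFlushedTimestamp());
        }
        return timestamps;
    }
}
```

In this sketch the memory-only result is returned only when all three conditions hold: the optimization is on, both collects agree and contain no null, and the result is complete. Every other path ends in the full scan, so the optimization can only save work and never changes what a get returns.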

Missing parts (TODOs)
- properly init maxFlushedTimestamp (in AbstractMemStore)  when recovering -- need to traverse
all existing store files
- make memoryScanOptimization a table property instead of global property; set to true by
- (Optional) add a flag in Get operation which indicates if the user wants to apply the optimization
(per each operation!); set to true by default
- (Optional) check whether we can change the implementation of getScanners in XXXMemstore to return
multiple scanners, so that we can later filter each of them individually instead of either keeping all or
eliminating all. Currently the implementation (in both the default and compacting memstores) returns a
singleton list with one MemStoreScanner, which comprises one or a few segment scanners.

> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch
> The current implementation of a get operation (to retrieve values for a specific key)
> scans through all relevant stores of the region; for each store both memory components (memstore
> segments) and disk components (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only components first
> and, only if the result is incomplete, scans both memory and disk.

This message was sent by Atlassian JIRA
