hbase-dev mailing list archives

From "Ryan Rawson" <ryano...@gmail.com>
Subject Re: Review Request: Inexpensive reseek operations (1517) and filter based scanning (2904)
Date Thu, 12 Aug 2010 18:54:26 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/574/#review849
-----------------------------------------------------------



trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java
<http://review.cloudera.org/r/574/#comment2850>

    it seems like we should have KeyValue.firstKeyOnColumn() to cover this, to prevent all
the logic like Type.Maximum and HConstants.LATEST_TIMESTAMP from leaking everywhere
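    A rough sketch of what such a helper might look like. SketchKey is a
    hypothetical stand-in class, not the real KeyValue, and the two sentinel
    constants are assumptions that only mirror the roles of
    HConstants.LATEST_TIMESTAMP and Type.Maximum. The point is just that the
    sentinels are named in one factory method instead of at every call site.

```java
// Hypothetical stand-in for a key; NOT org.apache.hadoop.hbase.KeyValue.
final class SketchKey {
    // Assumed sentinels playing the roles of HConstants.LATEST_TIMESTAMP
    // and Type.Maximum; the values are illustrative only.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;
    static final byte TYPE_MAXIMUM = (byte) 0xFF;

    final byte[] row;
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;
    final byte type;

    private SketchKey(byte[] row, byte[] family, byte[] qualifier,
                      long timestamp, byte type) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.type = type;
    }

    // The only place that knows which timestamp and type make a key sort
    // ahead of every other key on the same row/family/qualifier.
    static SketchKey firstKeyOnColumn(byte[] row, byte[] family, byte[] qualifier) {
        return new SketchKey(row, family, qualifier, LATEST_TIMESTAMP, TYPE_MAXIMUM);
    }
}
```

    A filter would then call SketchKey.firstKeyOnColumn(row, family, prefix)
    and never mention the sentinels itself.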



trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
<http://review.cloudera.org/r/574/#comment2903>

    should we also rename this to SEEK_NEXT_USING_HINT?



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
<http://review.cloudera.org/r/574/#comment2900>

    braces go up 1 line { 
    like so;
    }



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<http://review.cloudera.org/r/574/#comment2902>

    let's rename this to SEEK_NEXT_USING_HINT



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
<http://review.cloudera.org/r/574/#comment2901>

    maybe we should call this SEEK_NEXT_USING_HINT because it's more general?
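    To illustrate why the more general name fits, here is a minimal
    self-contained sketch of a scan loop driven by such a return code. It
    uses plain strings in a TreeSet rather than KeyValues, and everything
    other than the SEEK_NEXT_USING_HINT name (HintSketch, HintingFilter,
    filterKey, getNextKeyHint, the other enum values) is hypothetical and
    not taken from the patch. Any filter can supply the hint; the scanner
    only knows that it should reseek to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class HintSketch {
    // Hypothetical return codes; only SEEK_NEXT_USING_HINT comes from the review.
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT, DONE }

    // Hypothetical filter contract: classify a key, and optionally name the next key wanted.
    interface HintingFilter {
        ReturnCode filterKey(String column);
        String getNextKeyHint(String column);
    }

    // Selects columns that start with a prefix, in the spirit of ColumnPrefixFilter.
    static final class PrefixFilter implements HintingFilter {
        private final String prefix;

        PrefixFilter(String prefix) { this.prefix = prefix; }

        @Override
        public ReturnCode filterKey(String column) {
            if (column.startsWith(prefix)) {
                return ReturnCode.INCLUDE;
            }
            // Sorted before the prefix: ask the scanner to jump ahead.
            // Sorted after it: no later column can match, so stop.
            return column.compareTo(prefix) < 0
                    ? ReturnCode.SEEK_NEXT_USING_HINT
                    : ReturnCode.DONE;
        }

        @Override
        public String getNextKeyHint(String column) { return prefix; }
    }

    // Walks the sorted columns, reseeking forward whenever the filter gives a hint.
    static List<String> scan(NavigableSet<String> columns, HintingFilter filter) {
        List<String> out = new ArrayList<>();
        String current = columns.isEmpty() ? null : columns.first();
        while (current != null) {
            ReturnCode rc = filter.filterKey(current);
            if (rc == ReturnCode.DONE) {
                break;
            }
            if (rc == ReturnCode.INCLUDE) {
                out.add(current);
            }
            if (rc == ReturnCode.SEEK_NEXT_USING_HINT) {
                // "Reseek": first column at or after the hinted key.
                current = columns.ceiling(filter.getNextKeyHint(current));
            } else {
                current = columns.higher(current);
            }
        }
        return out;
    }
}
```

    With columns a1, b1, b2, c1 and prefix "b", the loop jumps from a1
    straight to b1, includes b1 and b2, and stops at c1.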
    
    


- Ryan


On 2010-08-10 17:58:43, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/574/
> -----------------------------------------------------------
> 
> (Updated 2010-08-10 17:58:43)
> 
> 
> Review request for hbase, stack, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and
Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> What this patch includes:
> 1. Reseek framework. The ability to reseek to any position after having seeked to some
point in the file. To add this utility, changes were required in all scanners.
> 2. The option for any filter to tell the scanner which key it wants to go
to next. Filters can be easily customized for different use-cases without affecting the main
read path. Since filters are optional, they do not add any overhead for users who do not take
advantage of them.
> 3. ColumnPrefixFilter: This filter selects keys whose columns
have a specified prefix. The filter takes advantage of the ability to pass keys to the scanner
to tell it which key to seek to next.
> 4. This also gives the option to seek directly to the required columns using the reseek mechanism
(HBASE-2450). However, it needs to be decided whether that feature should be made optional using
a filter or added to the read path to be used by everyone. It is not included
in this patch since it requires further discussion and testing.
> 5. Small changes to ScanQueryMatcher to return more specific return codes.
> 
> For HFile and reseek, the modifications were done after discussions with Ryan and he
had also written some code for this patch. For ScanQueryMatcher and Filters, discussions were
held with Jonathan, Karthik and Kannan.
> 
> This is a big patch as it touches 21 files. It is important to closely review the reseek functions
in HFile, StoreFileScanner, KeyValueHeap and HalfStoreFileReader, as these functions are slightly
tricky and will probably be used in many future improvements.
> 
> 
> This addresses bugs HBASE-1517, HBASE-2903 and HBASE-2904.
>     http://issues.apache.org/jira/browse/HBASE-1517
>     http://issues.apache.org/jira/browse/HBASE-2903
>     http://issues.apache.org/jira/browse/HBASE-2904
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java PRE-CREATION
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 983321
>   trunk/src/test/java/org/apache/hadoop/hbase/filter/TestColumnPrefixFilter.java PRE-CREATION
>   trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java PRE-CREATION
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 983321
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 983321
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 983321

> 
> Diff: http://review.cloudera.org/r/574/diff
> 
> 
> Testing
> -------
> 
> Added tests at the HFileScanner and Filter/RegionScanner levels. These tests take very
little time to run. All existing tests pass successfully. Performance benchmarking was
done and significant gains in performance can be seen for the corresponding use-cases.
> 
> 
> Thanks,
> 
> Pranav
> 
>

