hbase-dev mailing list archives

From st...@duboce.net
Subject Re: Review Request: Inexpensive reseek operations (1517) and filter based scanning (2904)
Date Thu, 12 Aug 2010 13:48:50 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/574/#review872
-----------------------------------------------------------


@Pranav: Ryan is reviewing your v3. He knows HFile best. His review should be up soon.

- stack


On 2010-08-10 17:58:43, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/574/
> -----------------------------------------------------------
> 
> (Updated 2010-08-10 17:58:43)
> 
> 
> Review request for hbase, stack, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> What this patch includes:
> 1. Reseek framework: the ability to reseek to any position after having already seeked to some point in the file. Adding this required changes to all scanners.
> 2. The option for any filter to tell the scanner which key it wants to go to next. Filters can easily be customized for different use cases without affecting the main read path. Since filters are optional, they add no overhead for users who do not use them.
> 3. ColumnPrefixFilter: This filter selects keys whose columns have a specified prefix. It takes advantage of the ability to pass keys to the scanner to tell it which key to seek to next.
> 4. This also makes it possible to seek directly to the required columns using the reseek mechanism (HBASE-2450). However, it still needs to be decided whether that feature should be made optional via a filter or added to the read path for everyone. It is not included in this patch since it requires further discussion and testing.
> 5. Small changes to ScanQueryMatcher so that it returns more specific return codes.
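The seek-hint idea in items 2 and 3 can be sketched as follows. This is a simplified, hypothetical model, not the code in the patch: one row's column qualifiers are held as a sorted String array, and the names ReturnCode, SortedCursor, PrefixFilter, and nextKeyHint are invented for illustration. When a column sorts before the prefix, the filter asks the scanner to reseek straight to the prefix instead of stepping through each column; once a column sorts past the prefix range, nothing later in sorted order can match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a filter-driven reseek over one row's
// sorted column qualifiers. Not the HBase API; all names are illustrative.
public class PrefixSeekSketch {

    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT, DONE }

    /** Forward-only cursor over a sorted array of column names. */
    static final class SortedCursor {
        private final String[] cols;
        private int pos = 0;

        SortedCursor(String[] sortedCols) { this.cols = sortedCols; }

        String peek() { return pos < cols.length ? cols[pos] : null; }

        void next() { pos++; }

        /** Reseek: move to the first column >= key, never backwards. */
        void reseek(String key) {
            // Binary search only over [pos, cols.length), i.e. from the current position.
            int lo = pos, hi = cols.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cols[mid].compareTo(key) < 0) lo = mid + 1; else hi = mid;
            }
            pos = lo;
        }
    }

    /** Illustrative prefix filter that can hint the next key to seek to. */
    static final class PrefixFilter {
        private final String prefix;

        PrefixFilter(String prefix) { this.prefix = prefix; }

        ReturnCode filterColumn(String col) {
            if (col.startsWith(prefix)) return ReturnCode.INCLUDE;
            // Columns sorting before the prefix can be skipped in one jump.
            if (col.compareTo(prefix) < 0) return ReturnCode.SEEK_NEXT_USING_HINT;
            // Sorted order: no later column can start with the prefix.
            return ReturnCode.DONE;
        }

        String nextKeyHint() { return prefix; }
    }

    static List<String> scan(String[] sortedCols, PrefixFilter filter) {
        SortedCursor cursor = new SortedCursor(sortedCols);
        List<String> out = new ArrayList<>();
        String col;
        while ((col = cursor.peek()) != null) {
            switch (filter.filterColumn(col)) {
                case INCLUDE:
                    out.add(col);
                    cursor.next();
                    break;
                case SEEK_NEXT_USING_HINT:
                    cursor.reseek(filter.nextKeyHint());
                    break;
                default: // DONE
                    return out;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] cols = {"a1", "a2", "b1", "b2", "b3", "c1"};
        System.out.println(scan(cols, new PrefixFilter("b"))); // [b1, b2, b3]
    }
}
```

In this sketch the filter never reads data itself; it only returns a code plus a hint, and the scanner decides how to move, which mirrors the separation described in item 2.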
> 
> For HFile and reseek, the modifications were made after discussions with Ryan, who had also written some of the code for this patch. For ScanQueryMatcher and Filters, discussions were held with Jonathan, Karthik and Kannan.
> 
> This patch is large: it touches 21 files. It is important to review closely the reseek functions in HFile, StoreFileScanner, KeyValueHeap and HalfStoreFileReader, as these functions are somewhat tricky and are likely to be used in many future improvements.
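The heap-level reseek flagged above for close review can be illustrated with a simplified merge heap. This is a hypothetical sketch, not the code in the patch: integer keys stand in for KeyValues, and Cursor, MergingHeap, and their methods are invented names. It shows the invariant a heap reseek has to keep: every underlying scanner moves forward to the first key at or after the target, exhausted scanners are dropped, and the heap ordering is rebuilt only after the cursors have moved, because a java.util.PriorityQueue does not reorder elements whose keys change in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a heap that merges several sorted scanners and
// supports a forward-only reseek. Illustrative only; not HBase's KeyValueHeap.
public class HeapReseekSketch {

    /** A forward-only cursor over one sorted source (e.g. one store file). */
    static final class Cursor {
        private final int[] keys;
        private int pos = 0;

        Cursor(int[] sortedKeys) { this.keys = sortedKeys; }

        boolean exhausted() { return pos >= keys.length; }

        int peek() { return keys[pos]; }

        void next() { pos++; }

        /** Advance to the first key >= target, starting from the current position. */
        void reseek(int target) {
            while (pos < keys.length && keys[pos] < target) pos++;
        }
    }

    /** Merges cursors in key order; the heap top is the cursor with the smallest key. */
    static final class MergingHeap {
        private final PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));

        MergingHeap(List<Cursor> cursors) {
            for (Cursor c : cursors) if (!c.exhausted()) heap.add(c);
        }

        /** Smallest current key, or null when every cursor is exhausted. */
        Integer peek() { return heap.isEmpty() ? null : heap.peek().peek(); }

        int next() {
            Cursor top = heap.poll(); // remove before mutating its key
            int key = top.peek();
            top.next();
            if (!top.exhausted()) heap.add(top); // re-insert with its new key
            return key;
        }

        /** Move every cursor forward to target, then rebuild the heap order. */
        void reseek(int target) {
            List<Cursor> live = new ArrayList<>(heap);
            heap.clear();
            for (Cursor c : live) {
                c.reseek(target);
                if (!c.exhausted()) heap.add(c);
            }
        }
    }

    public static void main(String[] args) {
        List<Cursor> cursors = new ArrayList<>();
        cursors.add(new Cursor(new int[] {1, 4, 9}));
        cursors.add(new Cursor(new int[] {2, 5, 7}));
        cursors.add(new Cursor(new int[] {3, 8}));
        MergingHeap heap = new MergingHeap(cursors);
        System.out.print(heap.next() + " "); // 1
        heap.reseek(6);                      // skips keys 2..5 without visiting them
        while (heap.peek() != null) System.out.print(heap.next() + " ");
        System.out.println();                // full line printed: "1 7 8 9 "
    }
}
```

Each Cursor here advances linearly for brevity; a file-backed scanner would presumably use its block index to skip ahead, which is where a reseek would save work compared with a fresh seek from the start of the file.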
> 
> 
> This addresses bugs HBASE-1517, HBASE-2903 and HBASE-2904.
>     http://issues.apache.org/jira/browse/HBASE-1517
>     http://issues.apache.org/jira/browse/HBASE-2903
>     http://issues.apache.org/jira/browse/HBASE-2904
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java PRE-CREATION
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 983321
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 983321
>   trunk/src/test/java/org/apache/hadoop/hbase/filter/TestColumnPrefixFilter.java PRE-CREATION
>   trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java PRE-CREATION
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 983321
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 983321
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 983321
> 
> Diff: http://review.cloudera.org/r/574/diff
> 
> 
> Testing
> -------
> 
> Added tests at the HFileScanner and Filter/RegionScanner levels. These tests run quickly. All existing tests pass. Performance benchmarking showed significant gains for the corresponding use cases.
> 
> 
> Thanks,
> 
> Pranav
> 
>

