hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From st...@duboce.net
Subject Re: Review Request: Inexpensive reseek operations (1517) and filter based scanning (2904)
Date Tue, 10 Aug 2010 04:04:11 GMT

This is an automatically generated e-mail. To reply, visit:


    I understand why you did this new filter Pranav, and its the right way to introduce the
notion, but at same time I can't help thinking that we should just eat it and make this functionality
part of the read path.   How about we get this patch in as is and then come back to integrate
more directly in a new issue?


    Can you add better explaination of what this does?  This is the only documentation on
this filter.  Provide an example in the javadoc?


    This is SEEK_NEXT_KEY_USING_FILTER?  Would suggest you not shorten the key name.


    This will never cross rows, will it?
    If so, filters can't carry state across rows (Ask me why if you need a why on this).


    .. that, by default, returns a null....


    Does FilterList inherit from FilterBase?  If so, is this nextKey implementation needed?


    I'm not sure I follow... a bit more clarity in the comment would help (This looks like


    How we know there are keys after this one?


    So, we seeking to the next column on this row or on the next?  If one current row, isn't
it rare that the next key will NOT be the next column?

- stack

On 2010-08-08 17:01:04, Pranav Khaitan wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/574/
> -----------------------------------------------------------
> (Updated 2010-08-08 17:01:04)
> Review request for hbase, stack, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and
Kannan Muthukkaruppan.
> Summary
> -------
> What this patch includes:
> 1. Reseek framework. The ability to reseek to any position after having seeked to some
point in the file. To add this utility, changes were required in all scanners.
> 2. The option for any filter to be able to tell the scanner which  key it wants to go
to next. Filters can be easily customized for different use-cases without affecting the main
read path. Since filters are optional, they do not add any overhead for users who do not take
advantage of it.
> 3. ColumnPrefixFilter: This filter serves the purpose of selecting keys with columns
having a specified prefix. The filter takes advantage of theability to pass keys to the scanner
to tell which key it should seek to next.
> 4. This also gives the option to seek directly to the required columns using reseek mechanism
(HBASE-2450). However, it needs to be decided if that feature should be made optional using
a filter or should it be added to the read path to be used by everyone. Did not include it
in this patch since it required further discussions and testing.
> 5. Small changes to ScanQueryMatcher to return more specific return codes.
> For HFile and reseek, the modifications were done after discussions with Ryan and he
had also written some code for this patch. For ScanQueryMatcher and Filters, discussions were
held with Jonathan, Karthik and Kannan.
> This is big as it touches 21 files. Some issues, particularly those related to HFiles
and reseek implementations, are tricky and require closer reviews.
> Modifications which couldn't be done within the scope of this work and should be considered
in future:
> 1. Ability to seek directly to next row. Currently, since we do not know what the next
row is, we also cannot pass a key to the scanner to go to that position. 
> 2. Refactoring ScanQueryMatcher: The match function has become significantly big and
can be broken down into simpler components.
> This addresses bugs HBASE-1517, HBASE-2903 and HBASE-2904.
>     http://issues.apache.org/jira/browse/HBASE-1517
>     http://issues.apache.org/jira/browse/HBASE-2903
>     http://issues.apache.org/jira/browse/HBASE-2904
> Diffs
> -----
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java PRE-CREATION

>   trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 983321

>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 983321

>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 983321

>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 983321 
>   trunk/src/test/java/org/apache/hadoop/hbase/filter/TestColumnPrefixFilter.java PRE-CREATION

>   trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java PRE-CREATION

>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 983321

>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 983321

>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 983321

> Diff: http://review.cloudera.org/r/574/diff
> Testing
> -------
> Added tests at HFileScanner and Filter/RegionScanner levels. The time taken for running
these tests is very less. All existing tests pass successfully. Performance benchmarking was
done and significant gains in performance can be seen for corresponding use-cases.
> Thanks,
> Pranav

View raw message