hbase-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: setTimeRange and setMaxVersions seem to be inefficient
Date Tue, 28 Aug 2012 00:59:55 GMT
Hi Lars:

Thanks for confirming the inefficiency of the implementation for this case. In my case, a
column can have more than 10K versions, so I need a quick way to stop the scan from digging
through the column once there is a match (ReturnCode.INCLUDE). It would be nice to have a
ReturnCode that tells the framework to stop and go to the next column once the number of
versions specified in setMaxVersions is met.

For now, I guess I have to hack it in the custom filter (i.e. I keep the count myself)? If
you have a better way to achieve this, please share :)
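
A minimal sketch of the "keep the count myself" workaround, written as a stand-alone model rather than against the HBase classes. The enum only borrows three value names from HBase's Filter.ReturnCode (INCLUDE, SKIP, NEXT_COL); Cell, CountingFilter, scan and buildExample are hypothetical names invented for illustration, and the scan loop is a simplified stand-in for what the region server does with each return code. It assumes cells arrive sorted by column and then by descending timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the idea; it does NOT use or reproduce the HBase API.
class VersionLimitSketch {

    // Borrows three value names from HBase's Filter.ReturnCode.
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL }

    // Hypothetical stand-in for a KeyValue: a column qualifier and a timestamp.
    static final class Cell {
        final String qualifier;
        final long timestamp;
        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Counts the versions it has included for the current column and asks the
    // scan to move to the next column once the limit has been reached.
    static final class CountingFilter {
        private final int maxVersions;
        private final long maxTimestampInclusive; // the "T" in setTimeRange(0, T + 1)
        private String currentQualifier;
        private int includedInColumn;
        int cellsExamined; // exposed only to show how much work the filter did

        CountingFilter(int maxVersions, long maxTimestampInclusive) {
            this.maxVersions = maxVersions;
            this.maxTimestampInclusive = maxTimestampInclusive;
        }

        ReturnCode filterCell(Cell cell) {
            cellsExamined++;
            // A new column starts a fresh count. A real filter would also have
            // to account for row and column-family changes.
            if (!cell.qualifier.equals(currentQualifier)) {
                currentQualifier = cell.qualifier;
                includedInColumn = 0;
            }
            if (includedInColumn >= maxVersions) {
                return ReturnCode.NEXT_COL; // enough versions, skip the rest
            }
            if (cell.timestamp > maxTimestampInclusive) {
                return ReturnCode.SKIP; // too new, keep looking in this column
            }
            includedInColumn++;
            return ReturnCode.INCLUDE;
        }
    }

    // Simplified scan loop: on NEXT_COL it jumps past the remaining versions
    // of the current column without showing them to the filter.
    static List<Cell> scan(List<Cell> sortedCells, CountingFilter filter) {
        List<Cell> result = new ArrayList<>();
        int i = 0;
        while (i < sortedCells.size()) {
            Cell cell = sortedCells.get(i);
            ReturnCode rc = filter.filterCell(cell);
            if (rc == ReturnCode.NEXT_COL) {
                String qualifier = cell.qualifier;
                while (i < sortedCells.size()
                        && sortedCells.get(i).qualifier.equals(qualifier)) {
                    i++;
                }
            } else {
                if (rc == ReturnCode.INCLUDE) {
                    result.add(cell);
                }
                i++;
            }
        }
        return result;
    }

    // Column "a" with 10,000 versions and column "b" with 5, newest first.
    static List<Cell> buildExample() {
        List<Cell> cells = new ArrayList<>();
        for (long ts = 10_000; ts >= 1; ts--) cells.add(new Cell("a", ts));
        for (long ts = 5; ts >= 1; ts--) cells.add(new Cell("b", ts));
        return cells;
    }

    public static void main(String[] args) {
        CountingFilter filter = new CountingFilter(1, 9_998L);
        List<Cell> result = scan(buildExample(), filter);
        System.out.println("returned=" + result.size()
                + " examined=" + filter.cellsExamined);
    }
}
```

In this model the filter sees one extra cell per column, because it can only answer NEXT_COL on the call after the one that returned INCLUDE. That extra call is exactly what a combined "include and go to next column" return code would avoid.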

Best Regards,

Jerry

Sent from my iPad (sorry for spelling mistakes)

On 2012-08-27, at 20:11, lars hofhansl <lhofhansl@yahoo.com> wrote:

> Currently filters are evaluated before we do version counting.
> 
> Here's a comment from ScanQueryMatcher.java:
>     /**
>      * Filters should be checked before checking column trackers. If we do
>      * otherwise, as was previously being done, ColumnTracker may increment its
>      * counter for even that KV which may be discarded later on by Filter. This
>      * would lead to incorrect results in certain cases.
>      */
> 
> 
> So this is by design. (Doesn't mean it's correct or desirable, though.)
> 
> -- Lars
> 
> 
> ----- Original Message -----
> From: Jerry Lam <chilinglam@gmail.com>
> To: user <user@hbase.apache.org>
> Cc: 
> Sent: Monday, August 27, 2012 2:40 PM
> Subject: setTimeRange and setMaxVersions seem to be inefficient
> 
> Hi HBase community:
> 
> I tried to use setTimeRange and setMaxVersions to limit the number of KVs
> returned per column. The behaviour is as I would expect: setTimeRange(0, T + 1)
> and setMaxVersions(1) give me ONE version of the KV with a timestamp that is
> less than or equal to T.
> However, I noticed that all versions of the KeyValue for a particular
> column are processed through a custom filter I implemented even though I
> specify setMaxVersions(1) and setTimeRange(0, T+1). I expected that if ONE
> KV of a particular column has ReturnCode.INCLUDE, the framework will jump
> to the next COL instead of iterating through all versions of the column.
> 
> Can someone confirm whether this is the expected behaviour (iterating through
> all versions of a column before setMaxVersions takes effect)? If this is the
> expected behaviour, what is your recommendation to speed this up?
> 
> Best Regards,
> 
> Jerry
> 
