hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: setTimeRange and setMaxVersions seem to be inefficient
Date Tue, 28 Aug 2012 00:11:30 GMT
Currently filters are evaluated before we do version counting.

Here's a comment from ScanQueryMatcher.java:
     * Filters should be checked before checking column trackers. If we do
     * otherwise, as was previously being done, ColumnTracker may increment its
     * counter for even that KV which may be discarded later on by Filter. This
     * would lead to incorrect results in certain cases.

So this is by design. (Doesn't mean it's correct or desirable, though.)
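
To illustrate the ordering issue that comment describes, here is a minimal, self-contained Java sketch. It is a toy model, not ScanQueryMatcher's actual code: the class FilterOrderSketch, its methods, and the Result type are hypothetical names made up for this example. The lack of an early exit in filterThenCount is an assumption that mirrors the behaviour reported in the original message below. The sketch compares the two orderings on one column whose newest in-range version is rejected by the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Toy model only; hypothetical names, not HBase classes.
public class FilterOrderSketch {

    /** Timestamps kept for one column, plus how many times the filter ran. */
    static final class Result {
        final List<Long> kept = new ArrayList<>();
        int filterCalls = 0;
    }

    /** Filter first, count versions afterwards (the order described in the comment). */
    static Result filterThenCount(long[] newestFirst, long minTs, long maxTsExclusive,
                                  int maxVersions, LongPredicate filter) {
        Result r = new Result();
        for (long ts : newestFirst) {
            if (ts < minTs || ts >= maxTsExclusive) continue; // outside the time range
            r.filterCalls++;
            if (!filter.test(ts)) continue;                   // discarded, never counted
            if (r.kept.size() < maxVersions) r.kept.add(ts);
            // Assumption: no early exit, so later versions still reach the filter.
        }
        return r;
    }

    /** Count first, filter afterwards (the order the comment warns against). */
    static Result countThenFilter(long[] newestFirst, long minTs, long maxTsExclusive,
                                  int maxVersions, LongPredicate filter) {
        Result r = new Result();
        int counted = 0;
        for (long ts : newestFirst) {
            if (ts < minTs || ts >= maxTsExclusive) continue; // outside the time range
            if (counted >= maxVersions) break;                // version budget used up
            counted++;                                        // counted even if discarded next
            r.filterCalls++;
            if (filter.test(ts)) r.kept.add(ts);
        }
        return r;
    }

    public static void main(String[] args) {
        long t = 30;
        long[] versions = {50, 40, 30, 20, 10};   // newest first
        LongPredicate rejectNewest = ts -> ts != 30; // filter discards the version at T

        Result a = filterThenCount(versions, 0, t + 1, 1, rejectNewest);
        Result b = countThenFilter(versions, 0, t + 1, 1, rejectNewest);
        System.out.println("filter first: kept=" + a.kept + " filterCalls=" + a.filterCalls);
        System.out.println("count first:  kept=" + b.kept + " filterCalls=" + b.filterCalls);
    }
}
```

In this model, counting first spends the single allowed version on a KV that the filter then discards, so the column returns nothing. Filtering first returns the version at timestamp 20, at the cost of running the filter on every in-range version.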

-- Lars

----- Original Message -----
From: Jerry Lam <chilinglam@gmail.com>
To: user <user@hbase.apache.org>
Sent: Monday, August 27, 2012 2:40 PM
Subject: setTimeRange and setMaxVersions seem to be inefficient

Hi HBase community:

I tried to use setTimeRange and setMaxVersions to limit the number of KVs
returned per column. The result is what I would expect:
setTimeRange(0, T + 1) and setMaxVersions(1) give me ONE version of the KV
with a timestamp that is less than or equal to T.
However, I noticed that all versions of the KeyValue for a particular
column are processed through a custom filter I implemented even though I
specify setMaxVersions(1) and setTimeRange(0, T+1). I expected that once ONE
KV of a particular column gets ReturnCode.INCLUDE, the framework would jump
to the next column instead of iterating through all versions of the column.

Can someone confirm whether this is the expected behaviour (iterating through
all versions of a column before setMaxVersions takes effect)? If it is, what
do you recommend to speed this up?

Best Regards,

