hbase-user mailing list archives

From Peter Haidinyak <phaidin...@local.com>
Subject RE: Scan (Start Row, End Row) vs Scan (Row)
Date Thu, 20 Jan 2011 16:41:07 GMT
Question: does HBase stop scanning once it reaches the end row? I thought it did.

Thanks

-Pete

-----Original Message-----
From: Jonathan Gray [mailto:jgray@fb.com] 
Sent: Thursday, January 20, 2011 8:09 AM
To: user@hbase.apache.org
Subject: RE: Scan (Start Row, End Row) vs Scan (Row)

The best way to do this is as Friso describes, using the existing stopRow parameter in Scan.

There is another way to do it: startRow plus a filter.  The PrefixFilter could be used
here.  Looking at the code, the PrefixFilter appears to do an early out, stopping the
scan once it has passed the prefix.

If not, you can wrap any filter in a WhileMatchFilter.  Once the wrapped filter rejects
a row, the WhileMatchFilter rejects every row after it as well, so the scan exits
early.

JG
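
A rough illustration of the three behaviours discussed in this thread (a filter with no early out, a WhileMatchFilter-style wrapper, and a stop row). This is a plain-Java simulation over a sorted set, not the HBase client API: the class ScanEarlyOutDemo, its scan() method and the last two sample rows are hypothetical and exist only for this sketch. It also assumes ASCII keys, so that String ordering matches the byte ordering HBase uses for row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

final class ScanEarlyOutDemo {
    /** Rows returned by a scan, plus how many rows the scan had to look at. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    /**
     * Walks the sorted keys starting at startRow (inclusive).
     * stopRow is exclusive; null means "run to the end of the table".
     * When whileMatch is true, the first row the filter rejects ends the scan,
     * mimicking the effect described for WhileMatchFilter.
     */
    static Result scan(TreeSet<String> table, String startRow, String stopRow,
                       Predicate<String> filter, boolean whileMatch) {
        Result result = new Result();
        for (String row : table.tailSet(startRow, true)) {
            if (stopRow != null && row.compareTo(stopRow) >= 0) {
                break; // reached the exclusive stop row
            }
            result.examined++;
            if (filter.test(row)) {
                result.rows.add(row);
            } else if (whileMatch) {
                break; // early out after the first rejected row
            }
        }
        return result;
    }

    /** Sample rows from the thread, plus two made-up rows past the prefix. */
    static TreeSet<String> sampleTable() {
        TreeSet<String> table = new TreeSet<>();
        table.add("20100809041500_abc_xyz");
        table.add("20100809041500_abc_xyw");
        table.add("20100809041500_abc_xyc");
        table.add("20100809041500_abd_xyz");
        table.add("20100809041500_abd_xyw");
        table.add("20100809041500_abf_xyz");
        table.add("20100809041500_abg_xyz"); // hypothetical extra row
        table.add("20100809041600_abc_xyz"); // hypothetical extra row
        return table;
    }

    public static void main(String[] args) {
        TreeSet<String> table = sampleTable();
        String prefix = "20100809041500_abd";
        Predicate<String> hasPrefix = row -> row.startsWith(prefix);

        Result plain = scan(table, prefix, null, hasPrefix, false);
        Result wrapped = scan(table, prefix, null, hasPrefix, true);
        Result ranged = scan(table, prefix, "20100809041500_abe", row -> true, false);

        System.out.println("filter only:  examined " + plain.examined + ", returned " + plain.rows);
        System.out.println("while-match:  examined " + wrapped.examined + ", returned " + wrapped.rows);
        System.out.println("stop row:     examined " + ranged.examined + ", returned " + ranged.rows);
    }
}
```

All three variants return the same two abd rows. In this toy model the filter-only scan looks at every row from the start row to the end of the table, the while-match variant stops at the first row past the prefix, and the stop-row scan never looks beyond the range at all.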

> -----Original Message-----
> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
> Sent: Thursday, January 20, 2011 12:45 AM
> To: <user@hbase.apache.org>
> Subject: Re: Scan (Start Row, End Row) vs Scan (Row)
> 
> Performing a scan with
> 
> start row = 20100809041500_abd
> end row = 20100809041500_abe
> 
> will give you just that. The end row is exclusive, so it will only return rows
> with VAR1 = abd. You need to compute the 'abe' yourself, though (basically,
> take 'abd' and increase the rightmost byte by 1; if that byte is already at the
> max byte value, set it to 0 and increase the byte to its left by 1, etc.). There
> is no scan method that has 'starts with' semantics, AFAIK.
> 
> See here:
> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/client/Scan.html#Scan(byte[], byte[])
> 
> 
> Friso
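
The stop-row computation Friso describes can be sketched in plain Java as below. The class PrefixStopRow and the method stopRowForPrefix are names made up for this sketch, not part of HBase. It handles the overflow case slightly differently from the description above: instead of setting a 0xFF byte to 0 and carrying, it drops trailing 0xFF bytes and increments the byte before them, which yields the smallest key that sorts after every key with the given prefix. If the prefix is empty or consists only of 0xFF bytes there is no such key, and the sketch returns an empty array (which, as far as I know, HBase treats as "scan to the end of the table").

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixStopRow {
    /**
     * Returns the smallest key that sorts after every key starting with prefix,
     * or an empty array when no such key exists (empty or all-0xFF prefix).
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xFF bytes cannot be incremented, so skip past them.
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0]; // no upper bound for this prefix
        }
        // Keep everything up to and including position i, then bump that byte.
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "20100809041500_abd".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(prefix);
        // Prints the exclusive end row for the example in this thread.
        System.out.println(new String(stop, StandardCharsets.US_ASCII));
    }
}
```

For the prefix 20100809041500_abd this produces 20100809041500_abe, the end row used in the example above. The two byte arrays would then be passed as the start row and stop row of the Scan.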
> 
> 
> 
> 
> On 20 jan 2011, at 09:22, Shuja Rehman wrote:
> 
> Hi
> Consider the following scenario.
> 
> Row Key  Format = DATETIME_VAR1_VAR2 (where var1 and var2 have fixed
> lengths)
> 
> and example data could be
> 
> 20100809041500_abc_xyz
> 20100809041500_abc_xyw
> 20100809041500_abc_xyc
> *20100809041500_abd_xyz*
> 20100809041500_abd_xyw
> 20100809041500_abf_xyz
> ...
> 
> Now, if I want only the rows whose key starts with 20100809041500_abd, is
> there any way to achieve this with a scan but without a filter? If I use
> scan(startrow, filter) where startrow="20100809041500_abd", it will scan the
> whole table from the start key to the end of the table. I want to scan only
> the part of the table that I need. Is there a method like
> 
> scan(row)  where row="20100809041500_abd"
> 
> that returns just the following results?
> 
> 20100809041500_abd_xyz
> 20100809041500_abd_xyw
> 
> Kindly let me know whether it is achievable or not?
> thnx
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>

