hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Improving HBase scanner
Date Wed, 05 May 2010 07:27:10 GMT
You have to examine nearly every single value in the table - the
mechanism by which HBase can restrict how much data it has to scan is
via the row key only.  All the filters and filter-like calls (e.g.
setTimeRange) just restrict what data is passed back to the client.

So yes you are scanning the entire table.  Could get expensive once
you have a few TB.

The thing to remember is access to data is all about the primary key.
It's very similar to an RDBMS with only a primary index.  If you can't
restrict your query via the primary key, then you have to do a full
table scan.
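
To make the difference concrete, here is a toy model in plain Java. It is only a sketch: a TreeMap stands in for a table sorted by row key, and the class and method names (RowKeyScanSketch, scanByTimeRange, scanByKeyRange) are made up for illustration and are not HBase APIs. It assumes timestamps bear no relation to key order, as with GUID keys.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for a table sorted by row key.
// None of these names are HBase APIs.
final class RowKeyScanSketch {

    // Builds a map of row key -> write timestamp. Cell values are omitted.
    static NavigableMap<String, Long> buildTable(int rows) {
        NavigableMap<String, Long> table = new TreeMap<>();
        for (int i = 0; i < rows; i++) {
            // Zero-padded keys sort in numeric order; the timestamp is
            // deliberately scrambled so it is unrelated to key order.
            table.put(String.format("row-%06d", i), (i * 7919L) % rows);
        }
        return table;
    }

    // Time-range filtering: every row is examined, only matches are returned.
    // The range is [minTs, maxTs). Returns {rowsExamined, rowsReturned}.
    static int[] scanByTimeRange(NavigableMap<String, Long> table, long minTs, long maxTs) {
        int examined = 0;
        int returned = 0;
        for (Map.Entry<String, Long> e : table.entrySet()) {
            examined++;
            if (e.getValue() >= minTs && e.getValue() < maxTs) {
                returned++;
            }
        }
        return new int[] {examined, returned};
    }

    // Key-range scan: only rows in [startRow, stopRow) are touched at all.
    // Returns {rowsExamined, rowsReturned}.
    static int[] scanByKeyRange(NavigableMap<String, Long> table, String startRow, String stopRow) {
        int examined = table.subMap(startRow, true, stopRow, false).size();
        return new int[] {examined, examined};
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = buildTable(100_000);
        int[] byTime = scanByTimeRange(table, 0L, 1_000L);
        int[] byKey = scanByKeyRange(table, "row-000000", "row-001000");
        System.out.println("time range: examined=" + byTime[0] + " returned=" + byTime[1]);
        System.out.println("key range:  examined=" + byKey[0] + " returned=" + byKey[1]);
    }
}
```

Both calls hand back the same number of rows, but the time-range version has to walk the whole map to find them, while the key-range version only touches the rows between the two keys.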


On Wed, May 5, 2010 at 12:22 AM, Michelan Arendse <michelan@addynamo.com> wrote:
> I don't know what the row start and end keys are - they are GUID keys (improves writes across
> the cluster - had help with this from this user group before).
> I need to export data written between "startDate" and "endDate" into a relational database
> so I can interrogate the data (SUM/AVG, etc).
> That is why I am using: scan.setTimeRange(fromDate.getTime(), toDate.getTime());
> In my test with live data, I only took between 2010-03-26 00:00:00 and 2010-03-26 01:00:00
> - there should only be a few thousand rows between those dates.
> Will HBase still take forever to find the data I'm looking for unless I use startRow/endRow?
> -----Original Message-----
> From: TuX RaceR [mailto:tuxracer69@gmail.com]
> Sent: 04 May 2010 05:52 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Improving HBase scanner
> Michelan Arendse wrote:
>> Is there a way to speed up the fetching of data from HBase?
> Divide your key space into smaller chunks,
> using a closer startRow and stopRow?
> cf. Scan(byte[] startRow, byte[] stopRow):
> <http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/Scan.html#Scan%28byte%5B%5D,%20byte%5B%5D%29>
> TuX
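
A rough sketch of the chunking idea quoted above, in plain Java with no HBase dependency. It assumes the GUID row keys are stored as lowercase hex strings, which the thread does not confirm, and KeyChunkSketch and hexPrefixChunks are hypothetical names. Each pair would be passed as the startRow and stopRow of one scan; an empty string means "no bound" on that side. Chunking this way does not reduce the total number of rows read, it only breaks the full scan into pieces that can be run separately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase. Assumes row keys start with
// two lowercase hex digits (e.g. GUIDs rendered as hex strings).
final class KeyChunkSketch {

    // Splits the key space into `chunks` contiguous [startRow, stopRow) ranges
    // based on the first two hex digits. "" means unbounded on that side.
    static List<String[]> hexPrefixChunks(int chunks) {
        if (chunks < 1 || chunks > 256) {
            throw new IllegalArgumentException("chunks must be between 1 and 256");
        }
        List<String[]> ranges = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            int lo = i * 256 / chunks;
            int hi = (i + 1) * 256 / chunks;
            // First chunk has no lower bound, last chunk has no upper bound,
            // so together the ranges cover every possible key.
            String start = (i == 0) ? "" : String.format("%02x", lo);
            String stop = (i == chunks - 1) ? "" : String.format("%02x", hi);
            ranges.add(new String[] {start, stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (String[] r : hexPrefixChunks(4)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```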
