hbase-user mailing list archives

From Seraph Imalia <ser...@eisp.co.za>
Subject Re: Improving HBase scanner
Date Wed, 05 May 2010 07:55:36 GMT
Hi Ryan,

Thanks for your response - I am also working on this project.

I was hoping that HBase perhaps treated the time range differently,
in a way that would prevent a full table scan. I suppose our only
remaining option is to implement indexing?

Regards,
Seraph
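
A minimal sketch of what such an index could look like, assuming a second
table whose row key starts with the write time, so that a date range becomes
a contiguous row-key range. Everything here is hypothetical: the
TimeIndexSketch class, its method names, and the key layout are illustrative
only, and a TreeMap stands in for the sorted table rather than the HBase
client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical model of a time-ordered index table; not HBase client code.
class TimeIndexSketch {
    // The sorted map stands in for an HBase table, whose rows are ordered by row key.
    // Index row key: zero-padded epoch millis + "/" + GUID. Value: GUID of the data row.
    private final TreeMap<String, String> index = new TreeMap<>();

    // Zero-pad to 19 digits so that lexicographic order matches numeric order
    // (valid for non-negative timestamps).
    static String timePrefix(long timestampMillis) {
        return String.format(Locale.ROOT, "%019d", timestampMillis);
    }

    // Intended to be called alongside every write to the main, GUID-keyed table.
    void recordWrite(long timestampMillis, String guid) {
        index.put(timePrefix(timestampMillis) + "/" + guid, guid);
    }

    // Returns the GUIDs written in [fromMillis, toMillis).
    // Only index rows inside that key range are visited.
    List<String> guidsWrittenBetween(long fromMillis, long toMillis) {
        String startRow = timePrefix(fromMillis);
        String stopRow = timePrefix(toMillis);
        return new ArrayList<>(index.subMap(startRow, true, stopRow, false).values());
    }
}
```

The GUIDs returned would then be fetched from the main table one by one. One
trade-off to keep in mind: keys that begin with the current time all land next
to each other, so writes to such an index are not spread across the cluster
the way GUID keys are.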


On 05 May 2010, at 9:27 AM, Ryan Rawson wrote:

> You have to examine nearly every single value in the table - the
> mechanism by which HBase can restrict how much data it has to scan is
> via the row key only.  All the filters and filter-like calls (eg:
> setTimeRange) just restrict what data is passed back to the client.
>
> So yes you are scanning the entire table.  Could get expensive once
> you have a few TB.
>
> The thing to remember is access to data is all about the primary key.
> It's very similar to an RDBMS with only a primary index.  If you can't
> restrict your query via the primary key, then you have to do a full
> table scan.
>
> -ryan
>
> On Wed, May 5, 2010 at 12:22 AM, Michelan Arendse <michelan@addynamo.com> wrote:
>> I don't know what the row start and end keys are - they are GUID keys
>> (this improves writes across the cluster - I had help with this from
>> this user-group before).
>> I need to export data written between "startDate" and "endDate"  
>> into a relational database so I can interrogate the data (SUM/AVG,  
>> etc).
>>
>> That is why I am using: scan.setTimeRange(fromDate.getTime(),
>> toDate.getTime());
>> In my test with live data, I only took between 2010-03-26 00:00:00  
>> and 2010-03-26 01:00:00 - there should only be a few thousand rows  
>> in-between those dates.
>>
>> Will HBase still take forever to find the data I am looking for unless I
>> use startRow/endRow?
>>
>> -----Original Message-----
>> From: TuX RaceR [mailto:tuxracer69@gmail.com]
>> Sent: 04 May 2010 05:52 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Improving HBase scanner
>>
>> Michelan Arendse wrote:
>>> Is there a way to speed up the fetching of data from HBase?
>>>
>>>
>>
>> Divide your key space into smaller chunks,
>> using a closer startRow and stopRow?
>> cf: Scan(byte[] startRow, byte[] stopRow)
>> <http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/Scan.html#Scan%28byte%5B%5D,%20byte%5B%5D%29>
>>
>>
>> TuX
>>
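
The difference between the two kinds of scan discussed in this thread can be
illustrated with a toy model. This is only a sketch following Ryan's
description, not HBase client code: the ScanCostSketch class and its methods
are hypothetical, and a TreeMap of row key to write timestamp stands in for
the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the scan behaviour described in this thread; not HBase client code.
class ScanCostSketch {
    // Outcome of a scan: the rows returned plus how many rows had to be examined.
    static final class ScanResult {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    // Time-range style scan: every row is read, and the timestamp test only
    // decides what is handed back to the caller.
    static ScanResult scanByTimeRange(TreeMap<String, Long> table, long minStamp, long maxStamp) {
        ScanResult result = new ScanResult();
        for (Map.Entry<String, Long> row : table.entrySet()) {
            result.examined++;
            long ts = row.getValue();
            if (ts >= minStamp && ts < maxStamp) {
                result.rows.add(row.getKey());
            }
        }
        return result;
    }

    // Row-range scan: only keys in [startRow, stopRow) are read at all.
    static ScanResult scanByRowRange(TreeMap<String, Long> table, String startRow, String stopRow) {
        ScanResult result = new ScanResult();
        for (String key : table.subMap(startRow, true, stopRow, false).keySet()) {
            result.examined++;
            result.rows.add(key);
        }
        return result;
    }
}
```

In this model the time-range scan always examines as many rows as the table
holds, however few it returns, while the row-range scan examines only the rows
between the two keys. With GUID keys the rows for a given hour are scattered
across the whole key space, so no single startRow/stopRow pair covers them.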


