hbase-user mailing list archives

From Imran M Yousuf <imyou...@gmail.com>
Subject Re: Best way to get multiple non-sequential rows
Date Wed, 25 Aug 2010 01:16:33 GMT
Thanks for the suggestions Michael.

On Tue, Aug 24, 2010 at 5:37 PM, Michael Segel
<michael_segel@hotmail.com> wrote:
>
> Hi,
>
> Non sequential rows?
>
> Short answer... it depends.  :-)
>
> Longer answer... how 'non-sequential' ?
>
> If you're using a key that is hashed (SHA-1), then your rows will be fairly random and
> 'non-sequential'.
> Here your best bet is to fetch each row via a get().  In order to do the get you have
> to know the specific key, so the fetch should be fairly quick and consistent regardless of
> the size of the database (near linear scalability). This works great if you know your key.
>
> If you're using some key that isn't hashed but the rows aren't sequential, you may want
> to do a range scan and then drop the rows that are not needed. This may be faster in some
> specific situations where all of your data is within one or two regions of a very large table.
> (But it's so specific, I don't know how much value it has as a generic query.)
>
> An extreme and bad example... suppose you want to find all of the shops along a specific
> street, and the key includes the street side but is also based on the address.
> If you did a scan, you'd end up with a list where you may want every other entry.  So
> here it would be faster to do a sequential scan with a partial key to put a boundary on which
> regions to scan.  (Again this is a bad example.)
> If you also write your own custom filter, you can get it to return only the rows you
> want.
>
> Again, I apologize for the bad example... it was the first thing I could think of before
> I finished my first cup of coffee in the morning.
>
> HTH
>
> -Mike
>
>
>> Date: Tue, 24 Aug 2010 09:35:26 +0600
>> Subject: Best way to get multiple non-sequential rows
>> From: imyousuf@gmail.com
>> To: user@hbase.apache.org
>>
>> Hi,
>>
>> I am using the HBase client API to interact with HBase. I have noticed
>> that HTableInterface has operations such as put(List<Put>),
>> delete(List<Delete>), but there is no similar method for Get. Using
>> scan it is possible to load a range of rows, i.e. sequential rows. My
>> question is -
>> how would it be most efficient to load N non-sequential rows?
>>
>> Currently I am using get(Get) method N times.
>>
>> --
>> Imran M Yousuf
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>
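
For anyone following the thread, the two approaches Michael describes (one get per known key, versus a scan bounded by a partial key plus a filter) can be sketched roughly as below. This is only an illustration under loud assumptions: it uses a java.util.TreeMap as a stand-in for a table sorted by row key and is NOT the HBase client API. The class name RowFetchSketch, its methods, and the sample row keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical stand-in for a table sorted by row key (not the HBase API).
class RowFetchSketch {
    private final NavigableMap<String, String> table = new TreeMap<>();

    void put(String rowKey, String value) {
        table.put(rowKey, value);
    }

    // Approach 1: one point lookup per known key,
    // analogous to calling get(Get) N times. Missing keys are skipped.
    List<String> pointGets(List<String> rowKeys) {
        List<String> out = new ArrayList<>();
        for (String key : rowKeys) {
            String value = table.get(key);
            if (value != null) {
                out.add(value);
            }
        }
        return out;
    }

    // Approach 2: scan only the rows that share a partial key (prefix),
    // and let a filter decide which of those rows to keep.
    List<String> prefixScan(String prefix, Predicate<String> keep) {
        List<String> out = new ArrayList<>();
        // tailMap starts at the first key >= prefix; stop once the prefix no longer matches.
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            if (keep.test(e.getKey())) {
                out.add(e.getValue());
            }
        }
        return out;
    }
}
```

With keys such as "main-st:0101" through "main-st:0104", prefixScan("main-st:", k -> (k.charAt(k.length() - 1) - '0') % 2 == 1) would keep only the odd-numbered addresses, i.e. every other entry in the street example. Whether this beats N individual gets depends on how many unwanted rows the scan has to step over.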



-- 
Imran M Yousuf
Entrepreneur & CEO
Smart IT Engineering Ltd.
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
