hbase-user mailing list archives

From Jochen Frey <jochen_f...@yahoo.com>
Subject Re: Fast retrieval of multiple rows with non-sequential keys
Date Mon, 05 Oct 2009 14:06:05 GMT
Thanks JG.

I'll check out JIRA and educate myself.

If I had my wish - I'd get the results streamed back to me, so that I  
can start work on the results while they're being retrieved.

:-)
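
A minimal sketch of that streaming wish, assuming the lookups are done client-side in a thread pool: each result is handed to a callback as soon as its fetch completes, rather than after the whole batch is done. The names StreamingGets, fetchAndConsume, fetchOne and onResult are hypothetical; fetchOne stands in for a single-row lookup such as HTable.get(Get) and is not part of the HBase API.

```java
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

final class StreamingGets {

    /**
     * Runs one lookup per key on a fixed thread pool and passes each value to
     * onResult in completion order (not in key order).
     * fetchOne is a hypothetical stand-in for a single-row get.
     */
    public static <K, V> void fetchAndConsume(List<K> keys,
                                              Function<K, V> fetchOne,
                                              Consumer<V> onResult,
                                              int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<V> completed = new ExecutorCompletionService<>(pool);
            for (K key : keys) {
                completed.submit(() -> fetchOne.apply(key));
            }
            // take() blocks until the next lookup finishes, whichever it is.
            // onResult runs on the calling thread, one result at a time.
            for (int i = 0; i < keys.size(); i++) {
                onResult.accept(completed.take().get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because results arrive in completion order, a caller that needs to know which row a value belongs to would have fetchOne return the key together with the value.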

J

On Oct 5, 2009, at 3:36 PM, Jonathan Gray wrote:

> This is being worked on.  Ideally, a solution would batch things by region
> and then by regionserver, so that the total number of RPC calls would at a
> maximum be the number of servers.
>
> Follow HBASE-1845 and related issues.
>
> You can use threads and add some parallelism of the multiple gets in your
> application for now.
>
> JG
>
> On Mon, October 5, 2009 3:02 am, Jochen Frey wrote:
>> I want to use HBase as a BLOB store for a search engine application.
>> For that the objects will be stored in one HBase table (~ 1B rows).
>> Object size is typically between 1 kB and 20 kB.
>>
>>
>> I am concerned about my read pattern, where a typical read retrieves
>> between tens and thousands of rows in random order. Looking at the Java API,
>> the only method to retrieve rows in random order is to issue multiple
>>
>> Result = HTable.get(Get)
>>
>>
>> requests sequentially (I assume a Scanner is not a good idea since the
>> rows I need are spread randomly across the table / regions / etc.).
>>
>> My concern is that with that pattern I have one RPC call per item,
>> which seems to be a lot of overhead, especially when I need to retrieve
>> 100s or 1,000s of rows.
>>
>>
>> Would it not be preferable to batch up requests so that all rows
>> requested would be grouped by region, and then sent off in parallel to
>> regions for retrieval - that way there'd be fewer RPC calls, and they
>> could be executed in parallel, as well? As such, an addition to the
>> interface could look something like
>>
>> List<Result> = HTable.get(List<Get>)
>>
>>
>> Am I making sense? Is there something that I am missing?
>>
>>
>> Thanks!
>> Jochen
>>
>>
>>
>>
>
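
For reference, the interim workaround JG mentions (threads in the application, one get per row) could look roughly like the sketch below. It does not reduce the number of RPC calls; it only overlaps them. ParallelGets, fetchAll and fetchOne are hypothetical names, and fetchOne is a stand-in for a single-row lookup such as HTable.get(Get), so the sketch runs without an HBase cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelGets {

    /**
     * Issues one lookup per key on a fixed thread pool and returns the values
     * keyed by row key, in the order the keys were supplied.
     * fetchOne is a hypothetical stand-in for a single-row get.
     * Duplicate keys collapse into one map entry.
     */
    public static <K, V> Map<K, V> fetchAll(List<K> keys,
                                            Function<K, V> fetchOne,
                                            int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<V>> tasks = new ArrayList<>();
            for (K key : keys) {
                tasks.add(() -> fetchOne.apply(key));
            }
            // invokeAll blocks until every task is done and returns the
            // futures in the same order as the submitted tasks.
            List<Future<V>> futures = pool.invokeAll(tasks);
            Map<K, V> results = new LinkedHashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                results.put(keys.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size is a tuning knob that trades client-side concurrency against load on the region servers; a sensible value depends on the cluster and is left as a parameter here.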

---
m: jochen_frey@yahoo.com
p: +1.415.706.1341

