hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: How to implement efficient bulk query
Date Fri, 22 Jul 2011 14:47:45 GMT

That method internally groups the gets by RegionServer (RS), so it's pretty
efficient. I think it processes the RS groups serially in 0.90.x, and I thought
I saw a ticket about multi-threaded processing, but you'll have to check the code.
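
If a single call with every key turns out to be too large, one option is to split the key list into fixed-size batches and issue one batch get per chunk. The following is a minimal sketch in plain Java, not HBase client code: `KeyBatcher` and `partition` are hypothetical names, the batch size is a tuning parameter this thread does not specify, and the actual batch get call is only indicated in a comment.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyBatcher {
    private KeyBatcher() {}

    /**
     * Splits keys into consecutive batches of at most batchSize elements.
     * Each returned batch is an independent copy, so the caller can turn it
     * into a list of gets and hand it to the client's batch get call.
     */
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

Trying a few batch sizes against the real cluster and timing them is probably the only reliable way to choose one.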

On 7/22/11 9:46 AM, "Nanheng Wu" <nanhengwu@gmail.com> wrote:

>  I have a use case for my data stored in HBase where I need to make
>a query for 20K-30K keys at once. I know that the HBase client API
>supports get operation with a list of "gets", so a naive
>implementation would probably just make one or more batch get calls.
>First of all I am wondering if I choose this implementation how should
>I choose the batch size? Can I put all the keys in a single batch?
>Secondly, is there a better implementation that's more efficient? For
>instance I can sort the keys first and split them into groups of a
>certain size, for each group do a scan using the first and last key of
>the group and filter out returned rows that are not in the group (kinda
>like a merge join). Would the second implementation be faster than the
>first? Are there better ways to go about it? I am using HBase 0.20.6.
