hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Scan vs map-reduce
Date Mon, 14 Apr 2014 19:39:06 GMT

re:  "my first version is using 20,000 Get"

Just throwing this out there, but have you looked at multi-get?  Multi-get
will group the gets by RegionServer internally.
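
As a rough sketch of what that could look like (not a tested implementation):
the helper below only splits the row keys into batches using plain Java, and
the comments show where each batch would become one multi-get. The class
name, the key format, and the batch size of 1,000 are made up for
illustration; `table` is assumed to be an already-open table handle (Table,
or HTable in older releases).

```java
import java.util.ArrayList;
import java.util.List;

final class MultiGetBatches {

    // Split row keys into consecutive batches of at most batchSize keys.
    static List<List<String>> partition(List<String> rowKeys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rowKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rowKeys.size());
            batches.add(new ArrayList<>(rowKeys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            rowKeys.add(String.format("row-%05d", i)); // hypothetical key format
        }

        // Batch size is an assumption; tune it for your cluster.
        List<List<String>> batches = partition(rowKeys, 1000);

        // With the HBase client on the classpath, each batch becomes one call:
        //   List<Get> gets = new ArrayList<>();
        //   for (String key : batch) gets.add(new Get(Bytes.toBytes(key)));
        //   Result[] results = table.get(gets); // client groups the gets by RegionServer
        System.out.println(batches.size() + " multi-get calls instead of "
                + rowKeys.size() + " single gets");
    }
}
```

That turns 20,000 round trips into 20 multi-get calls at this batch size.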

You are doing a lot of IO for a web-app so this is going to be tough to
make "fast", but there are ways to make it "faster."

But since you only have 1,000,000 rows you might not have many regions, so
this might wind up all going on the same RegionServer.

On 4/14/14, 7:52 AM, "Li Li" <fancyerii@gmail.com> wrote:

>I need to get about 20,000 rows from the table. the table is about
>1,000,000 rows.
>my first version is using 20,000 Get and I found it's very slow. So I
>modified it to a scan and filter unrelated rows in the client.
>maybe I should write a coprocessor. btw, is there any filter available
>for me? something like sql statement where rowkey in('abc', 'abd'
>....). a very long in statement
>On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
><jean-marc@spaggiari.org> wrote:
>> Hi Li Li,
>> If you have more than one region, might be useful. MR will scan all the
>> regions in parallel. If you do a full scan from a client API with no
>> parallelism, then the MR job might be faster. But it will take more
>> resources on the cluster and might impact the SLA of the other clients,
>> if any.
>> JM
>> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <dontariq@gmail.com>:
>>> Well, it depends. Could you please provide some more details? It will
>>> help us in giving a proper answer.
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <fancyerii@gmail.com> wrote:
>>> > I have a full table scan which cost about 10 minutes. it seems a
>>> > bottleneck for our application. if use map-reduce to rewrite it. will
>>> > it be faster?
>>> >
