hbase-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: Scan vs map-reduce
Date Mon, 14 Apr 2014 11:52:28 GMT
I need to get about 20,000 rows from a table of about 1,000,000 rows.
My first version issued 20,000 Gets and it was very slow, so I changed
it to a full scan and filter out the unrelated rows on the client side.
Maybe I should write a coprocessor. By the way, is there any filter
available for this? Something like a SQL "where rowkey in ('abc', 'abd',
....)" clause, i.e. a very long IN list.
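One way to emulate an IN-style row-key match with the stock filters is a
FilterList of RowFilters in MUST_PASS_ONE mode. The sketch below is only
illustrative and assumes the 2014-era HTable client; the table name
"mytable" and the key list are placeholders:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class InListScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");       // placeholder table name

    // One RowFilter per wanted key, OR-ed together (MUST_PASS_ONE),
    // i.e. "where rowkey in ('abc', 'abd', ...)".
    FilterList inList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    for (String key : Arrays.asList("abc", "abd")) {  // the real list would hold ~20,000 keys
      inList.addFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
          new BinaryComparator(Bytes.toBytes(key))));
    }

    Scan scan = new Scan();
    scan.setFilter(inList);
    scan.setCaching(1000);                            // fewer RPC round trips
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // Only matching rows arrive here; the filtering happened on the region servers.
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

Note that the region servers still walk every row to evaluate the filter, so
this mainly saves shipping the ~980,000 unwanted rows to the client. If the
20,000 keys are known up front, batching the Gets with table.get(List<Get>)
(grouped by region server instead of 20,000 sequential calls) is another
option worth trying.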

On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
<jean-marc@spaggiari.org> wrote:
> Hi Li Li,
>
> If you have more than one region, it might be useful. MR will scan all the
> regions in parallel. If you do a full scan from the client API with no
> parallelism, then the MR job might be faster. But it will take more
> resources on the cluster and might impact the SLAs of the other clients,
> if any.
>
> JM
>
>
> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <dontariq@gmail.com>:
>
>> Well, it depends. Could you please provide some more details? It will
>> help us give a proper answer.
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <fancyerii@gmail.com> wrote:
>>
>> > I have a full table scan which takes about 10 minutes. It seems to be
>> > a bottleneck for our application. If we rewrite it with map-reduce,
>> > will it be faster?
>> >
>>
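
Regarding the map-reduce route Jean-Marc describes above, a rough sketch of
such a scan job (one mapper per region) could look like the following; it
assumes the Hadoop 2 Job API, and the table name, caching value, and
map-only output handling are placeholder choices:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ParallelScanJob {

  // One mapper runs per region, so the regions are read in parallel.
  static class RowMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      // Filter / process the row here instead of pulling it to a single client.
      context.getCounter("scan", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "parallel-table-scan");
    job.setJarByClass(ParallelScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(1000);       // larger caching cuts RPC round trips
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    TableMapReduceUtil.initTableMapperJob(
        "mytable",               // placeholder table name
        scan,
        RowMapper.class,
        ImmutableBytesWritable.class,
        LongWritable.class,
        job);

    job.setNumReduceTasks(0);    // map-only job
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each mapper reads one region's slice of the table, so the single-client
10-minute scan is split across however many regions (and region servers) the
table has, at the cost of MR scheduling overhead and extra load on the
cluster.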
