hbase-user mailing list archives

From "Andriy Kolyadenko" <cryp...@mail.saturnfans.com>
Subject Re: Multi ranges Scan
Date Fri, 26 Mar 2010 13:59:26 GMT

Thanks Stack and Karthik for the hints.

And what about this idea:
I could define my own splitter, as Karthik advised, and if one region needs to be MR-scanned
for 2 different ranges, the splitter could produce two different TableSplits for that region.
This would definitely reduce the number of scanned rows, but on the other hand it would
increase the number of TableSplit objects.
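
To make the idea concrete, here is a minimal sketch in plain Java of how such a splitter
might compute its splits. It deliberately uses no HBase classes: RangeSplitSketch, KeyRange
and splitsFor are hypothetical names, row keys are modelled as ASCII strings instead of
byte[], and an empty string stands for an open region boundary. Each requested range is
intersected with each region, and every non-empty intersection becomes one split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, HBase-free model of the "one split per (region, range) overlap" idea.
class RangeSplitSketch {

    /** Half-open key range [start, stop). An empty string means "unbounded" on that side. */
    static final class KeyRange {
        final String start;
        final String stop;

        KeyRange(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + "," + stop + ")";
        }
    }

    /** Later of two start keys. The empty string sorts first, so it acts as "no lower bound". */
    static String laterStart(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Earlier of two stop keys, treating the empty string as "no upper bound". */
    static String earlierStop(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Returns one range per non-empty overlap between a region and a requested range. */
    static List<KeyRange> splitsFor(List<KeyRange> regions, List<KeyRange> wanted) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            for (KeyRange range : wanted) {
                String start = laterStart(region.start, range.start);
                String stop = earlierStop(region.stop, range.stop);
                // Keep the overlap only if it contains at least one possible key.
                if (stop.isEmpty() || start.compareTo(stop) < 0) {
                    splits.add(new KeyRange(start, stop));
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Two regions; the first one holds both requested ranges, so it yields two splits.
        List<KeyRange> regions = Arrays.asList(new KeyRange("", "6"), new KeyRange("6", ""));
        List<KeyRange> wanted = Arrays.asList(new KeyRange("1", "2"), new KeyRange("4", "5"));
        System.out.println(splitsFor(regions, wanted));
    }
}
```

In a real job each resulting range would presumably become one TableSplit for the region it
came from; that wiring is not shown here. The nested loop is quadratic in the worst case,
which should be unproblematic for around 1000 ranges, but sorted inputs would allow a
linear merge instead.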

I expect to see no more than 1000 ranges in my case.

Could you, as HBase experts, advise me: does this approach have any performance pitfalls
in your opinion?

Thanks again.

--- oss.akk@gmail.com wrote:

From: Karthik K <oss.akk@gmail.com>
To: hbase-user@hadoop.apache.org
Subject: Re: Multi ranges Scan
Date: Thu, 25 Mar 2010 23:43:20 -0700

On Thu, Mar 25, 2010 at 11:30 PM, Karthik K <oss.akk@gmail.com> wrote:

>
>
> On Thu, Mar 25, 2010 at 8:03 PM, Andriy Kolyadenko <
> crypto5@mail.saturnfans.com> wrote:
>
>> My task is the following: I have a list of key ranges and I need to run
>> MR over these ranges as fast as possible.
>
>
>
>> As far as I understand, MR will do a full scan if I use a filter. Is that
>> correct?
>
>
> On a given InputSplit, yes.
>
> But see HBASE-2302, where you can inherit from TableInputFormat and
> override a method to reduce the number of InputSplits.
> That will significantly reduce the overhead of the bulk scan, and restrict
> your filter to only those InputSplits that pass the criteria.
>
>
>
>>
>>
>>
>> --- saint.ack@gmail.com wrote:
>>
>> From: Stack <saint.ack@gmail.com>
>> To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
>> Subject: Re: Multi ranges Scan
>> Date: Thu, 25 Mar 2010 19:57:44 -0700
>>
>> Can you use a filter to do this?  If no pattern to the excludes then
>> it's tougher. How do you know what to exclude?   It's in a repository
>> somewhere?  Add a filter to query this repo?
>>
>>
>>
>> On Mar 25, 2010, at 4:07 PM, "Andriy Kolyadenko" <
>> crypto5@mail.saturnfans.com
>>  > wrote:
>>
>> > Ok, it would work for region pruning. And what about pruning the actual
>> > rows inside a single region? Do you have any ideas how to implement
>> > it?
>> >
>> > --- Stack wrote: ---
>> >
>> > I think you need to make a custom splitter for your mapreduce job, one
>> > that makes splits that align with the ranges you'd have your job run
>> > over.   A permutation on HBASE-2302 might work for you.
>>
>
Oops. Sorry for the redundant info!



> >
>> > St.Ack
>> >
>> > On Wed, Mar 17, 2010 at 1:32 PM, Andrey Kolyadenko
>> > <cryp...@mailx.ru> wrote:
>> >> Hi all,
>> >>
>> >> maybe somebody could give me advice in the following situation:
>> >>
>> >> Currently the HBase Scan interface only lets you set the first and
>> >> last rows for MR scanning. Is there any way to get multiple ranges
>> >> into the map input?
>> >>
>> >> For example, let's assume I have the following table:
>> >> key value
>> >> 1   v1
>> >> 2   v2
>> >> 3   v3
>> >> 4   v4
>> >> 5   v5
>> >>
>> >> What I need is to get, for example, the [1,2) and [4,5) ranges as input
>> >> for my Map task. I need this as a performance optimization.
>> >>
>> >> Any advice?
>> >>
>> >> Thanks.
>> >
>> >
>> > _____________________________________________________________
>> > Sign up for your free SaturnFans email account at
>> http://webmail.saturnfans.com/
>>
>>
>>
>>
>>
>
>




