hbase-user mailing list archives

From Karthik K <oss....@gmail.com>
Subject Re: Multi ranges Scan
Date Fri, 26 Mar 2010 06:30:08 GMT
On Thu, Mar 25, 2010 at 8:03 PM, Andriy Kolyadenko <crypto5@mail.saturnfans.com> wrote:

> My task is the following: I have a list of key ranges and I need to run
> MR over these ranges as fast as possible.



> As far as I understand, MR will do a full scan if I use a filter. Is that
> correct?


On a given InputSplit, yes.

But see HBASE-2302, where you can extend TableInputFormat and override a
method to reduce the number of InputSplits.
That significantly reduces the overhead of the bulk scan and restricts
your filter to only those InputSplits that pass the criteria.
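The region-pruning idea boils down to one predicate: keep an InputSplit only if its region overlaps at least one requested key range. Here is a minimal, standalone sketch of that predicate, assuming half-open [start, stop) ranges over raw row keys and HBase's convention that an empty key means "unbounded". `RangeRegionPruner`, `KeyRange` and `overlapsAnyRange` are hypothetical names, not HBase API; wiring the predicate into your TableInputFormat subclass is left out.

```java
import java.util.List;

/** Hypothetical helper (not HBase API): does a region [regionStart, regionEnd) hold any requested row? */
final class RangeRegionPruner {

    /** A requested half-open key range [start, stop); an empty stop means "to the end of the table". */
    public static final class KeyRange {
        final byte[] start;
        final byte[] stop;

        public KeyRange(byte[] start, byte[] stop) {
            // Assumption: ranges are non-empty, otherwise the overlap test below is not valid.
            if (stop.length > 0 && compare(start, stop) >= 0) {
                throw new IllegalArgumentException("range must be non-empty: start < stop");
            }
            this.start = start;
            this.stop = stop;
        }
    }

    /** Unsigned lexicographic byte comparison, the ordering HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** True if key < upper, treating an empty upper bound as +infinity. */
    static boolean below(byte[] key, byte[] upper) {
        return upper.length == 0 || compare(key, upper) < 0;
    }

    /**
     * Keep a region only if it overlaps at least one requested range.
     * Two half-open intervals overlap iff each one starts before the other ends.
     * An empty regionStart sorts before every key, so it already acts as -infinity.
     */
    public static boolean overlapsAnyRange(byte[] regionStart, byte[] regionEnd, List<KeyRange> ranges) {
        for (KeyRange r : ranges) {
            if (below(regionStart, r.stop) && below(r.start, regionEnd)) {
                return true;
            }
        }
        return false;
    }
}
```

With requested ranges [1,2) and [4,5), a region spanning [2,4) is dropped, while the first and last regions are kept, so only splits that can contain matching rows are ever scanned.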



>
>
>
> --- saint.ack@gmail.com wrote:
>
> From: Stack <saint.ack@gmail.com>
> To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> Subject: Re: Multi ranges Scan
> Date: Thu, 25 Mar 2010 19:57:44 -0700
>
> Can you use a filter to do this? If there's no pattern to the excludes,
> then it's tougher. How do you know what to exclude? Is it in a repository
> somewhere? Add a filter to query this repo?
>
>
>
> On Mar 25, 2010, at 4:07 PM, "Andriy Kolyadenko" <crypto5@mail.saturnfans.com> wrote:
>
> > OK, that would work for region pruning. But what about pruning the
> > actual rows inside a single region? Do you have any ideas how to
> > implement that?
> >
> > --- Stack wrote: ---
> >
> > I think you need to make a custom splitter for your mapreduce job, one
> > that makes splits that align with the ranges you'd have your job run
> > over. A permutation on HBASE-2302 might work for you.
> >
> > St.Ack
> >
> > On Wed, Mar 17, 2010 at 1:32 PM, Andrey Kolyadenko
> > <cryp...@mailx.ru> wrote:
> >> Hi all,
> >>
> >> Maybe somebody could give me advice on the following situation:
> >>
> >> Currently the HBase Scan interface only lets you set the first and
> >> last rows for MR scanning. Is there any way to get multiple ranges
> >> into the map input?
> >>
> >> For example let's assume I have following table:
> >> key value
> >> 1   v1
> >> 2   v2
> >> 3   v3
> >> 4   v4
> >> 5   v5
> >>
> >> What I need is to get, for example, the ranges [1,2) and [4,5) as
> >> input for my map task. I need this as a performance optimization.
> >>
> >> Any advice?
> >>
> >> Thanks.
> >
> >
> > _____________________________________________________________
> > Sign up for your free SaturnFans email account at
> http://webmail.saturnfans.com/
>
>
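Stack's suggestion of splits that align with the requested ranges also addresses the row-pruning question: if each split is the intersection of one region with one requested range, the scan for that split can use the intersection's bounds as its start and stop rows, so rows outside the ranges are never read. The sketch below is standalone Java, not HBase API: `RangeAlignedSplitter`, `Span`, `splits` and `inAnyRange` are hypothetical names, keys are assumed to be ASCII strings (so `String.compareTo` matches HBase's byte ordering), and an empty string stands for an unbounded start or stop. `inAnyRange` is the per-row test a filter would have to apply if you scanned whole regions instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not HBase API): cut one split per (region, requested range) intersection. */
final class RangeAlignedSplitter {

    /** Half-open interval [start, stop); "" as start is -infinity, "" as stop is +infinity. */
    static final class Span {
        final String start;
        final String stop;

        Span(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + "," + stop + ")";
        }
    }

    /** Larger of two lower bounds; "" sorts first, so it already behaves as -infinity. */
    static String maxStart(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Smaller of two upper bounds, treating "" as +infinity. */
    static String minStop(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** True if [start, stop) contains at least one key. */
    static boolean nonEmpty(String start, String stop) {
        return stop.isEmpty() || start.compareTo(stop) < 0;
    }

    /**
     * For each region and each requested range, emit their intersection as a split.
     * Scanning a split with startRow = span.start and stopRow = span.stop never
     * touches rows outside the requested ranges, so no per-row filter is needed.
     */
    static List<Span> splits(List<Span> regions, List<Span> ranges) {
        List<Span> out = new ArrayList<>();
        for (Span region : regions) {
            for (Span range : ranges) {
                String start = maxStart(region.start, range.start);
                String stop = minStop(region.stop, range.stop);
                if (nonEmpty(start, stop)) {
                    out.add(new Span(start, stop));
                }
            }
        }
        return out;
    }

    /** Row-level check for the filter approach: is the row inside any requested range? */
    static boolean inAnyRange(String row, List<Span> ranges) {
        for (Span r : ranges) {
            if (row.compareTo(r.start) >= 0 && (r.stop.isEmpty() || row.compareTo(r.stop) < 0)) {
                return true;
            }
        }
        return false;
    }
}
```

For the table in the original question, with two hypothetical regions split at key `3`, `splits` returns `[[1,2), [4,5)]`, and `inAnyRange` selects rows 1 and 4. Producing these spans inside an actual TableInputFormat subclass is not shown here.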
