hbase-user mailing list archives

From john smith <js1987.sm...@gmail.com>
Subject Re: MR in HBase
Date Fri, 08 Jan 2010 10:00:09 GMT

The requirement is that I need to scan two tables, A and B, for an MR
job. Order is not important. That is, the reduce phase receives keys
from both A and B.

Presently I am using TableMap for "A", and in one of the mappers I am
reading the entire B table using a scanner. But this is a big overhead,
right? Non-local B data will be transferred (over the network) to the
machine executing that map task. Instead, what I was thinking is that
there might be some kind of variant of TableMap which scans both A and B
and emits the corresponding keys. Order is not at all important, and
there are no random lookups. I need all of the B table keys in some way
or the other, with the least overhead!
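
To make the "variant of TableMap" idea concrete, here is a minimal plain-Java sketch of what I mean: one combined list of map inputs, with one entry per region of A and one per region of B, so every map task reads a single region of a single table. It does not use the HBase API at all. RegionSplit, TwoTableSplits, splitsFor and combine are hypothetical names made up for illustration, and the region start keys and host names are assumed to be known already.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one unit of map work: one region of one table. */
final class RegionSplit {
    final String table;    // which table this split reads
    final String startRow; // inclusive; "" means start of table
    final String endRow;   // exclusive; "" means end of table
    final String host;     // server assumed to host the region (locality hint)

    RegionSplit(String table, String startRow, String endRow, String host) {
        this.table = table;
        this.startRow = startRow;
        this.endRow = endRow;
        this.host = host;
    }
}

final class TwoTableSplits {
    /** Builds one split per region from sorted region start keys and their hosts. */
    static List<RegionSplit> splitsFor(String table, List<String> startKeys, List<String> hosts) {
        List<RegionSplit> splits = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            // A region ends where the next one starts; the last region is open-ended.
            String end = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : "";
            splits.add(new RegionSplit(table, startKeys.get(i), end, hosts.get(i)));
        }
        return splits;
    }

    /** Concatenates the splits of both tables; order does not matter for this job. */
    static List<RegionSplit> combine(List<RegionSplit> a, List<RegionSplit> b) {
        List<RegionSplit> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }
}
```

With something like this, each map task could be scheduled near the region it reads, and the reduce phase would still see keys from both tables.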

Also, there's one more solution I was thinking of. Suppose I am scanning
some particular region using TableMap. I can get that particular region's
name using some function in the API, then I can build a scanner on B over
that particular region and emit all the keys from B. This doesn't require
any network transfer of data. Is this solution feasible? If yes, any hints
on which classes to use from the API?
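
A rough plain-Java sketch of this second idea, with table B modelled as a sorted map instead of a real HBase table: given the row range of the region of A that the current map is scanning, emit only the keys of B that fall in that same range. RangeScan and keysInRange are hypothetical names, not HBase classes. The sketch also assumes that the B rows for that key range sit on the same server as the region of A; as far as I know, regions of different tables are assigned to servers independently, so that part may not hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeScan {
    /**
     * Returns the row keys of `table` in [startRow, endRow).
     * An empty endRow means "to the end of the table"; an empty startRow
     * sorts before every other string, so it means "from the start".
     * Note: String ordering matches HBase's byte ordering only for ASCII keys.
     */
    static List<String> keysInRange(NavigableMap<String, String> table,
                                    String startRow, String endRow) {
        NavigableMap<String, String> view = endRow.isEmpty()
                ? table.tailMap(startRow, true)
                : table.subMap(startRow, true, endRow, false);
        return new ArrayList<>(view.keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> b = new TreeMap<>();
        b.put("apple", "1");
        b.put("kiwi", "2");
        b.put("mango", "3");
        b.put("zebra", "4");
        // Two regions of A, split at "m": each map emits the B keys in its own range.
        System.out.println(keysInRange(b, "", "m"));
        System.out.println(keysInRange(b, "m", ""));
    }
}
```

If every map over A does this for its own region range, each key of B is emitted exactly once across the whole job, because the region ranges of A do not overlap and together cover the full key space.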

Thanks,

On Fri, Jan 8, 2010 at 10:46 AM, stack <stack@duboce.net> wrote:

> This is a little tough.  Do both tables have same number of regions?  Are
> you walking through the two tables serially in your mapreduce or do you want
> to do random lookups into the second table dependent on the row you are
> currently processing in table one?
> St.Ack
> On Thu, Jan 7, 2010 at 7:51 PM, john smith <js1987.smith@gmail.com> wrote:
> > Hi all,
> >
> > My requirement is that I must read two tables (belonging to the same
> > region server) in the same Map.
> >
> > Normally TableMap supports only 1 table at a time, and right now I am
> > reading the entire 2nd table in any one of the maps. This is a big
> > overhead. So can anyone suggest some modification of TableMap or a
> > different approach which can read 2 tables at the same time? This can
> > be very useful to us!
> >
> > Thanks
> > J-S
> >
