hbase-user mailing list archives

From Mridul Muralidharan <mrid...@yahoo-inc.com>
Subject Re: MR in HBase
Date Fri, 08 Jan 2010 12:43:16 GMT

If you just want to scan both tables in your mapper, and assuming there
is no easier way to do it, can't you write a composite input format
which delegates to both tables' input formats?
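
A rough sketch of the delegation idea, in plain Java with no Hadoop or
HBase dependency. SplitSource, TaggedSplit and CompositeSplitSource are
hypothetical names made up for illustration; they are not HBase classes.
SplitSource stands in for each table's input format, and a "split" is
just a string naming a region. The composite concatenates the splits of
its delegates and tags each with its table, so whatever reads a split
knows which table to open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one table's input format: it only lists splits.
interface SplitSource {
    List<String> getSplits();
}

// A split tagged with the table it came from.
final class TaggedSplit {
    final String table;
    final String split;

    TaggedSplit(String table, String split) {
        this.table = table;
        this.split = split;
    }

    @Override
    public String toString() {
        return table + ":" + split;
    }
}

// Composite that delegates to one SplitSource per table, in insertion order.
final class CompositeSplitSource {
    private final Map<String, SplitSource> delegates = new LinkedHashMap<>();

    CompositeSplitSource add(String table, SplitSource source) {
        delegates.put(table, source);
        return this;
    }

    // Union of all delegates' splits; each split still maps to one task.
    List<TaggedSplit> getSplits() {
        List<TaggedSplit> all = new ArrayList<>();
        for (Map.Entry<String, SplitSource> e : delegates.entrySet()) {
            for (String s : e.getValue().getSplits()) {
                all.add(new TaggedSplit(e.getKey(), s));
            }
        }
        return all;
    }
}

final class CompositeDemo {
    // Region names here are made-up sample data.
    static List<TaggedSplit> demo() {
        return new CompositeSplitSource()
            .add("A", () -> Arrays.asList("region-1", "region-2"))
            .add("B", () -> Arrays.asList("region-1"))
            .getSplits();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [A:region-1, A:region-2, B:region-1]
    }
}
```

In a real job the composite would also have to hand back the matching
record reader for each tagged split; that part is left out here.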


john smith wrote:
> Stack,
> The requirement is that I need to scan two tables, A and B, for an MR
> job. Order is not important. That is, the reduce phase contains keys
> from both A and B.
> At present I am using TableMap for "A", and in one of the mappers I am
> reading the entire B using a scanner. But this is a big overhead, right?
> Non-local B data will be transferred (over the network) to the machine
> executing that Map phase. Instead, what I was thinking is that there
> might be some variant of TableMap which scans both A and B and emits
> the corresponding keys. Order is not at all important, and there are no
> random lookups. I need the entire B table's keys in some way or the
> other, with the least overhead!
> Also, there is one more solution I was thinking of. Suppose I am
> scanning some particular region using TableMap. I can get that region's
> name using some function in the API, then build a scanner on B over
> that particular region and emit all the keys from B. This doesn't
> require any network transfer of data. Is this solution feasible? If
> yes, any hints on what classes to use from the API?
> Thanks,
> J-S
> On Fri, Jan 8, 2010 at 10:46 AM, stack <stack@duboce.net> wrote:
>> This is a little tough.  Do both tables have the same number of regions?
>> Are you walking through the two tables serially in your mapreduce, or do
>> you want to do random lookups into the second table dependent on the row
>> you are currently processing in table one?
>> St.Ack
>> On Thu, Jan 7, 2010 at 7:51 PM, john smith <js1987.smith@gmail.com> wrote:
>>> Hi all,
>>> My requirement is that I must read two tables (belonging to the same
>>> region server) in the same Map.
>>> Normally TableMap supports only one table at a time, and right now I am
>>> reading the entire second table in one of the maps. This is a big
>>> overhead. So can anyone suggest some modification of TableMap, or a
>>> different approach, which can read two tables at the same time? This
>>> can be very useful to us!
>>> Thanks
>>> J-S
