hbase-user mailing list archives

From john smith <js1987.sm...@gmail.com>
Subject Re: MR in HBase
Date Fri, 08 Jan 2010 17:25:00 GMT
Mridul,

Can you be clearer? I didn't follow you.

On Fri, Jan 8, 2010 at 6:13 PM, Mridul Muralidharan
<mridulm@yahoo-inc.com> wrote:

>
>
> If you just want to scan both tables for your mapper, assuming there is no
> easier way to do it, can't you write a composite input format which
> delegates to both tables' input formats?
>
>
> Regards,
> Mridul
>
>
> john smith wrote:
>
>> Stack,
>>
>> The requirement is that I need to scan two tables, A and B, for an MR
>> job. Order is not important. That is, the reduce phase receives keys
>> from both A and B.
>>
>> Presently I am using TableMap for "A", and in one of the mappers I am
>> reading the entire B using a scanner. But this is a big overhead, right?
>> Non-local B data will be transferred (over the network) to the machine
>> executing that Map phase. Instead, I was thinking there might be some
>> variant of TableMap which scans both A and B and emits the corresponding
>> keys. Order is not at all important, and there are no random lookups. I
>> need all of the B table's keys one way or another, with the least overhead!
>>
>> Also, there's one more solution I was thinking of. Suppose I am scanning
>> some particular region using TableMap. I can get that particular region's
>> name using some function in the API, then I can build a scanner on B over
>> that particular region and emit all the keys from B. This doesn't require
>> any network transfer of data. Is this solution feasible? If yes, any hints
>> on what classes to use from the API?
>>
>> Thanks ,
>> J-S
>>
>> On Fri, Jan 8, 2010 at 10:46 AM, stack <stack@duboce.net> wrote:
>>
>>> This is a little tough.  Do both tables have the same number of regions?
>>> Are you walking through the two tables serially in your mapreduce, or do
>>> you want to do random lookups into the second table dependent on the row
>>> you are currently processing in table one?
>>>
>>> St.Ack
>>>
>>>
>>> On Thu, Jan 7, 2010 at 7:51 PM, john smith <js1987.smith@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> My requirement is that I must read two tables (belonging to the same
>>>> region server) in the same Map.
>>>>
>>>> Normally TableMap supports only one table at a time, and right now I am
>>>> reading the entire second table in one of the maps. This is a big
>>>> overhead. Can anyone suggest some modification of TableMap, or a
>>>> different approach, which can read two tables at the same time? This
>>>> would be very useful to us!
>>>>
>>>> Thanks
>>>> J-S
>>>>
>>>>
>
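
For readers following the thread, the composite input format suggested above can be sketched in plain Java. This is only an illustration of the delegation idea, not code from the thread: it uses no Hadoop or HBase classes, `SplitSource` and `TaggedSplit` are hypothetical stand-ins for a per-table input format and its region splits, and the region boundaries are made up. The composite asks each table for its splits, keeps the table name on each one, and returns the union, so every map task would read one region of one table instead of one mapper scanning all of B.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical stand-ins, no Hadoop/HBase dependency.
class CompositeSplitSketch {

    // Hypothetical split: one key range [startRow, endRow) of one table.
    static final class TaggedSplit {
        final String table;
        final String startRow;
        final String endRow;

        TaggedSplit(String table, String startRow, String endRow) {
            this.table = table;
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Hypothetical stand-in for a single-table input format.
    interface SplitSource {
        List<TaggedSplit> getSplits();
    }

    // Builds one split per pair of adjacent region boundaries.
    // An empty string marks the open start or end of the table.
    static SplitSource tableSource(String table, String... boundaries) {
        return () -> {
            List<TaggedSplit> splits = new ArrayList<>();
            for (int i = 0; i + 1 < boundaries.length; i++) {
                splits.add(new TaggedSplit(table, boundaries[i], boundaries[i + 1]));
            }
            return splits;
        };
    }

    // The composite step: delegate to every source and concatenate the results.
    static List<TaggedSplit> compositeSplits(List<SplitSource> sources) {
        List<TaggedSplit> all = new ArrayList<>();
        for (SplitSource source : sources) {
            all.addAll(source.getSplits());
        }
        return all;
    }

    // Made-up layout: table A has two regions, table B has one.
    static List<TaggedSplit> demo() {
        List<SplitSource> sources = new ArrayList<>();
        sources.add(tableSource("A", "", "m", ""));
        sources.add(tableSource("B", "", ""));
        return compositeSplits(sources);
    }

    public static void main(String[] args) {
        for (TaggedSplit s : demo()) {
            System.out.println(s.table + " [" + s.startRow + ", " + s.endRow + ")");
        }
    }
}
```

In a real job the record reader for each split would also have to delegate to the reader of the table named in the split; that part is omitted here.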
