hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: How to efficiently join HBase tables?
Date Tue, 31 May 2011 14:22:34 GMT

Re:  "The problem is that the few references to that question I found recommend pulling one
table to the mapper and then do a lookup for the referred row in the second table."

With multi-get in 0.90.x you could do some reasonably clever processing and perform the
lookups in batches rather than one by one.
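
As a rough illustration of the batching idea (a sketch, not code from this thread): the mapper buffers the row keys it needs and issues one lookup call per batch. In HBase 0.90 that call would presumably be HTable.get(List<Get>). To keep the sketch runnable without a cluster, the lookup is passed in as a plain function. The names BatchedLookup, partition, lookupInBatches and the batch size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchedLookup {
    /** Splits keys into consecutive batches of at most batchSize elements. */
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Looks up all keys, calling multiGet once per batch instead of once per key.
     * multiGet is a stand-in (assumption) for a batched call against the second table.
     */
    static <K, V> List<V> lookupInBatches(List<K> keys, int batchSize,
                                          Function<List<K>, List<V>> multiGet) {
        List<V> results = new ArrayList<>();
        for (List<K> batch : partition(keys, batchSize)) {
            results.addAll(multiGet.apply(batch));
        }
        return results;
    }

    public static void main(String[] args) {
        // In-memory map standing in for the second (lookup) table.
        Map<String, String> lookupTable = new HashMap<>();
        lookupTable.put("row1", "A");
        lookupTable.put("row2", "B");
        lookupTable.put("row3", "C");

        List<String> keys = Arrays.asList("row1", "row2", "row3");
        List<String> values = lookupInBatches(keys, 2, batch -> {
            List<String> out = new ArrayList<>();
            for (String k : batch) {
                out.add(lookupTable.get(k));
            }
            return out;
        });
        System.out.println(values);
    }
}
```

With a batch size of 2 and three keys, the lookup function is called twice rather than three times. Against a real cluster the saving would come from fewer round trips.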

Also, if the other table is "small" (i.e., if it's a domain/lookup table), you could leverage
the block cache on the lookups.



-----Original Message-----
From: eran@gigya-inc.com [mailto:eran@gigya-inc.com] On Behalf Of Eran Kutner
Sent: Tuesday, May 31, 2011 8:06 AM
To: user@hbase.apache.org
Subject: How to efficiently join HBase tables?

Hi,
I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem
is that the few references to that question I found recommend pulling one table to the mapper
and then do a lookup for the referred row in the second table.
This sounds like a very inefficient way to do a join with map reduce. I believe it would be
much better to feed the rows of both tables to the mapper and let it emit a key based on the
join fields. Since all the rows with the same join field values will have the same key, the
reducer will be able to easily generate the result of the join.
The problem with this is that I couldn't find a way to feed two tables to a single map reduce
job. I could probably dump the tables to files in a single directory and then run the join
on the files but that really makes no sense.
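
For illustration only, the reduce-side join described above can be simulated in memory: each row is tagged with its source table, grouped by join key (as the shuffle would do), and the reducer pairs left rows with right rows for each key. This is a sketch under assumptions, not an HBase or Hadoop job. The class ReduceSideJoinSketch, the "L"/"R" tags and the {joinKey, payload} row layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoinSketch {
    /** A map output value: the source table tag plus the row payload. */
    static final class Tagged {
        final String table;
        final String row;
        Tagged(String table, String row) {
            this.table = table;
            this.row = row;
        }
    }

    /** "Map" step: emit (joinKey, taggedRow); the TreeMap groups by key like a shuffle. */
    static void map(Map<String, List<Tagged>> shuffle, String table, String[] keyAndRow) {
        shuffle.computeIfAbsent(keyAndRow[0], k -> new ArrayList<>())
               .add(new Tagged(table, keyAndRow[1]));
    }

    /** "Reduce" step for one key: pair every left row with every right row (inner join). */
    static List<String> reduce(String joinKey, List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            (t.table.equals("L") ? left : right).add(t.row);
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(joinKey + "\t" + l + "\t" + r);
            }
        }
        return out;
    }

    /** Runs both steps over two row lists; each row is {joinKey, payload}. */
    static List<String> join(List<String[]> leftRows, List<String[]> rightRows) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        for (String[] row : leftRows) {
            map(shuffle, "L", row);
        }
        for (String[] row : rightRows) {
            map(shuffle, "R", row);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
            new String[] {"u1", "alice"}, new String[] {"u2", "bob"});
        List<String[]> events = Arrays.asList(
            new String[] {"u1", "login"}, new String[] {"u1", "logout"},
            new String[] {"u3", "signup"});
        for (String line : join(users, events)) {
            System.out.println(line);
        }
    }
}
```

Keys that appear in only one input (u2 and u3 here) produce no output, so this behaves as an inner join. The sketch does not address the open question of how to feed two HBase tables into a single job.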

Am I missing something? Any other ideas?

-eran
