From: Doug Meil
To: user@hbase.apache.org
Date: Tue, 31 May 2011 10:22:34 -0400
Subject: RE: How to efficiently join HBase tables?
Re: "The problem is that the few references to that question I found recommend pulling one table to the mapper and then doing a lookup for the referred row in the second table."

With multi-get in 0.90.x, you could perform some reasonably clever processing and do the lookups in batches rather than one by one.

Also, if the other table is "small" (i.e., if it's a domain/lookup table), you could leverage the block cache on the lookups.

-----Original Message-----
From: eran@gigya-inc.com [mailto:eran@gigya-inc.com] On Behalf Of Eran Kutner
Sent: Tuesday, May 31, 2011 8:06 AM
To: user@hbase.apache.org
Subject: How to efficiently join HBase tables?

Hi,
I need to join two HBase tables. The obvious way is to use an M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then doing a lookup for the referred row in the second table.

This sounds like a very inefficient way to do a join with MapReduce. I believe it would be much better to feed the rows of both tables to the mapper and let it emit a key based on the join fields. Since all the rows with the same join field values will have the same key, the reducer will be able to easily generate the result of the join.

The problem with this is that I couldn't find a way to feed two tables to a single MapReduce job. I could probably dump the tables to files in a single directory and then run the join on the files, but that really makes no sense. Am I missing something? Any other ideas?

-eran
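Eran's proposal (map both tables, emit the join-field value as the key, and let the reducer combine the rows) is a reduce-side join. The following is a minimal in-memory sketch of those mechanics in plain Java, under stated assumptions: it uses no Hadoop or HBase classes, the `users`/`companies` tables, their rows and the `Tagged` helper are all hypothetical, and a `TreeMap` merely stands in for the framework's shuffle and sort. It does not address how to feed two HBase tables into one job, which was the open question in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of a reduce-side join (hypothetical data, no Hadoop/HBase classes). */
public class ReduceSideJoin {

    /** A value tagged with the table it came from, as a mapper would emit it. */
    public static final class Tagged {
        final String table;
        final String payload;

        Tagged(String table, String payload) {
            this.table = table;
            this.payload = payload;
        }
    }

    /** "Map" output: (joinKey, taggedRow). The TreeMap groups and sorts like the shuffle. */
    static void emit(Map<String, List<Tagged>> shuffle, String joinKey, Tagged value) {
        shuffle.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(value);
    }

    /** "Reduce": for each join key, pair every left-side row with every right-side row. */
    static List<String> reduce(Map<String, List<Tagged>> shuffle, String left, String right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            List<String> lefts = new ArrayList<>();
            List<String> rights = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.table.equals(left)) lefts.add(t.payload);
                else if (t.table.equals(right)) rights.add(t.payload);
            }
            // Inner join: a key with rows on only one side produces nothing.
            for (String l : lefts)
                for (String r : rights)
                    out.add(e.getKey() + "," + l + "," + r);
        }
        return out;
    }

    /** Runs the pipeline on two tiny hypothetical tables joined on a company id. */
    public static List<String> run() {
        // Each row: {joinFieldValue, payload}
        String[][] users = {{"c10", "alice"}, {"c20", "bob"}, {"c10", "carol"}, {"c30", "dave"}};
        String[][] companies = {{"c10", "Acme"}, {"c20", "Globex"}};

        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        for (String[] row : users) emit(shuffle, row[0], new Tagged("users", row[1]));
        for (String[] row : companies) emit(shuffle, row[0], new Tagged("companies", row[1]));
        return reduce(shuffle, "users", "companies");
    }

    public static void main(String[] args) {
        // Prints c10,alice,Acme / c10,carol,Acme / c20,bob,Globex (dave has no match)
        run().forEach(System.out::println);
    }
}
```

In a real job the reducer sees its values as a one-pass iterator, so one side of each key has to be buffered in memory; arranging for the smaller table's rows to arrive first (for example with a composite key and secondary sort) keeps that buffer small.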
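Doug's multi-get suggestion keeps the lookup-style join but groups the lookups, so each round trip to the second table resolves many keys instead of one. The sketch below illustrates that batching in plain Java. `FakeRemoteTable` is a hypothetical in-memory stand-in that only counts round trips; in the HBase 0.90 client the corresponding batched call should be `HTable.get(List<Get>)`, which returns a `Result[]`, but no HBase classes are used here. The row layout and batch size are likewise assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a lookup join that fetches matching rows in batches (hypothetical stand-in table). */
public class BatchedLookupJoin {

    /** Hypothetical stand-in for the second table; counts how many round trips it serves. */
    public static final class FakeRemoteTable {
        private final Map<String, String> rows;
        public int roundTrips = 0;

        public FakeRemoteTable(Map<String, String> rows) {
            this.rows = rows;
        }

        /** One round trip resolving a whole batch of keys (null where a row is missing). */
        public List<String> getBatch(List<String> keys) {
            roundTrips++;
            List<String> out = new ArrayList<>(keys.size());
            for (String k : keys) out.add(rows.get(k));
            return out;
        }
    }

    /** Joins each {rowKey, foreignKey} pair with the lookup table, batchSize keys at a time. */
    public static List<String> join(List<String[]> mainRows, FakeRemoteTable lookup, int batchSize) {
        List<String> joined = new ArrayList<>();
        for (int start = 0; start < mainRows.size(); start += batchSize) {
            List<String[]> chunk = mainRows.subList(start, Math.min(start + batchSize, mainRows.size()));
            List<String> keys = new ArrayList<>();
            for (String[] row : chunk) keys.add(row[1]);
            List<String> found = lookup.getBatch(keys); // one round trip per chunk
            for (int i = 0; i < chunk.size(); i++) {
                if (found.get(i) != null) { // inner join: drop rows without a match
                    joined.add(chunk.get(i)[0] + "," + found.get(i));
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> companies = new HashMap<>();
        companies.put("c10", "Acme");
        companies.put("c20", "Globex");

        List<String[]> users = new ArrayList<>();
        users.add(new String[] {"alice", "c10"});
        users.add(new String[] {"bob", "c20"});
        users.add(new String[] {"carol", "c10"});
        users.add(new String[] {"dave", "c99"}); // no match, dropped by the inner join
        users.add(new String[] {"erin", "c20"});

        FakeRemoteTable table = new FakeRemoteTable(companies);
        join(users, table, 2).forEach(System.out::println);
        System.out.println("round trips: " + table.roundTrips); // 3 instead of 5
    }
}
```

In an actual mapper the rows would arrive one `map()` call at a time, so a version of this would buffer keys until the batch is full and flush whatever remains in `cleanup()`. The batch size trades memory and per-request latency against the number of round trips.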