From: "Buttler, David"
To: "user@hbase.apache.org"
Date: Thu, 16 Jun 2011 17:02:48 -0700
Subject: RE: How to efficiently join HBase tables?

Depends on a couple of things. If your LIST is a permanent feature of your document, then it might make sense to add the list membership (a Boolean, or the list index if the list has a particular sort order) to the doc record. Otherwise, a little simple programming can get you the results you want:

1) Sort the list (if it is big, then a map reduce job with an identity map / single identity reducer would do the job). If you require the order of the list to be maintained, then you need to add another field to the list indicating order, so that you can recover that after the join.
2) Output a list of DOCID / UUID pairs sorted on DOCID.
3) Use a double iterator through your two outputs to find the UUIDs from the list (and optionally their order in the list).
4) Optionally re-sort the UUID list by the list order index.

This will not be particularly fast, but it should be robust to large list sizes.

If your list can fit into the memory of a map task, then put it in a hash map for each Map job, and while you iterate over your docs table, you can output only the matching UUIDs and their sort order, and let your reducer reorder them according to your list order.

Dave

-----Original Message-----
From: Florin P [mailto:florinpico@yahoo.com]
Sent: Thursday, June 16, 2011 5:44 AM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?

Hello!
Regarding the same subject of joining, I have the following scenario:
1. I have a big table DOCS that contains the columns
UUID  DOCID
sdsd  1
hdhs  3
gdhg  7
shdg  9
and so on (hope you got the idea)
2. an external list of docIDs (LIST)
3
1
7
against which I have to query ("join") the DOCS DOCID column, so that the result should be hdhs, sdsd, gdhg.
How can I implement such a request? Could this be a possible solution:
1. add a new column LIST (in the same column family) to DOCS
2. add a new record in it that contains my LIST of docIDs
3. "join" column LIST with the DOCID column? (perhaps a weird idea)
Thank you.
Regards,
Florin
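
For illustration, the two approaches in Dave's reply (the sorted "double iterator" join and the in-memory hash map join) can be sketched in plain Python over in-memory lists. This is only a sketch under stated assumptions: it does not use any HBase or MapReduce API, the function names `sort_merge_join` and `hash_join` are hypothetical, the sample data is the small example from Florin's message, and DOCID is assumed to be unique within DOCS.

```python
# Sample data from the original question: (UUID, DOCID) rows and the external list.
DOCS = [("sdsd", 1), ("hdhs", 3), ("gdhg", 7), ("shdg", 9)]
LIST = [3, 1, 7]


def sort_merge_join(docs, doc_ids):
    """Sorted-inputs join, following steps 1-4 of the reply.

    Assumes each DOCID appears at most once in docs.
    """
    # Step 1: tag each list entry with its position, then sort by DOCID.
    wanted = sorted((docid, pos) for pos, docid in enumerate(doc_ids))
    # Step 2: produce (DOCID, UUID) pairs sorted on DOCID.
    rows = sorted((docid, uuid) for uuid, docid in docs)

    # Step 3: walk both sorted sequences with two cursors.
    matches = []
    i = j = 0
    while i < len(wanted) and j < len(rows):
        docid, pos = wanted[i]
        row_docid, uuid = rows[j]
        if docid == row_docid:
            matches.append((pos, uuid))
            i += 1  # stay on the same row in case the list repeats a DOCID
        elif docid < row_docid:
            i += 1  # listed DOCID is not present in docs
        else:
            j += 1  # row is not in the list

    # Step 4: restore the original list order.
    matches.sort()
    return [uuid for _, uuid in matches]


def hash_join(docs, doc_ids):
    """Hash map join for a list small enough to hold in memory."""
    # The in-memory hash map: DOCID -> position in the list.
    order = {docid: pos for pos, docid in enumerate(doc_ids)}
    # "Map" side: scan docs once, emitting (position, UUID) for matches only.
    emitted = [(order[docid], uuid) for uuid, docid in docs if docid in order]
    # "Reduce" side: reorder the emitted pairs by list position.
    return [uuid for _, uuid in sorted(emitted)]


print(sort_merge_join(DOCS, LIST))
print(hash_join(DOCS, LIST))
```

In a real job the sorted inputs would come from MapReduce output files and the hash map would be loaded once per map task; the sketch only shows the join logic itself.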