Message-ID: <11208106.1196563663349.JavaMail.jira@brutus>
Date: Sat, 1 Dec 2007 18:47:43 -0800 (PST)
From: "Edward Yoon (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-2021) [Hbase Shell] Sort Join Implementation
In-Reply-To: <30211267.1192003550793.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2021:
--------------------------------

    Status: Open  (was: Patch Available)

> [Hbase Shell] Sort Join Implementation
> --------------------------------------
>
>                 Key: HADOOP-2021
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2021
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: all environments
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>             Fix For: 0.16.0
>
>         Attachments: 2021_v01.patch, 2021_v02.txt, 2021_v04.patch, 2021_v05, 2021_v05.patch, 2021_v06.patch
>
>
> If we don't have an index for a domain in the join, we can still improve on the nested-loop join using sort join.
> {code}
> R1 = table('movieLog_table');
> R2 = table('stockCompany_info');
> result = R1.join(R1.studioName = R2.corporation) and R2;
> {code}
> ----
> {code}
> r1
>       a     b     c
> ======================
> row1  a1    b1    c1
> row2  a2    b2    c2
> row3  a1    b3    c3
>
> r2
>       e     f
> ==================
> row1  e1    a1
> row2  e2    f2
> row3  e3    f3
> row4  e4    a1
> row5  e5    a2
>
> r1 = table('r1');
> r2 = table('r2');
> r3 = r1.join(r1.a = r2.f) and r2;
> ---------------------------------------------
> temp table T : Sorted set by "f"
>     row
> =============
> a1  row:row1
>     row:row4
> a2  row:row5
> f2  row:row2
> f3  row:row3
> ---------------------
> r3
>            r1.row  a   b   c   r2.row  e   f
> ===================================================
> row1.row1  row1    a1  b1  c1  row1    e1  a1
> row1.row4  row1    a1  b1  c1  row4    e4  a1
> row2.row5  row2    a2  b2  c2  row5    e5  a2
> row3.row1  row3    a1  b3  c3  row1    e1  a1
> row3.row4  row3    a1  b3  c3  row4    e4  a1
> {code}
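
For readers following the r1/r2 example in the issue, the steps can be sketched in plain Python. This is only an illustration under stated assumptions, not the attached patch and not HBase Shell code: the dictionaries stand in for the tables, and `build_temp_table` and `sort_join` are hypothetical helper names. The sketch builds the temp table T sorted by "f", then looks up each r1 row's "a" value in T with a binary search instead of scanning all of r2 as a nested-loop join would.

```python
from bisect import bisect_left

# Sample tables from the issue, keyed by row key (stand-ins for HBase tables).
r1 = {
    "row1": {"a": "a1", "b": "b1", "c": "c1"},
    "row2": {"a": "a2", "b": "b2", "c": "c2"},
    "row3": {"a": "a1", "b": "b3", "c": "c3"},
}
r2 = {
    "row1": {"e": "e1", "f": "a1"},
    "row2": {"e": "e2", "f": "f2"},
    "row3": {"e": "e3", "f": "f3"},
    "row4": {"e": "e4", "f": "a1"},
    "row5": {"e": "e5", "f": "a2"},
}


def build_temp_table(table, column):
    """Group row keys by the join column's value, sorted by that value (table T)."""
    groups = {}
    for row_key in sorted(table):
        groups.setdefault(table[row_key][column], []).append(row_key)
    return sorted(groups.items())


def sort_join(left, left_col, right, right_col):
    """Join on left_col = right_col by probing the sorted temp table."""
    temp = build_temp_table(right, right_col)
    values = [value for value, _ in temp]
    result = {}
    for left_key in sorted(left):
        value = left[left_key][left_col]
        i = bisect_left(values, value)  # binary search in the sorted temp table
        if i < len(values) and values[i] == value:
            for right_key in temp[i][1]:
                # Composite row key "r1row.r2row", as in the r3 table of the issue.
                result[f"{left_key}.{right_key}"] = {
                    "r1.row": left_key, **left[left_key],
                    "r2.row": right_key, **right[right_key],
                }
    return result


r3 = sort_join(r1, "a", r2, "f")
for key, row in r3.items():
    print(key, row)
```

Run against the sample data, the sketch yields the same five composite row keys as the r3 table in the issue. Rows of r2 whose "f" value never appears in r1.a (row2 and row3) are present in T but are never matched.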