hadoop-common-dev mailing list archives

From "Edward Yoon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2021) Sort Join Implementation
Date Sun, 18 Nov 2007 02:52:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543331 ]

Edward Yoon commented on HADOOP-2021:
-------------------------------------

r1
       a     b    c
================
row1   a1    b1   c1
row2   a2    b2   c2

r2
       e     f
============
row1   e1    a1
row2   e2    f2
row3   e3    f3
row4   e4    a1

{code}
r1 = table('r1');
r2 = table('r2');
r3 = r1.join(r1.a = r2.f) and r2;
{code}

r3
      a    b    c   row    e   f  
=========================
row1  a1   b1   c1  row1  e1  a1
row1  a1   b1   c1  row4  e4  a1
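
The example above can be reproduced with a small in-memory sort-merge join. This is only a rough sketch in Python, not the code from the attached patch: the function name sort_merge_join, the dict-based row layout, and the r2_row key (used so the two row labels do not collide) are all assumptions made for illustration.

```python
from operator import itemgetter


def sort_merge_join(left, right, left_key, right_key):
    """Equi-join two lists of dicts by sorting both sides on the join
    key and then merging them in a single pass (hypothetical helper)."""
    left_sorted = sorted(left, key=itemgetter(left_key))
    right_sorted = sorted(right, key=itemgetter(right_key))

    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_sorted[i][left_key]
        rk = right_sorted[j][right_key]
        if lk < rk:
            i += 1  # left key too small, advance left side
        elif lk > rk:
            j += 1  # right key too small, advance right side
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left_sorted) and left_sorted[i_end][left_key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][right_key] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for l_row in left_sorted[i:i_end]:
                for r_row in right_sorted[j:j_end]:
                    result.append({**l_row, **r_row})
            i, j = i_end, j_end
    return result


# Sample tables from the comment; "r2_row" is an assumed column name.
r1 = [
    {"row": "row1", "a": "a1", "b": "b1", "c": "c1"},
    {"row": "row2", "a": "a2", "b": "b2", "c": "c2"},
]
r2 = [
    {"r2_row": "row1", "e": "e1", "f": "a1"},
    {"r2_row": "row2", "e": "e2", "f": "f2"},
    {"r2_row": "row3", "e": "e3", "f": "f3"},
    {"r2_row": "row4", "e": "e4", "f": "a1"},
]

r3 = sort_merge_join(r1, r2, "a", "f")
for joined in r3:
    print(joined)
```

With both inputs sorted on the join column, each side is scanned once, so the cost is dominated by the two sorts rather than by comparing every pair of rows as a nested-loop join would.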


> Sort Join Implementation
> ------------------------
>
>                 Key: HADOOP-2021
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2021
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>         Environment: all environments  
>            Reporter: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: 2021_v01.patch
>
>
> If we don't have an index for a domain in the join, we can still improve on the nested-loop
> join using sort join.
> {code}
> R1 = table('movieLog_table');
> R2 = table('stockCompany_info');
> result = R1.join(R1.studioName = R2.corporation) and R2;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

