hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-199) New Join types in Pig
Date Wed, 16 Apr 2008 13:59:21 GMT

    [ https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589560#action_12589560 ]

Pi Song commented on PIG-199:
-----------------------------

Amir,

Just out of curiosity: how do you plan to implement Fragment and Replace Join? Is it something
like this?

For A ⋈ B:
In A:  Map (k1, v1) --> { ((a,1),(k1,v1)), ((a,2),(k1,v1)), ((a,3),(k1,v1)), ... , ((a,M),(k1,v1)) }
       where a = GetPartitionA( (k1,v1) ) into N partitions
In B:  Map (k1, v1) --> { ((1,b),(k1,v1)), ((2,b),(k1,v1)), ((3,b),(k1,v1)), ... , ((N,b),(k1,v1)) }
       where b = GetPartitionB( (k1,v1) ) into M partitions

And then N * M reduce buckets each do a local join?

If that is the case, the amount of shuffled data is multiplied: every record of A is copied
M times and every record of B is copied N times. Wouldn't performance be worse?
Is this solely for the inequality join feature?
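
To make the cost concrete, here is a minimal in-memory Python sketch of the scheme as I read it from the mapping described in this comment. It is not Pig or Hadoop code and not necessarily what Amir has in mind. The names get_partition, replicated_join and predicate are hypothetical, and the CRC32-based partitioner is only a stand-in for GetPartitionA / GetPartitionB.

```python
from collections import defaultdict
from zlib import crc32


def get_partition(record, parts):
    # Hypothetical stand-in for GetPartitionA / GetPartitionB:
    # maps a record to a partition number in 1..parts.
    return crc32(repr(record).encode()) % parts + 1


def replicated_join(A, B, N, M, predicate):
    """Join A and B through N * M buckets; returns (pairs, records_shuffled)."""
    # Each bucket (i, j) holds the A-side and B-side records sent to it.
    buckets = defaultdict(lambda: ([], []))

    # "Map" over A: pick partition a, then copy the record to (a,1)..(a,M).
    for rec in A:
        a = get_partition(rec, N)
        for j in range(1, M + 1):
            buckets[(a, j)][0].append(rec)

    # "Map" over B: pick partition b, then copy the record to (1,b)..(N,b).
    for rec in B:
        b = get_partition(rec, M)
        for i in range(1, N + 1):
            buckets[(i, b)][1].append(rec)

    # "Reduce": each bucket joins its two sides locally with any predicate,
    # so the condition does not have to be an equality on the key.
    pairs = []
    for a_side, b_side in buckets.values():
        for ra in a_side:
            for rb in b_side:
                if predicate(ra, rb):
                    pairs.append((ra, rb))

    # Total records sent to reducers: len(A) * M + len(B) * N.
    shuffled = sum(len(a_side) + len(b_side) for a_side, b_side in buckets.values())
    return pairs, shuffled
```

Any record of A in partition a and any record of B in partition b meet in exactly one bucket, (a, b), so each matching pair is produced once. The price is that len(A) * M + len(B) * N records are shuffled instead of len(A) + len(B), which is the multiplication I am asking about.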

> New Join types in Pig
> ---------------------
>
>                 Key: PIG-199
>                 URL: https://issues.apache.org/jira/browse/PIG-199
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Amir Youssefi
>            Assignee: Amir Youssefi
>
> We need to design and implement new Join Types in Pig which can potentially improve
> the performance for large data-sets. I will start with Map Side Joins/Fragment and Replace.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

