pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3850) Optimize join followed by order by using same key
Date Fri, 30 May 2014 19:17:04 GMT

     [ https://issues.apache.org/jira/browse/PIG-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3850:
----------------------------

    Fix Version/s:     (was: tez-branch)
                   0.14.0

> Optimize join followed by order by using same key
> -------------------------------------------------
>
>                 Key: PIG-3850
>                 URL: https://issues.apache.org/jira/browse/PIG-3850
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>
> Possible optimizations:
>     1) If it is a skewed join, we can combine the ordering into it instead of doing an additional order by, since a skewed join already involves sampling.
>     2) If it is a normal join, we can do the order by and then the join in the same vertex, i.e.
> Current plan:
>   Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4 (sampler), Vertex 5 (partitioner using Vertex 4 sample) -> Vertex 6 (order by)
> New plan:
>   Vertex 1 (load massive) -> Vertex 2 (sampler), Vertex 3 (partitioner using Vertex 2 sample) -> Vertex 4 (order by and join) <- Vertex 5 (load big and construct WeightedRangePartitioner from Vertex 2 sample)
>     3) If it is a replicated join, a plan similar to 2) should work, with Vertex 5 changing to broadcast its input to Vertex 4 instead of using WeightedRangePartitioner.
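
The idea behind the new plan in 2) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Pig or Tez code: the function names (sample_boundaries, partition_of, order_by_and_join) are hypothetical, the sampler picks plain quantiles rather than the weighted ranges that WeightedRangePartitioner uses, and everything runs in one process. It shows why a single sample of the massive input is enough: both inputs are range-partitioned with the same boundaries, so each partition can sort and join locally, and concatenating the partitions in order gives globally ordered join output.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 split points from a random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Return the index of the key range that contains `key`."""
    return bisect.bisect_right(boundaries, key)


def order_by_and_join(massive, big, num_partitions=4):
    """Join two lists of (key, value) pairs and return rows ordered by key.

    Boundaries come from a sample of `massive` only; `big` reuses them,
    mirroring "Vertex 5 ... from Vertex 2 sample" in the plan above.
    """
    boundaries = sample_boundaries([k for k, _ in massive], num_partitions)
    left = [[] for _ in range(num_partitions)]
    right = [[] for _ in range(num_partitions)]
    for k, v in massive:
        left[partition_of(k, boundaries)].append((k, v))
    for k, v in big:
        right[partition_of(k, boundaries)].append((k, v))

    out = []
    for p in range(num_partitions):
        # Within one partition: index the `big` side by key, sort the
        # `massive` side, then emit matches in key order.
        by_key = {}
        for k, v in right[p]:
            by_key.setdefault(k, []).append(v)
        for k, lv in sorted(left[p], key=lambda kv: kv[0]):
            for rv in by_key.get(k, []):
                out.append((k, lv, rv))
    return out


if __name__ == "__main__":
    massive = [(5, "a"), (1, "b"), (3, "c"), (1, "d"), (9, "e")]
    big = [(1, "x"), (3, "y"), (7, "z"), (3, "w")]
    for row in order_by_and_join(massive, big, num_partitions=3):
        print(row)
```

For the replicated join in 3), the same sketch would skip partitioning `big` and instead hand the whole of it to every partition, which corresponds to the broadcast input described above.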



--
This message was sent by Atlassian JIRA
(v6.2#6252)
