pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3850) Optimize join followed by order by using same key
Date Thu, 27 Mar 2014 23:27:16 GMT

     [ https://issues.apache.org/jira/browse/PIG-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3850:
------------------------------------

    Description: 
Possible optimizations:
    1) If it is a skewed join, we can combine the ordering into the join instead of doing an
additional order by, since skewed join already involves sampling.
    2) If it is a normal join, we can do the order by first and then the join, i.e.:
Current plan:
  Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4 (sampler),
  Vertex 5 (Partitioner using vertex 4 sample) -> Vertex 6 (order by)
New plan:
  Vertex 1 (load massive) -> Vertex 2 (sampler), Vertex 3 (Partitioner using vertex 2 sample)
  -> Vertex 4 (order by and join) <- Vertex 5 (load big and construct WeightedRangePartitioner
  from Vertex 2 sample)
    3) If it is a replicated join, a plan similar to 2) should work, with Vertex 5 changing to
broadcast its input to Vertex 4 instead of using WeightedRangePartitioner.
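Plans 2) and 3) rest on one property: if both join inputs are split by the same range partitioner built from a single sample, equal keys land in the same partition on both sides and the partitions are ordered by key. Joining partition by partition (sorting each partition) and concatenating the results in partition order then yields output already sorted on the join key, so the separate sampler/partitioner/order-by vertices after the join are no longer needed. The Java sketch below illustrates that property in memory only, under stated assumptions: all class and method names (`OrderedJoinSketch`, `boundaries`, `partition`, `orderedJoin`) are hypothetical and not Pig or Tez APIs, sampling every second row stands in for the sampler vertex, and a plain unweighted range partitioner stands in for Pig's `WeightedRangePartitioner`. The `replicated` flag mimics plan 3) by handing the whole "big" input to every partition instead of range-partitioning it.

```java
import java.util.*;

// Hypothetical sketch, not Pig/Tez code: range-partition both join inputs with the
// same sampled boundaries so the join output comes out globally ordered by key.
public class OrderedJoinSketch {
    record Row(int key, String value) {}

    // Split points taken from a sorted sample (stands in for the sampler vertex).
    static int[] boundaries(List<Integer> sample, int numPartitions) {
        if (sample.isEmpty() || numPartitions < 2) return new int[0];
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        int[] bounds = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            bounds[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return bounds;
    }

    // Plain (unweighted) range partitioner: monotone in the key, so every key in
    // partition p is <= every key in partition p + 1.
    static int partition(int key, int[] bounds) {
        int idx = Arrays.binarySearch(bounds, key);
        return idx >= 0 ? idx : -idx - 1;
    }

    // Route each row to its range partition; bounds.length + 1 partitions in total.
    static List<List<Row>> split(List<Row> rows, int[] bounds) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i <= bounds.length; i++) parts.add(new ArrayList<>());
        for (Row r : rows) parts.get(partition(r.key(), bounds)).add(r);
        return parts;
    }

    // Inner join within one partition: sort the left side, probe a hash index of the
    // right side, so this partition's output is already in key order.
    static List<String> joinPartition(List<Row> left, List<Row> right) {
        Map<Integer, List<String>> index = new HashMap<>();
        for (Row r : right) index.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        List<Row> sortedLeft = new ArrayList<>(left);
        sortedLeft.sort(Comparator.comparingInt(Row::key));
        List<String> out = new ArrayList<>();
        for (Row l : sortedLeft) {
            for (String v : index.getOrDefault(l.key(), List.of())) {
                out.add(l.key() + ":" + l.value() + "," + v);
            }
        }
        return out;
    }

    // replicated = false mirrors plan 2) (both sides range-partitioned);
    // replicated = true mirrors plan 3) (whole "big" side sent to every partition).
    static List<String> orderedJoin(List<Row> massive, List<Row> big, int n, boolean replicated) {
        List<Integer> sample = new ArrayList<>();
        for (int i = 0; i < massive.size(); i += 2) sample.add(massive.get(i).key()); // every 2nd row
        int[] bounds = boundaries(sample, n);
        List<List<Row>> leftParts = split(massive, bounds);
        List<List<Row>> rightParts = replicated ? null : split(big, bounds);
        List<String> result = new ArrayList<>();
        for (int p = 0; p < leftParts.size(); p++) {
            List<Row> right = replicated ? big : rightParts.get(p);
            // Concatenating partitions in range order keeps the result globally sorted.
            result.addAll(joinPartition(leftParts.get(p), right));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> massive = List.of(new Row(1, "a"), new Row(5, "e"), new Row(3, "c"),
                new Row(9, "i"), new Row(7, "g"), new Row(2, "b"));
        List<Row> big = List.of(new Row(2, "B"), new Row(7, "G"), new Row(5, "E"), new Row(1, "A"));
        System.out.println(orderedJoin(massive, big, 3, false)); // [1:a,A, 2:b,B, 5:e,E, 7:g,G]
        System.out.println(orderedJoin(massive, big, 3, true));  // same output
    }
}
```

With this input the sample is {1, 3, 7}, giving boundaries [3, 7] and partitions {1, 2, 3}, {5, 7}, {9}; both variants print the joined rows in ascending key order without any sort after the join. In a real DAG the per-partition work would run in separate tasks of the join vertex, and the replicated variant would build the hash index of the broadcast input once per task rather than once per call as this sketch does.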


  was:
Possible optimizations:
    1) If it is a skewed join, we can combine the ordering into the join instead of doing an
additional order by, since skewed join already involves sampling.
    2) If it is a normal join, we can do the order by first and then the join, i.e.:
Current plan:
  Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4 (sampler),
  Vertex 5 (Partitioner) -> Vertex 6 (order by)
New plan:
  Vertex 1 (load massive) -> Vertex 2 (sampler), Vertex 3 (Partitioner) -> Vertex 4
  (order by and join) <- Vertex 5 (load big)



> Optimize join followed by order by using same key
> -------------------------------------------------
>
>                 Key: PIG-3850
>                 URL: https://issues.apache.org/jira/browse/PIG-3850
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>             Fix For: tez-branch



--
This message was sent by Atlassian JIRA
(v6.2#6252)
