[ https://issues.apache.org/jira/browse/PIG3850?page=com.atlassian.jira.plugin.system.issuetabpanels:alltabpanel ]
Rohini Palaniswamy updated PIG3850:

Description:
Possible optimizations:
1) If it is a skewed join, then we can combine the ordering into it instead of doing an additional
order by, as a skewed join already involves sampling.
2) If it is a normal join, then we can do the order by and then join, i.e.:
Current plan:
Vertex 1 (load massive), Vertex 2 (load big) > Vertex 3 (join) > Vertex 4 (sampler),
Vertex 5 (Partitioner using vertex 4 sample) > Vertex 6 (order by)
New plan:
Vertex 1 (load massive) > Vertex 2 (sampler), Vertex 3 (Partitioner using vertex 2 sample)
> Vertex 4 (order by and join) < Vertex 5 (load big and construct WeightedRangePartitioner
from Vertex 2 sample)
3) If it is a replicated join, a plan similar to 2) should work, with Vertex 5 changing to
broadcast input to Vertex 4 instead of using WeightedRangePartitioner.
was:
Possible optimizations:
1) If it is a skewed join, then we can combine the ordering into it instead of doing an additional
order by, as a skewed join already involves sampling.
2) If it is a normal join, then we can do the order by and then join, i.e.:
Current plan:
Vertex 1 (load massive), Vertex 2 (load big) > Vertex 3 (join) > Vertex 4 (sampler),
Vertex 5 (Partitioner) > Vertex 6 (order by)
New plan:
Vertex 1 (load massive) > Vertex 2 (sampler), Vertex 3 (Partitioner) > Vertex 4
(order by and join) < Vertex 5 (load big)
> Optimize join followed by order by using same key
> 
>
> Key: PIG3850
> URL: https://issues.apache.org/jira/browse/PIG3850
> Project: Pig
> Issue Type: Subtask
> Reporter: Rohini Palaniswamy
> Fix For: tezbranch
>
>
> Possible optimizations:
> 1) If it is a skewed join, then we can combine the ordering into it instead of doing
> an additional order by, as a skewed join already involves sampling.
> 2) If it is a normal join, then we can do the order by and then join, i.e.:
> Current plan:
> Vertex 1 (load massive), Vertex 2 (load big) > Vertex 3 (join) > Vertex 4 (sampler),
> Vertex 5 (Partitioner using vertex 4 sample) > Vertex 6 (order by)
> New plan:
> Vertex 1 (load massive) > Vertex 2 (sampler), Vertex 3 (Partitioner using vertex
> 2 sample) > Vertex 4 (order by and join) < Vertex 5 (load big and construct WeightedRangePartitioner
> from Vertex 2 sample)
> 3) If it is a replicated join, a plan similar to 2) should work, with Vertex 5 changing
> to broadcast input to Vertex 4 instead of using WeightedRangePartitioner.
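
To make the proposed "New plan" concrete, here is a minimal Python sketch of the idea, not Pig or Tez code. It assumes plain range partitioning with boundaries taken from a sample of the massive input; Pig's WeightedRangePartitioner additionally handles skewed keys, which this sketch does not attempt. All function names and the `broadcast_big` flag are hypothetical and exist only for illustration. Both inputs are routed with the same boundaries, so each partition can sort and join locally and the concatenated partitions are already globally ordered.

```python
import bisect
import random
from collections import defaultdict


def sample_boundaries(keys, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 split points from a random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[i * len(sample) // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Range partition index: every key in partition p sorts before partition p + 1."""
    return bisect.bisect_right(boundaries, key)


def order_by_and_join(massive, big, num_partitions=3, broadcast_big=False):
    """Join two lists of (key, value) pairs and return rows ordered by key.

    broadcast_big=True mimics the replicated-join variant (hypothetical flag):
    every partition sees all of `big` instead of a range-partitioned slice.
    """
    boundaries = sample_boundaries([k for k, _ in massive], num_partitions)

    # Route the massive input by range.
    parts_massive = defaultdict(list)
    for k, v in massive:
        parts_massive[partition_of(k, boundaries)].append((k, v))

    # Route the big input with the SAME boundaries so matching keys meet.
    parts_big = defaultdict(list)
    for k, v in big:
        parts_big[partition_of(k, boundaries)].append((k, v))

    output = []
    for p in range(num_partitions):
        big_rows = big if broadcast_big else parts_big[p]
        lookup = defaultdict(list)
        for k, v in big_rows:
            lookup[k].append(v)
        # Sort and join inside the same partition ("order by and join" vertex).
        for k, v in sorted(parts_massive[p], key=lambda kv: kv[0]):
            for w in lookup.get(k, ()):
                output.append((k, v, w))
    return output


massive = [(5, "a"), (1, "b"), (3, "c"), (5, "d"), (2, "e")]
big = [(5, "x"), (1, "y"), (4, "z")]
result = order_by_and_join(massive, big)
```

Because equal keys always land in the same range partition, the join needs no second shuffle, and no separate sampling pass over the join output is required. Calling `order_by_and_join(massive, big, broadcast_big=True)` gives the same rows for this input, which corresponds to option 3) where the smaller input is broadcast rather than partitioned.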

This message was sent by Atlassian JIRA
(v6.2#6252)
