pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3849) Optimize group by followed by join on the same key
Date Fri, 06 Mar 2015 02:05:38 GMT

     [ https://issues.apache.org/jira/browse/PIG-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated PIG-3849:
----------------------------
    Description: 
E.g., a group by followed by a join on the same key.
This can be done in one vertex with multiple inputs instead of having an extra vertex to do
the join. That is, the plan is currently Vertex 1 (load relation 1) -> Vertex 2 (group by) -> Vertex 4
(join) <- Vertex 3 (load relation 2). This could be changed to Vertex 1 (load relation 1)
-> Vertex 2 (group by and join) <- Vertex 3 (load relation 2).
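
The rewrite described above can be sketched as a small graph transformation. This is only an illustration, not Pig or Tez code: the function name fuse_group_join, the (operator, key) tuples, the key name "k", and the vertex names v1..v4 are all hypothetical. It assumes the group-by vertex feeds only the join and that both operators use the same key.

```python
from typing import Dict, List, Optional, Tuple

Ops = Dict[str, Tuple[str, Optional[str]]]   # vertex name -> (operator, key)
Edges = List[Tuple[str, str]]                # (source vertex, destination vertex)


def fuse_group_join(ops: Ops, edges: Edges) -> Tuple[Ops, Edges]:
    """Fold a join vertex into the group-by vertex that feeds it.

    The fusion is applied only when both operators use the same key and
    the group-by vertex has no other consumer. Otherwise the plan is
    returned unchanged.
    """
    for src, dst in edges:
        src_op, src_key = ops[src]
        dst_op, dst_key = ops[dst]
        fusable = (
            src_op == "group by"
            and dst_op == "join"
            and src_key == dst_key
            and sum(1 for s, _ in edges if s == src) == 1
        )
        if not fusable:
            continue

        # Drop the join vertex and relabel the group-by vertex.
        new_ops = {name: op for name, op in ops.items() if name != dst}
        new_ops[src] = ("group by and join", src_key)

        # Remove the group-by -> join edge and re-point every other edge
        # that touched the join vertex at the fused vertex.
        new_edges = []
        for s, d in edges:
            if (s, d) == (src, dst):
                continue
            new_edges.append((src if s == dst else s, src if d == dst else d))
        return new_ops, new_edges
    return ops, edges


# The plan from the description: v1 -> v2 (group by) -> v4 (join) <- v3.
ops = {
    "v1": ("load", None),
    "v2": ("group by", "k"),
    "v3": ("load", None),
    "v4": ("join", "k"),
}
edges = [("v1", "v2"), ("v2", "v4"), ("v3", "v4")]

fused_ops, fused_edges = fuse_group_join(ops, edges)
print(fused_ops)
print(fused_edges)
```

After the fusion the plan has three vertices, and both load vertices feed the single "group by and join" vertex, which matches the shape proposed in the description.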

The idea for this kind of optimization comes from YSmart, which Hive has already integrated. Now that
Pig has integrated Tez, it would be natural to integrate YSmart into Pig on Tez.


  was:
  This can be done in one vertex with multiple inputs instead of having an extra vertex to
do the join. That is, the plan is currently Vertex 1 (load relation 1) -> Vertex 2 (group by) -> Vertex
4 (join) <- Vertex 3 (load relation 2). This could be changed to Vertex 1 (load relation 1)
-> Vertex 2 (group by and join) <- Vertex 3 (load relation 2).

And this could be extended into a more general query correlation optimization.




> Optimize group by followed by join on the same key
> --------------------------------------------------
>
>                 Key: PIG-3849
>                 URL: https://issues.apache.org/jira/browse/PIG-3849
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>
> E.g., a group by followed by a join on the same key.
> This can be done in one vertex with multiple inputs instead of having an extra vertex
> to do the join. That is, the plan is currently Vertex 1 (load relation 1) -> Vertex 2 (group by) -> Vertex
> 4 (join) <- Vertex 3 (load relation 2). This could be changed to Vertex 1 (load relation 1)
> -> Vertex 2 (group by and join) <- Vertex 3 (load relation 2).
> The idea for this kind of optimization comes from YSmart, which Hive has already integrated. Now that
> Pig has integrated Tez, it would be natural to integrate YSmart into Pig on Tez.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
